
Software Engineer for Safeguards Foundations - Internal Tooling

Anthropic · London, UK
On-site · Full-time



Responsibilities

  • Design, build, and maintain the internal review and enforcement tooling for Safeguards analysts, including case queues, content review interfaces, decision/audit logging, and account-actioning workflows
  • Understand user workflows, which may span multiple tools and UIs, and build tooling that streamlines them
  • Develop the foundational layer of reusable APIs, data storage, and backend services that lets new review workflows be built quickly and securely
  • Collaborate with operations and policy teams to identify reviewer pain points and translate them into product improvements that reduce handling time and decision errors
  • Integrate tooling with upstream detection systems and downstream enforcement infrastructure so that flagged behavior moves smoothly from signal to human review to action
  • Build the guardrails that sensitive internal tools require, including granular permissions, audit trails, data-access controls, and features that support reviewer wellbeing

About the job

About Anthropic

Anthropic builds AI systems with a focus on reliability, interpretability, and steerability. The company’s mission centers on making AI safe and beneficial for both individuals and society. The team includes researchers, engineers, policy experts, and business leaders working together to advance responsible AI development.

Role Overview: Software Engineer, Safeguards Foundations – Internal Tooling

The Safeguards team at Anthropic is responsible for detecting, reviewing, and addressing potential misuse of the company’s AI models. Within this team, the Foundations group develops the infrastructure, platforms, and internal tools that support these safeguards across the organization.

This role focuses on improving internal tooling for human review. The work covers case management, labeling workflows, investigative processes, and enforcement interfaces used daily by analysts and policy specialists. Although these tools operate behind the scenes, their reliability and clarity directly affect how quickly Anthropic can spot harmful behaviors, make enforcement decisions, and provide feedback for model training.

The position involves close collaboration with Trust & Safety operations, policy, and detection-engineering teams. The goal: turn complex operational needs into robust, maintainable software that supports Anthropic’s safety mission.

What You Will Do

  • Enhance and maintain internal tools for human review, including case management and enforcement interfaces
  • Work across the stack to deliver reliable, user-friendly products for internal stakeholders
  • Partner with operations, policy, and engineering teams to understand workflows and translate them into effective software solutions
  • Support the organization’s ability to detect and respond to AI misuse efficiently

Location

London, UK

