About Anthropic
Anthropic builds AI systems with a focus on reliability, interpretability, and steerability. The company’s mission centers on making AI safe and beneficial for both individuals and society. The team includes researchers, engineers, policy experts, and business leaders working together to advance responsible AI development.
Role Overview: Software Engineer, Safeguards Foundations – Internal Tooling
The Safeguards team at Anthropic is responsible for detecting, reviewing, and addressing potential misuse of the company’s AI models. Within this team, the Foundations group develops the infrastructure, platforms, and internal tools that support these safeguards across the organization.
This role focuses on improving internal tooling for human review: case management, labeling workflows, investigative processes, and the enforcement interfaces that analysts and policy specialists use daily. Although these tools operate behind the scenes, their reliability and clarity directly affect how quickly Anthropic can identify harmful behavior, make enforcement decisions, and feed those findings back into model training.
The position involves close collaboration with Trust & Safety operations, policy, and detection-engineering teams. The goal is to turn complex operational needs into robust, maintainable software that supports Anthropic’s safety mission.
What You Will Do
- Enhance and maintain internal tools for human review, including case management and enforcement interfaces
- Work across the stack to deliver reliable, user-friendly products for internal stakeholders
- Partner with operations, policy, and engineering teams to understand workflows and translate them into effective software solutions
- Support the organization’s ability to detect and respond to AI misuse efficiently
Location
London, UK