About the job
About Our Team
At OpenAI, our Safety Systems team is dedicated to advancing the mission of developing and deploying safe artificial general intelligence (AGI). We are establishing a specialized research team focused on identifying and addressing critical misalignment issues that may arise as AGI technology evolves. Our goal is to proactively quantify and mitigate potential misalignment risks to ensure they do not threaten societal wellbeing.
Our research efforts are structured around four key areas:
Worst-Case Demonstrations – Create compelling demonstrations that illustrate how AI systems can fail, particularly in scenarios where misaligned AGI could undermine human interests.
Adversarial & Frontier Safety Evaluations – Develop rigorous evaluations based on these demonstrations to measure dangerous capabilities and remaining risks, focusing on issues like deceptive behavior and power-seeking tendencies.
System-Level Stress Testing – Construct automated infrastructure to stress-test entire product stacks, evaluating their robustness under extreme conditions and evolving the tests as systems improve.
Alignment Stress-Testing Research – Analyze failures in mitigations and publish insights to inform strategy and develop next-generation safeguards, collaborating with other research teams for collective advancement.
About the Role
We are looking for a passionate Senior Researcher focused on AI safety and red-teaming. In this role, you will design and execute innovative attacks, contribute to adversarial evaluations, and deepen our understanding of how safety measures can fail and how they can be improved. Your findings will significantly impact OpenAI's product releases and long-term safety strategies.
Key Responsibilities
Create and implement worst-case demonstrations that clarify AGI alignment risks for stakeholders, particularly in critical use cases.
Develop comprehensive adversarial and system-level evaluations based on these demonstrations, promoting their integration across OpenAI.
Design automated tools and frameworks to enhance our red-teaming and stress-testing capabilities.
