About the job
Team focus
The Alignment Science team at OpenAI works on intent alignment for artificial intelligence. Its goal is to develop models that accurately interpret and follow user requests while maintaining high standards for safety and transparency. As AI models become more advanced, the team prioritizes keeping them honest about their capabilities and limitations so that their behavior stays closely aligned with user intent.
Research spans both theoretical and applied domains. The team shares its findings publicly and integrates new alignment techniques into OpenAI's deployed models. Recent efforts have focused on model honesty: how models admit mistakes, avoid generating false information, and resist manipulation. The team seeks scalable methods for improving instruction following and reliability in AI systems.
Quantitative research is central to this work, particularly reinforcement learning and the related training and evaluation methods that support safer, more reliable AI interactions.
Role overview
This Researcher in Alignment Science position (which may be titled Research Engineer or Research Scientist) centers on designing and running experiments to improve how models follow user intent. Responsibilities include developing training protocols, building evaluation frameworks, and strengthening research infrastructure to support effective alignment in new models.
The job is based in San Francisco, CA, with a hybrid schedule requiring three days per week in the office. OpenAI provides relocation support for new hires. Exceptional remote candidates who can work independently and collaborate closely with the team will also be considered.
Main responsibilities
- Design and conduct experiments on alignment techniques targeting intent following, honesty, calibration, and robustness.
- Train and assess models using reinforcement learning and other empirical machine learning approaches.
- Develop evaluation metrics for failure modes such as hallucination, compliance gaps, reward exploitation, and covert actions.
- Investigate methods to encourage models to self-verify and report limitations honestly, including confession-style training objectives.
- Create monitoring tools and interventions at inference time to help models act as intended.