About the job
Our Mission
At Reflection AI, we are committed to creating open superintelligence that is accessible to everyone. Our team is dedicated to developing open-weight models tailored for individuals, agents, enterprises, and nation states. Our diverse group of AI experts comes from prestigious organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.
About the Role
As a Research Program Manager (RPM) at Reflection AI, you will play a pivotal role in leading and collaborating with our research and infrastructure teams to expedite the advancement of cutting-edge model development. You will not merely track projects; you will be a catalyst for clarity in uncertain situations, facilitate decision-making processes, and ensure cohesive integration across multiple teams.
This is a crucial position where you will spearhead the establishment of model evaluations and safety protocols from the ground up. You will define evaluation frameworks, construct the operational infrastructure for model safety, and create processes that integrate evaluations into the model development lifecycle. You will be laying the foundation for how Reflection AI interacts with the broader safety ecosystem. This is quintessential 0-to-1 work.
With a proactive, first-responder mindset, you will take the initiative to assess situations, address challenges head-on, and drive resolutions collaboratively.
What You'll Do
Develop the essential infrastructure for model evaluations and safety. Formulate evaluation frameworks, outline tooling requirements, and establish operational processes that will guide our assessment of model capabilities, risks, and readiness for deployment.
Establish model safety operations as a core function, including setting workflows, review schedules, and decision-making frameworks that link safety evaluations to the model development and release processes.
Collaborate with research and engineering leads throughout the pre-training, mid-training, and post-training phases to integrate safety and evaluation checkpoints into the development workflow in a manner that is thorough yet efficient.
Lead the scoping and prioritization of evaluation science and infrastructure investments, partnering with technical leads to determine which aspects to develop internally and which to adopt from external sources.

