About the job
Our Mission
At Reflection AI, our mission is to develop open superintelligence and make it universally accessible.
We are pioneering open-weight models tailored for individuals, agents, enterprises, and even nation-states. Our exceptional team comprises AI researchers and innovators from leading organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and more.
About the Role
Lead red-teaming and adversarial evaluation for Reflection’s models, systematically identifying failure modes related to security, misuse, and misalignment.
Collaborate closely with the Alignment team to convert safety insights into actionable guardrails, ensuring models perform reliably under pressure and comply with deployment protocols.
Ensure that every model meets our lab's risk criteria prior to deployment, acting as a crucial checkpoint for our open-weight releases.
Create scalable, automated safety benchmarks that adapt alongside our model advancements, transitioning from static datasets to dynamic adversarial assessments.
Investigate and apply cutting-edge jailbreaking strategies and defenses to proactively address potential vulnerabilities.
About You
Possess a graduate degree (MS or PhD) in Computer Science, Machine Learning, or a related field, or have equivalent hands-on experience in AI Safety.
Demonstrate profound technical expertise in LLM safety, encompassing adversarial attacks, red-teaming practices, and model interpretability.
Exhibit robust software engineering skills with a background in developing automated evaluation frameworks or extensive ML systems.
Bring experience with reinforcement learning (RLHF/RLAIF) and its implications for model safety and alignment; this is highly advantageous.
Flourish in a dynamic, high-agency startup environment with a proactive approach to challenges.
Be prepared to make critical decisions regarding model releases and safety protocols.
Show a strong commitment to advancing the frontiers of intelligence.
What We Offer
At Reflection, we believe that to create genuinely open superintelligence, we must start from the ground up. Joining our team means being part of a small, highly skilled group where you will play a vital role in shaping our future and defining the landscape of open foundation models.
