About the job
Our Vision
At Reflection AI, we are dedicated to developing open superintelligence and ensuring it is accessible to everyone.
We are pioneering open-weight models for individuals, agents, organizations, and even nations. Our diverse team of AI researchers and industry pioneers hails from esteemed organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and many others.
Position Overview
In the rapidly evolving field of AI, data has become indispensable for innovation. Notable progress in recent years has stemmed more from enhanced data quality than from novel architectures.
As a valued member of our Data Team, you will be responsible for ensuring that the datasets used to train and assess our models are of exceptional quality, reliability, and effectiveness. Your contributions will directly influence our models' capabilities in areas such as agentic tool usage, long-term reasoning, and robust safety alignment.
Collaborating with elite researchers within our post-training teams, you will help transform abstract concepts of “good data” into tangible, scalable standards for extensive data initiatives. We seek engineers who possess a strong foundation in engineering principles paired with a genuine curiosity about data quality and its effects on model performance.
Working closely with our post-training teams, you will be responsible for:
Managing upstream data quality for LLM post-training and evaluations by analyzing expert-generated datasets and implementing quality benchmarks for reasoning, alignment, and agentic applications.
Collaborating with research and post-training teams to translate requirements into quantifiable quality indicators and offering actionable insights to external data providers.
Developing, validating, and expanding automated quality assurance methods, including LLM-as-a-Judge frameworks, to consistently assess data quality across large-scale projects.
Creating reusable quality assurance pipelines that consistently supply high-quality data to post-training teams for model training and evaluation.
Tracking and reporting on data quality trends over time, fostering continuous improvements in quality standards, processes, and acceptance criteria.
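To make the LLM-as-a-Judge responsibility above concrete, here is a minimal sketch of what such an automated quality gate might look like. All names, the rubric, and the scoring heuristic are illustrative assumptions, not Reflection AI's actual pipeline; a real judge would prompt a model with the rubric and parse its score, where here a stub stands in so the aggregation logic is runnable end to end.

```python
# Minimal LLM-as-a-Judge quality gate (hypothetical names throughout).
from dataclasses import dataclass

@dataclass
class Verdict:
    sample_id: str
    score: int    # rubric score, e.g. 1 (poor) to 5 (excellent)
    passed: bool

def judge(sample: dict, rubric: str) -> int:
    """Stub judge: a real implementation would prompt an LLM with the
    rubric and the sample, then parse a numeric score from the reply."""
    # Placeholder heuristic: penalize empty or very short responses.
    return 5 if len(sample.get("response", "")) >= 20 else 1

def run_quality_gate(samples: list[dict], rubric: str,
                     threshold: int = 4) -> list[Verdict]:
    """Score every sample and flag those below the acceptance threshold."""
    verdicts = []
    for s in samples:
        score = judge(s, rubric)
        verdicts.append(Verdict(s["id"], score, score >= threshold))
    return verdicts

batch = [
    {"id": "a1", "response": "A detailed, well-reasoned answer with cited steps."},
    {"id": "a2", "response": "ok"},
]
results = run_quality_gate(batch, rubric="Is the reasoning complete and correct?")
pass_rate = sum(v.passed for v in results) / len(results)  # fraction accepted
```

In practice the per-sample verdicts would feed the quality-trend reporting described above, and the acceptance threshold would be one of the quantifiable quality indicators agreed with data providers.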
Qualifications
Solid engineering principles with experience in building data pipelines, quality assurance systems, or evaluation workflows tailored for post-training data and agentic environments.
Meticulous and analytical, capable of pinpointing failure modes, inconsistencies, and nuanced issues that influence data quality.
A robust understanding of the impact of data quality on model behavior and performance.