About the job
About Our Team
The Codex Core Agent team is responsible for developing the foundational kernel of Codex. We focus on enhancing the agent's capabilities, expediting research, and ensuring these advancements translate into practical applications for our users.
This work spans the systems that allow Codex to function effectively in real-world scenarios: optimizing production performance, managing token usage, minimizing latency, ensuring reliability, controlling costs, and scaling capacity. We also build the core execution loop and interfaces that turn model outputs into practical actions, along with infrastructure that helps other teams leverage Codex. Importantly, our feedback systems use real-world usage to refine models and improve agent performance over time.
About the Role
We are seeking talented applied AI engineers to help transition Codex agents from impressive demonstrations to reliable, everyday tools. This position involves enhancing agent performance on actual software engineering tasks and bridging the gap between research capabilities and tangible real-world utility.
You will collaborate closely with research, infrastructure, and product teams to ensure that agents are not only powerful but also practical, controllable, and dependable. Your mission is not just to improve model performance in isolation, but to translate those improvements into measurable gains in solution rates, usability, and economic value for our users.
Your Responsibilities
Design and refine agent behaviors across real-world coding tasks and extended workflows.
Collaborate with research to create and execute evaluations that measure agent performance, identify regressions, and understand failure modes and edge cases.
Enhance performance through effective prompting, strategic tool usage, context construction, and model-facing experimentation.
Investigate production failures and systematically enhance stability and reliability.
Develop feedback loops and data systems that integrate real-task data into our evaluation and research processes.
Collaborate with product teams to shape user-facing agent experiences and the essential interfaces that support agent functions.
Help define the criteria for what constitutes successful completion of complex tasks by agents.
Ideal Candidate Profile
Experience building or deploying machine learning or LLM-powered products.
Proficiency in Python and familiarity with modern machine learning tools.
Background in model evaluation, fine-tuning, or related areas.