About the job
AI Trainer: Code Generation Specialist
Overview
Join our team at embedding-vc, where we work to advance the ability of large language models to understand and generate real-world code. We evaluate and improve multi-step reasoning trajectories derived from actual GitHub repositories, with the aim of delivering high-quality, reliable code generation.
This long-term initiative requires engineers with sound technical judgment and a deep understanding of complex coding environments. You will work through intricate code paths and reasoning flows across various platforms.
Your Responsibilities
As an AI Trainer, you will assess and enhance multi-step reasoning trajectories generated from live production repositories. Your key tasks will include:
- Reviewing and analyzing model-generated reasoning sequences
- Identifying logical inconsistencies or weak reasoning steps
- Improving trajectory structures to yield robust, production-grade outputs
- Evaluating reasoning quality across diverse programming environments
This role is closer to debugging model logic and reasoning systems than to traditional annotation work.
Qualifications We Seek
We are looking for engineers with extensive hands-on development experience and solid familiarity with real codebases. Ideal candidates will:
- Be proficient in at least two mainstream programming languages, such as Python, C++, Java, TypeScript, or JavaScript
- Have practical development experience in areas such as backend systems, frontend applications, algorithms, testing, or infrastructure
- Be adept at navigating and reasoning through large GitHub repositories
- Possess strong written communication skills
Experience contributing to highly visible or popular GitHub repositories is a significant advantage.
Additional Information
We plan to onboard approximately 10 to 20 engineers for this ongoing initiative. A brief qualification exercise may be required prior to joining.

