About the job
About Us
At Preference Model, we are at the forefront of developing advanced training data essential for the evolution of artificial intelligence. While today's AI models exhibit significant power, they often fall short in diverse applications due to limitations in their training data. We specialize in creating reinforcement learning environments that present AI models with authentic research and engineering challenges, enabling them to iterate and learn through realistic feedback loops.
Our founding team brings experience from Anthropic’s data department, where we established the data infrastructure, tokenizers, and datasets that supported Claude. We collaborate with top-tier AI research labs to bring AI closer to its full potential and are proudly backed by a16z.
About the Role
As a Software Engineer on our team, your responsibilities will include:
Designing and Developing Reinforcement Learning Environments: Architect comprehensive simulation platforms that encompass environmental context, task definitions, and reward functions so that AI agents can learn and perform intricate tasks.
Building Robust Training Infrastructure: Create scalable systems for post-training AI models, focusing on orchestration, performance optimization, and monitoring capabilities.
Implementing Realistic Model Evaluations: Develop metrics for evaluating AI agent performance and establish the infrastructure and tools necessary for conducting these evaluations.
Influencing Technical Strategy: Take charge of architectural decisions, impact product roadmaps, and contribute significantly to our engineering culture as an early-stage team member.
About You
You might be a great fit for this role if you possess the following qualities:
Skill in using language models effectively.
Ability to innovate and think outside the box.
A minimum of 4 years of software engineering experience, with a record of taking ownership of projects.
Proficiency in Python, Rust, or TypeScript, with the capability to work across the entire software stack.
Hands-on experience with modern deployment practices, containerization, and cloud infrastructure (such as Kubernetes, AWS, or GCP).
Strong problem-solving skills demonstrated through algorithmic challenges or complex system design tasks.
Nice-to-Haves
Preferred candidates will have experience in:
Machine learning infrastructure or reinforcement learning.

