Reinforcement Learning Software Engineer

Preference Model, San Francisco
On-site, Full-time

Qualifications

To succeed in this role, candidates should meet the following qualifications: strong analytical and technical skills, the ability to work collaboratively in a fast-paced environment, and excellent communication skills.

About the job

About Us

At Preference Model, we develop the advanced training data essential for the evolution of artificial intelligence. While today's AI models are powerful, they often fall short in diverse applications because of limitations in their training data. We specialize in creating reinforcement learning environments that present AI models with authentic research and engineering challenges, enabling them to iterate and learn through realistic feedback loops.

Our founding team boasts experience from Anthropic’s data department, where we established the data infrastructure, tokenizers, and datasets that supported Claude. We collaborate with top-tier AI research labs to bring AI closer to its groundbreaking potential and are proudly backed by a16z.

About the Role

As a Software Engineer on our team, your responsibilities will include:

  • Designing and Developing Reinforcement Learning Environments: Architect comprehensive simulation platforms that encompass environmental context, task definitions, and reward functions, so that AI agents can learn and perform intricate tasks.

  • Building Robust Training Infrastructure: Create scalable systems for post-training AI models, focusing on orchestration, performance optimization, and monitoring capabilities.

  • Implementing Realistic Model Evaluations: Develop metrics for evaluating AI agent performance and establish the infrastructure and tools necessary for conducting these evaluations.

  • Influencing Technical Strategy: Take charge of architectural decisions, impact product roadmaps, and contribute significantly to our engineering culture as an early-stage team member.

About You

You might be a great fit for this role if you possess the following qualities:

  • Skill in using language models effectively.

  • Ability to innovate and think outside the box.

  • A minimum of 4 years of software engineering experience, showcasing your ability to take ownership of projects.

  • Proficiency in Python, Rust, or TypeScript, with the capability to work across the entire software stack.

  • Hands-on experience with modern deployment practices, containerization, and cloud infrastructure (such as Kubernetes, AWS, or GCP).

  • Strong problem-solving skills demonstrated through algorithmic challenges or complex system design tasks.

Nice-to-Haves

Preferred candidates will have experience in:

  • Machine learning infrastructure or reinforcement learning.

About Preference Model

Preference Model is a pioneering company dedicated to creating the next generation of training data for AI. Our mission is to harness the potential of artificial intelligence through innovative reinforcement learning environments that address real-world challenges.
