Qualifications
You might be an ideal candidate if you possess:
- Hands-on experience in training and evaluating large-scale deep learning models.
- Expertise in popular deep learning frameworks such as PyTorch and JAX.
- A strong background in deploying machine learning algorithms within software systems at scale.
- The adaptability to thrive in a dynamic environment with a degree of uncertainty.
- A collaborative and supportive approach to teamwork.
About the job
As a Technical Staff Member specializing in Machine Learning, you will:
- Engage in the complete development lifecycle of innovative large-scale deep learning models.
- Curate datasets, architect solutions, implement algorithms, and train and assess models to enhance our offerings.
- Work collaboratively with engineers and researchers to convert groundbreaking research into real-world applications.
- Join us at a pivotal time, take on diverse roles, and contribute to building transformative products from the ground up!
About Reka
Reka is on a mission to create valuable multimodal artificial intelligence that empowers organizations and businesses. As a startup focused on foundation models, we are headquartered in the San Francisco Bay Area, California, with a commitment to a remote-first culture. Our diverse team comprises top talent from around the globe, including contributors to significant AI advancements over the past decade.
Similar jobs
Join Our Team

At Liquid AI, we are not just creating AI models; we are revolutionizing the very fabric of intelligence. Originating from MIT, our objective is to develop efficient AI systems across all scales. Our Liquid Foundation Models (LFMs) excel in environments where others falter: on-device, at the edge, and under real-time constraints. We are not simply refining existing concepts; we are pioneering the future of AI.

We recognize that exceptional talent drives remarkable technology. The Liquid team is a collective of elite engineers, researchers, and innovators dedicated to crafting the next generation of AI solutions. Whether you are designing model architectures, enhancing our development platforms, or facilitating enterprise integrations, your contributions will significantly influence the evolution of intelligent systems.

While San Francisco and Boston are preferred locations, we welcome applicants from other regions within the United States.
At Magic, we are driven by our mission to develop safe Artificial General Intelligence (AGI) that propels humanity forward in addressing the most critical challenges. We firmly believe that the future of safe AGI lies in automating research and code generation, allowing us to enhance models and tackle alignment issues more effectively than humans alone can manage. Our innovative approach combines cutting-edge pre-training, domain-specific reinforcement learning (RL), ultra-long context, and efficient inference-time computation to realize this vision.

Position Overview

As a Software Engineer within the Inference & RL Systems team, you will play a pivotal role in designing and managing the distributed systems that enable our models to function seamlessly in production, supporting extensive post-training workflows. This position operates at the intersection of model execution and distributed infrastructure, focusing on systems that influence inference latency, throughput, stability, and the reliability of RL and post-training loops. Our long-context models impose significant execution demands, including KV-cache scaling, managing memory constraints for lengthy sequences, batching strategies, long-horizon trajectory rollouts, and ensuring consistent throughput under real-world workloads. You will be responsible for the infrastructure that ensures both production inference and large-scale RL iterations are efficient and dependable.

Key Responsibilities
- Craft and scale high-performance inference serving systems.
- Optimize KV-cache management, batching methods, and scheduling processes.
- Enhance throughput and latency for long-context tasks.
- Develop and sustain distributed RL and post-training infrastructure.
- Boost reliability across rollout, evaluation, and reward pipelines.
- Automate fault detection and recovery mechanisms for serving and RL systems.
- Analyze and eliminate performance bottlenecks across GPU, networking, and storage components.
- Collaborate with Kernel and Research teams to ensure alignment between execution systems and model architecture.

Qualifications
- Solid foundation in software engineering and distributed systems.
- Proven experience in building or managing large-scale inference or training systems.
- In-depth understanding of GPU execution constraints and memory trade-offs.
- Experience troubleshooting performance issues in production machine learning systems.
- Capability to analyze system-level trade-offs between latency, throughput, and cost.
About Liquid AI

Founded as a spin-off from MIT CSAIL, Liquid AI specializes in the development of versatile artificial intelligence systems optimized for performance across various deployment environments, ranging from data center accelerators to on-device hardware. Our focus on low latency, minimal memory consumption, privacy, and reliability allows us to partner effectively with enterprises in sectors such as consumer electronics, automotive, life sciences, and financial services. As we experience rapid growth, we are eager to welcome talented individuals who can contribute to our mission.

The Opportunity

This unique position places you at the forefront of advanced foundation models and their practical applications. You will oversee post-training projects from start to finish for some of the world’s leading enterprises, while also playing a vital role in the ongoing development of Liquid’s core models. In this role, you will not have to choose between impactful customer work and foundational development; instead, you will enjoy deep involvement in both. You will have significant influence over how models are adapted, assessed, and deployed, directly contributing to the enhancement of Liquid’s post-training capabilities. If you are passionate about data integrity, evaluation processes, and ensuring that models perform effectively in real-world scenarios, this is your chance to redefine the standards of applied AI at a foundation-model company.

What We're Looking For

We seek an individual who:
- Takes ownership: You will lead post-training initiatives from customer requirements to delivery and evaluation.
- Thinks end-to-end: You will connect the dots across data generation, training, alignment, and evaluation as a cohesive system.
- Is pragmatic: You prioritize model quality and customer satisfaction over theoretical publications.
- Communicates clearly: You can interpret customer needs and effectively communicate with internal technical teams, providing constructive feedback when necessary.

The Work
- Serve as the technical lead for post-training engagements with enterprise clients.
- Translate client requirements into actionable post-training specifications and workflows.
- Design and implement data generation, filtering, and quality assessment methodologies.
- Conduct supervised fine-tuning, preference alignment, and reinforcement learning processes.
- Create task-specific evaluations, analyze outcomes, and integrate insights back into core post-training workflows.
Our Mission

At Reflection AI, our goal is to create open superintelligence and ensure its accessibility for everyone. We are pioneering open weight models for various users, including individuals, enterprises, and even nation-states. Our talented team comprises AI researchers and industry veterans from leading organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.

Role Overview
- Develop systems that convert robust pre-trained models into aligned and versatile agents.
- Lead research and engineering efforts to advance post-training practices, focusing on data curation and large-scale optimization.
- Create data generation frameworks, reward models, reinforcement learning algorithms, and techniques for inference-time scaling.
- Collaborate with both pre-training and post-training teams to achieve significant enhancements in model capabilities.
- Help refine our understanding of how large models learn to reason, follow instructions, and evolve through reinforcement learning.

Your Profile
- Solid grasp of machine learning principles with hands-on experience in large-scale LLM training.
- Proficient engineering skills, with the ability to navigate intricate ML codebases and distributed systems.
- Experience in enhancing model performance through data, reward modeling, or reinforcement learning techniques.
- Track record of leading ambitious research or engineering projects resulting in measurable improvements.
- Thrives in a dynamic, high-agency startup atmosphere; oriented towards action and clarity in execution.
- Ability to work seamlessly across research and infrastructure boundaries.
- Excellent communication skills and a collaborative mindset.
- Driven by a passion for pushing the boundaries of intelligence.

What We Provide

At Reflection AI, we believe that to truly build open superintelligence, it must be rooted in a strong foundation. By joining us, you will contribute to building from the ground up within a compact, highly skilled team. Together, we will shape the future of our company and the landscape of open foundational models. We aim for you to accomplish the most impactful work of your career, with the assurance that you and your loved ones are well-supported.
Full-time|$176.4K/yr - $242.6K/yr|Remote - US
At Bugcrowd, we are redefining the landscape of cybersecurity. Since our inception in 2012, we have been committed to empowering organizations to regain control and stay ahead of cyber threats. By harnessing the collective creativity and expertise of our clients and an elite network of hackers, we leverage our patented AI-driven Security Knowledge Platform™. Our diverse community of hackers excels in uncovering vulnerabilities, swiftly adapting to the evolving threat landscape, including zero-day exploits. With our innovative CrowdMatch™ technology, we provide scalable, tailored solutions to enhance your security posture. Join us as we usher in a new era of crowdsourced security that outpaces cyber adversaries. For more information, visit www.bugcrowd.com. Headquartered in San Francisco and New Hampshire, Bugcrowd is supported by leading investors including General Catalyst, Rally Ventures, and Costanoa Ventures.

Job Summary

The Bugcrowd Reinforcement Learning and Reasoning Team is dedicated to advancing autonomous cybersecurity through the creation of authentic reinforcement learning environments tailored for foundational model applications. As a Staff Engineer, you will be at the forefront of AI reinforcement learning development and implementation. Your primary responsibility will be to design and build the infrastructure and tools that convert real-world vulnerability research into extensive reinforcement learning environments for training state-of-the-art AI systems. In this unique role, you will develop training environments that teach AI systems to hack and defend software. Your contributions will directly impact the capabilities of next-generation AI models. Rather than focusing on a single application, you will create the underlying infrastructure that generates thousands of environments for training leading-edge AI technologies. Our team operates at the intersection of AI, security research, and systems engineering, crafting environments that enable models to acquire essential skills such as vulnerability detection, exploitation, and remediation.
Our Mission

At Reflection AI, our goal is to develop open superintelligence and make it universally accessible. We are pioneering open weight models tailored for individuals, agents, enterprises, and even entire nations. Our diverse team comprises talented AI researchers and industry veterans from prestigious organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and many more.

Role Overview
- Construct and enhance distributed training systems that drive the pre-training of cutting-edge models.
- Collaborate with research teams to design and execute extensive training runs for foundational models.
- Create infrastructure that facilitates efficient training across thousands of GPUs leveraging contemporary distributed training frameworks.
- Enhance training throughput, stability, and efficiency for extensive model training tasks.
- Work closely with pre-training researchers to convert experimental concepts into scalable, production-ready training systems.
- Boost performance of distributed training tasks through optimization of communication, memory management, and GPU utilization.
- Develop and maintain training pipelines that accommodate large-scale datasets, checkpointing, and iterative experiments.
- Identify and resolve performance bottlenecks within distributed training systems, including model parallelism, GPU communication, and training runtime environments.
- Contribute to the creation of systems that promote swift experimentation and iteration on novel training methods.
Join our innovative team at Liquid AI as a Member of the Technical Staff specializing in audio applications. In this post-training role, you will apply your knowledge of cutting-edge audio technologies, contributing to the development of advanced machine learning solutions. This position is ideal for individuals who are eager to work in a collaborative environment and are passionate about audio technology and its applications in artificial intelligence.
About AfterQuery

AfterQuery develops training data and evaluation frameworks that leading AI labs use to improve their models. The team partners with major research institutions to build datasets and run assessments that go beyond standard benchmarks. As a post-Series A company based in San Francisco, AfterQuery values contributions from every team member. Work here directly shapes the next generation of AI models.

Role Overview

The Reinforcement Learning Environment Engineer designs datasets and evaluation systems that influence how advanced AI models learn and improve. This role involves close collaboration with research teams, hands-on experimentation with new data collection methods, and the creation of metrics to track model progress. Work moves from theoretical analysis to practical experiments, feeding directly into large-scale model training efforts.

What You Will Do
- Develop data segments that expose key failure modes in sectors such as finance, software engineering, and enterprise operations.
- Refine reward signals for Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) systems.
- Define quantitative metrics for dataset quality, diversity, and their effects on model alignment and capability.
- Work closely with research teams to translate training objectives into concrete data requirements and evaluation criteria.

This position is based in San Francisco.
Full-time|$350K/yr - $475K/yr|On-site|San Francisco
At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We strive to build a future where everyone has access to the knowledge and tools essential for making AI work effectively for their unique objectives. Our team comprises scientists, engineers, and innovators who have contributed to some of the most widely adopted AI products, including ChatGPT and Character.ai, as well as notable open-weight models like Mistral and popular open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role

The Post-Training Researcher position is pivotal to our roadmap. It serves as a crucial connection between raw model intelligence and a system that is genuinely beneficial, safe, and collaborative for human users. This role uniquely combines fundamental research with practical engineering, as we do not differentiate between these functions internally. Candidates will be expected to produce high-performance code and analyze technical reports. This position is ideal for individuals who relish both deep theoretical inquiry and hands-on experimentation, aiming to influence the foundational aspects of AI learning.

Note: This position is classified as an 'evergreen role', meaning we continuously accept applications in this research domain. Given the high volume of applications, an immediate match for your skills and experience may not always be available. However, we encourage you to apply; we regularly review submissions and reach out as new opportunities arise. You are welcome to apply again after gaining more experience, but we ask that you refrain from applying more than once every six months. Additionally, specific postings for singular roles may be available for distinct projects or team needs, in which case you are welcome to apply directly in conjunction with this evergreen role.

What You’ll Do
- Develop and Optimize Recipes: Refine post-training recipes, encompassing various datasets, training stages, and hyperparameters, while assessing their impact on multiple performance metrics.
- Iterate on Evaluations: Engage in a continuous process of defining evaluation metrics, optimizing them, and recognizing their limitations. You will be accountable for enhancing performance metrics and ensuring they are meaningful.
- Debug and Analyze: During the fine-tuning of training configurations, you may encounter results that appear inconsistent. You will be responsible for troubleshooting and cultivating a deeper understanding to apply to subsequent challenges.
- Scale and Investigate: Assess and expand the capabilities of our models while exploring potential improvements.
Full-time|On-site|San Francisco Bay Area (San Mateo) or Boston (Somerville)
About the Role

In the realm of machine learning, pretraining lays the foundation for a general model, while post-training refines that model, enhancing its utility, controllability, safety, and performance in real-world applications. As a Post-Training Research Scientist, you will transform large pretrained robot models into production-ready systems through methodologies such as fine-tuning, reinforcement learning, steering, human feedback, task specialization, evaluation, and on-robot validation at scale. This position offers a unique opportunity for individuals from diverse backgrounds to evolve into full-stack ML roboticists, adept at swiftly identifying challenges across machine learning and control domains. This is where innovative research converges with practical implementation.

Your Responsibilities Include:
- Crafting fine-tuning and adaptation strategies tailored for specific robotic tasks and embodiments.
- Developing methodologies to enhance reliability, robustness, and controllability of robotic systems.
- Establishing evaluation frameworks to assess real-world robot performance beyond just offline metrics.
- Collaborating with ML infrastructure teams to optimize inference-time performance, including latency, stability, and memory usage.
- Utilizing advanced techniques such as imitation learning, reinforcement learning, distillation, synthetic data, and curriculum learning.
- Bridging the gap between model outputs and tangible outcomes in the physical world.

You Might Excel in This Role If You:
- Possess experience in fine-tuning large models for downstream applications, including RLHF, imitation learning, reinforcement learning, distillation, and domain adaptation.
- Have a background in embodied AI, robotics, or real-world machine learning systems.
- Demonstrate a strong commitment to evaluation, benchmarking, and failure analysis.
- Are comfortable troubleshooting and debugging across the entire ML stack, from analyzing loss curves to understanding robot behavior.
- Enjoy rapid iteration and thrive on real-world feedback loops.
- Aspire to connect foundational models with practical deployment scenarios.

About Generalist

At Generalist, we are dedicated to realizing the vision of general-purpose robots. We envision a future where industries and homes benefit from collaborative interactions between humans and machines, enabling us to achieve more than ever before. Our focus is on building embodied foundation models, starting with dexterity, and advancing the frontiers of data, models, and hardware to empower robots to intelligently engage with their environments.
About Us

At Preference Model, we are pioneering the next generation of training data to fuel the evolution of AI technology. Although today's models demonstrate significant capabilities, they often fall short in diverse applications due to many tasks being out of distribution. We create reinforcement learning (RL) environments where models face research and engineering challenges, allowing them to iterate and learn from realistic feedback loops. Our founding team brings experience from Anthropic’s data division, where we built data infrastructure, tokenizers, and datasets for Claude. Collaborating with leading AI labs, we aim to bring AI closer to its transformative potential, supported by a16z.

About the Role

Every RL environment we deploy must withstand a model actively attempting to exploit it. A task with a weak evaluation or an easily exploitable reward signal is counterproductive; it teaches the model to cheat instead of reason. We seek an individual dedicated to identifying these vulnerabilities before the model does. We have learned that domain knowledge alone does not make an effective reviewer. The ideal candidate is someone who has engaged in adversarial thinking: designing challenging problems that are difficult to exploit, dismantling others’ tasks, or directly researching reward hacking.

Your Responsibilities
- Review RL environments and training tasks for accuracy, robustness, and resistance to reward hacking.
- Identify potential ways a model could exploit grading systems, manipulate evaluation criteria, or bypass intended reasoning.
- Collaborate with environment authors to enhance grading systems, rectify reward signals, and redesign ineffective tasks.
- Develop and maintain review standards and checklists as we scale from hundreds to thousands of tasks monthly.
- Provide guidance on grader design during the planning phase of environments, ensuring quality before task construction.

Who We Are Looking For

You think like an attacker and have spent considerable time crafting problems that are challenging to exploit or deconstructing seemingly solid issues. A fundamental understanding of machine learning is essential, enabling you to anticipate model strategies, combined with enough engineering insight to assess whether a grader effectively tests its criteria.
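To make the reward-hacking concern concrete, here is a minimal, hypothetical sketch (not taken from any actual Preference Model environment): a grader that merely searches the model's answer for the expected number can be satisfied by a wrong answer, while a grader that parses a single final answer and checks it against independently computed ground truth is much harder to game.

```python
# Hypothetical example: an exploitable grader vs. a more robust one
# for an RL task such as "compute the sum of the squares of a list".

def weak_grader(answer: str) -> float:
    # Exploitable: rewards any answer that contains the expected number
    # as a substring, so hedged or wrong text can still score 1.0.
    return 1.0 if "14" in answer else 0.0

def robust_grader(answer: str, data: list[int]) -> float:
    # Harder to game: require a single final integer token and compare it
    # against ground truth computed independently of the model's output.
    expected = sum(x * x for x in data)
    tokens = answer.split()
    if not tokens or not tokens[-1].lstrip("-").isdigit():
        return 0.0
    return 1.0 if int(tokens[-1]) == expected else 0.0

data = [1, 2, 3]  # ground truth: 1 + 4 + 9 = 14
print(weak_grader("maybe 140?"))             # 1.0 -- rewarded despite being wrong
print(robust_grader("maybe 140?", data))     # 0.0 -- wrong final answer
print(robust_grader("the sum is 14", data))  # 1.0 -- only the correct value scores
```

A reviewer in this role would flag `weak_grader` before a model finds the exploit: training against it teaches the model to scatter plausible numbers rather than to reason.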
Location: Preference for San Francisco, but remote candidates are welcome to apply.
Duration: This internship will last for 10-12 weeks during Summer 2026.
Compensation: This is a paid internship opportunity.

About Us

At Preference Model, we are pioneering the next era of training data to fuel the advancement of AI technologies. While current models are impressive, they often struggle with diverse applications due to out-of-distribution tasks. Our focus is on developing reinforcement learning (RL) environments where models can engage with complex research and engineering challenges, iterating and learning from realistic feedback mechanisms. Our founding team boasts extensive experience from Anthropic's data division, where we built data infrastructure, tokenizers, and datasets that powered Claude. We collaborate with top AI labs to accelerate AI's journey toward its transformative potential and are proudly supported by a16z.

About the Role

We are seeking talented PhD students and exceptional undergraduate candidates to join us this summer in developing RL training environments tailored for large language models.

What You'll Do
- Design and implement RL environments to assess LLM reasoning across various ML, systems, and research problems.
- Produce clean, production-quality Python code (not just notebooks).
- Utilize Docker to create reproducible environments and troubleshoot issues as they arise.
- Translate ML research papers and concepts into actionable training tasks.

Who We're Looking For

You are either an undergraduate or a PhD student in Computer Science, Machine Learning, Mathematics, Physics, or a related discipline. You have a knack for writing real code beyond mere research prototypes and you enjoy reading ML literature in your spare time.

Must-Have Qualifications:
- Proficient in Python programming.
- Understanding of large language models (LLMs), their strengths, and limitations.
- Self-motivated and capable of taking feedback to iterate quickly.

Preferred Qualifications:
- Familiarity with transformer architecture and experience with training or inference code.
- Experience in writing CUDA kernels or engaging in low-level GPU programming.
- Deep knowledge in a particular research area (demonstrated by publications, public code, or strong coursework).
- A passion for continuous learning and research in the field of AI.
Join Liquid AI as a Technical Staff Member specializing in Applied Vision. In this dynamic role, you will leverage cutting-edge technology to develop innovative solutions and enhance our product offerings. This position is ideal for recent graduates with a passion for technology and a desire to make a meaningful impact in the field of artificial intelligence.
Full-time|Hybrid|San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10)
Join firecrawl as a Research Engineer specializing in Reinforcement Learning (RL). In this role, you will leverage your expertise to conduct innovative research and develop advanced RL algorithms that push the boundaries of technology. Collaborate with a talented team of engineers and researchers to solve complex problems and contribute to groundbreaking projects.
Our Vision

At ReflectionAI, we strive to create open superintelligence and ensure its accessibility for everyone. Our team is dedicated to developing open weight models for individuals, organizations, and even nations. Our collective expertise comes from leading AI institutions such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and more.

Role Overview
- Conduct research and develop solutions focusing on algorithms, scaling laws, data processing, optimizers, and model architecture.
- Design and execute scientific experiments to enhance our understanding of scaling large language models and improving data efficiency.
- Apply cutting-edge techniques from the deep learning literature to our projects.
- Independently lead small research initiatives while collaborating on larger projects.
- Enhance our training infrastructure for optimal scaling efficiency.
- Contribute across the entire technology stack, from low-level optimizations to high-level model design.

Your Profile
- Possess a graduate degree (MS or PhD) in Computer Science, Machine Learning, or a related field.
- Demonstrate strong software engineering skills with experience in large-scale systems development.
- Have prior experience with large-scale ETL processes and preparing training data.
- Possess a deep understanding of large-scale machine learning, specifically regarding language models, distributed training, and scaling.
- Be proficient in Python and familiar with deep learning frameworks, preferably PyTorch.
- Effectively navigate the trade-offs between research goals and practical engineering challenges.
- Excel in a fast-paced, high-agency startup culture with a proactive approach.
- Exhibit strong communication skills and a collaborative mindset.
- Show a passion for pushing the boundaries of intelligence.

What We Provide

We believe that building truly open superintelligence starts with a solid foundation. Joining ReflectionAI means being part of a tightly-knit, highly talented team, where you will help shape our future and redefine the landscape of open foundational models.
Be Part of the Future of Autonomous Robotics

At Bedrock Robotics, we are pioneering the transition of AI from theoretical frameworks to practical applications in the built environment. Our team comprises seasoned professionals who have been instrumental in the success of innovative companies such as Waymo, Segment, and Uber Freight. We are at the forefront of deploying autonomous technologies in heavy construction machinery, significantly enhancing the efficiency and safety of multi-billion dollar infrastructure projects across the nation.

With backing from $350 million in funding, our mission is to address the urgent need for housing, data centers, and manufacturing facilities, while simultaneously responding to the construction industry's labor shortages.

This position is where cutting-edge algorithms meet the practical world of construction. You will work alongside industry experts and top-tier engineers to tackle complex real-world challenges that cannot be simulated. If you are eager to leverage advanced technology for impactful problem-solving within a skilled team, we encourage you to apply.
About Us

At Preference Model, we are at the forefront of developing advanced training data essential for the evolution of artificial intelligence. While today's AI models exhibit significant power, they often fall short in diverse applications due to limitations in their training data. We specialize in creating reinforcement learning environments that present AI with authentic research and engineering challenges, enabling them to iterate and learn through realistic feedback loops. Our founding team boasts experience from Anthropic’s data department, where we established the data infrastructure, tokenizers, and datasets that supported Claude. We collaborate with top-tier AI research labs to bring AI closer to its groundbreaking potential and are proudly backed by a16z.

About the Role

As a Software Engineer on our team, your responsibilities will include:
- Designing and Developing Reinforcement Learning Environments: Architect comprehensive simulation platforms that encompass environmental context, task definitions, and reward functions to facilitate AI agents' learning and performance of intricate tasks.
- Building Robust Training Infrastructure: Create scalable systems for post-training AI models, focusing on orchestration, performance optimization, and monitoring capabilities.
- Implementing Realistic Model Evaluations: Develop metrics for evaluating AI agent performance and establish the infrastructure and tools necessary for conducting these evaluations.
- Influencing Technical Strategy: Take charge of architectural decisions, impact product roadmaps, and contribute significantly to our engineering culture as an early-stage team member.

About You

You might be a great fit for this role if you possess the following qualities:
- Adept at leveraging language models effectively.
- Ability to innovate and think outside the box.
- A minimum of 4 years of software engineering experience, showcasing your ability to take ownership of projects.
- Proficiency in Python, Rust, or TypeScript, with the capability to work across the entire software stack.
- Hands-on experience with modern deployment practices, containerization, and cloud infrastructure (such as Kubernetes, AWS, or GCP).
- Strong problem-solving skills demonstrated through algorithmic challenges or complex system design tasks.

Nice-to-Haves

Preferred candidates will have experience in machine learning infrastructure or reinforcement learning.
At Catalog, we are pioneering the commerce infrastructure for AI: creating the essential framework that enables digital agents to not only explore the web but also comprehend, analyze, and engage with products. Our innovations drive the future of AI-driven shopping experiences, fundamentally transforming how consumers discover and purchase items online.

Role Overview

As a Technical Staff Member, you will be instrumental in developing core systems, shaping our engineering culture, and transitioning our vision from prototype to a robust platform. This role requires full-stack expertise and a commitment to owning and resolving challenges from start to finish.

Who You Are
- You have experience creating beloved and trusted products from the ground up.
- You combine technical proficiency with a keen product sense and data-driven intuition.
- You are well-versed in AI technologies.
- You prioritize speed, write clean code, and ensure thorough instrumentation.
- You seek a high level of ownership within a small, talent-rich team based in San Francisco.

Challenges You Will Tackle
- Develop and deploy agentic-search APIs that deliver structured and real-time product data in milliseconds.
- Build checkout systems enabling agents to conduct transactions with any merchant.
- Create an embeddings and retrieval layer that optimizes recall, precision, and cost efficiency.
- Establish a product graph and ranking pipeline that adapts based on actual user outcomes.

Preferred Qualifications
- Proven experience shipping data-centric products in a live environment.
- Experience with recommendation systems or information retrieval methodologies.
- Familiarity with API development, search indexing, and data pipeline construction.

Our Work Culture

We operate with a small, high-trust, and highly motivated team, fostering an environment of in-person collaboration in North Beach, San Francisco. Our process involves debate, decision-making, and execution. If your profile aligns with our needs, we will contact you to arrange 2-3 brief technical interviews, followed by an onsite meeting in our office where you will collaborate on a small project, exchange ideas, and meet the team.
Join primeintellect as a Research Engineer focused on Reinforcement Learning Infrastructure. In this role, you will be instrumental in advancing our cutting-edge AI technologies. You will collaborate with interdisciplinary teams to develop robust frameworks that enhance machine learning capabilities and drive innovation.

As a key player in our engineering team, you will work on designing, implementing, and optimizing systems that support reinforcement learning algorithms. Your contributions will directly impact the efficiency and effectiveness of our AI solutions.
Mar 27, 2026