Research Program Manager - Model Evaluations and Safety

Reflection AI | San Francisco

On-site Full-time

Experience Level

Manager

Qualifications

The ideal candidate will have a strong background in research project management, with a proven ability to navigate complex environments and deliver results. Exceptional communication skills and the ability to work collaboratively with cross-functional teams are essential. A degree in a relevant field is preferred.

About the job

Our Mission

At Reflection AI, we are committed to creating open superintelligence that is accessible to everyone. Our team is dedicated to developing open weight models tailored for individuals, agents, enterprises, and nation states. Our diverse group of AI experts comes from prestigious organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.

About the Role

As a Research Program Manager (RPM) at Reflection AI, you will play a pivotal role in leading and collaborating with our research and infrastructure teams to expedite the advancement of cutting-edge model development. You will not merely track projects; you will be a catalyst for clarity in uncertain situations, facilitate decision-making processes, and ensure cohesive integration across multiple teams.

This is a crucial position where you will spearhead the establishment of model evaluations and safety protocols from the ground up. You will define evaluation frameworks, construct the operational infrastructure for model safety, and create processes that seamlessly connect evaluations within the model development lifecycle. You will be laying the foundation for how Reflection AI interacts with the broader safety ecosystem. This is quintessential 0-to-1 work.

Possessing a proactive, first-responder mindset, you will take initiative to address challenges head-on, assess situations, and drive resolutions collaboratively.

What You'll Do

Develop the essential infrastructure for model evaluations and safety. Formulate evaluation frameworks, outline tooling requirements, and establish operational processes that will guide our assessment of model capabilities, risks, and readiness for deployment.
Establish model safety operations as a core function, including setting workflows, review schedules, and decision-making frameworks that link safety evaluations to the model development and release processes.
Collaborate with research and engineering leads throughout the pre-training, mid-training, and post-training phases to integrate safety and evaluation checkpoints into the development workflow in a manner that is thorough yet efficient.
Lead the scoping and prioritization of evaluation science and infrastructure investments, partnering with technical leads to determine which aspects to develop internally and which to adopt from external sources.

About Reflection AI

Reflection AI is at the forefront of AI research, dedicated to building open superintelligence. Our mission is to democratize access to advanced AI technologies, ensuring that they are available to everyone, from individuals to large organizations. Our team comprises industry leaders from top-tier tech companies, creating an innovative environment for groundbreaking AI development.

Search for Research Engineer In Model Evaluations

5,575 results

Research Engineer in Model Evaluations

Anthropic

Full-time|Remote|Remote-Friendly (Travel-Required) | San Francisco, CA | New York City, NY

Anthropic is looking for a Research Engineer focused on model evaluations. This position involves research and development to assess and strengthen the performance of AI models. Teams are based in San Francisco and New York City, and the role supports remote work with required travel. Key responsibilities Design and implement evaluations for Anthropic's AI m…

Apr 28, 2026

Research Program Manager - Model Evaluations and Safety

Reflection AI

Full-time|On-site|San Francisco

Apr 30, 2026

Research Engineer – Audio & Speech Models

Zyphra

Full-time|On-site|San Francisco

Zyphra is an innovative artificial intelligence company located in the heart of San Francisco, California.

The Opportunity:

Join our dynamic team as a Research Engineer - Audio & Speech Models, where you will play a pivotal role in advancing Zyphra’s Audio Team. You will be instrumental in developing cutting-edge open-source text-to-speech and audio models. Your contributions will span the full spectrum of the model training process, from data collection and processing to the design of innovative architectures and training approaches.

Your Responsibilities:

Conduct large-scale audio training operations
Optimize the performance of our training infrastructure
Collect, process, and evaluate audio datasets
Implement architectural and methodological improvements through rigorous testing

What We Seek:

A strong research mindset with the ability to navigate projects from ideation to implementation and documentation.
Proficiency in rapid prototyping and implementation, allowing for swift experimentation.
Effective collaboration skills in a fast-paced research environment.
A quick learner who is eager to embrace and implement new concepts.
Excellent communication abilities, enabling you to contribute to both research and engineering tasks at scale.

Preferred Qualifications:

Expertise in training audio models, such as text-to-speech, ASR, speech-to-speech, or emotion recognition.
Experience with training audio autoencoders.
Solid understanding of signal processing, particularly in audio.
Familiarity with diffusion models, consistency models, or GANs.
Experience with large-scale (multi-node) GPU training environments.
Strong understanding of experimental methodologies for conducting rigorous tests and ablations.
Interest in large-scale, parallel data processing pipelines.
Competence in PyTorch and Python programming.
Experience contributing to large, established codebases with rapid adaptation.

Aug 28, 2025

Senior Lead, Research & Evaluation

aiedu

Full-time|On-site|San Francisco, United States

Join aiedu as a Senior Lead in Research & Evaluation, where you will drive impactful research initiatives that shape educational practices and policies. In this role, you will lead a team of researchers in designing and executing comprehensive evaluations that inform our strategic direction. Your expertise will be critical in analyzing data, generating insights, and communicating findings to stakeholders.

Mar 13, 2026

Research Engineering Manager - Model Training

Perplexity

Full-time|On-site|San Francisco

Join Perplexity as a Research Engineering Manager, where you will spearhead a team of exceptional AI researchers and engineers dedicated to crafting the advanced models that power our innovative products. Our talented team has pioneered some of the most sophisticated models in agentic research, query understanding, and other critical domains that demand precision and depth. As we broaden our user base and expand our product offerings, our proprietary models are increasingly essential for delivering a premium experience to the world's most discerning users.

You will explore our extensive datasets of conversational and agentic queries, applying state-of-the-art training methodologies to enhance AI model performance. Through proactive technical and organizational leadership, you will empower your team to create cutting-edge models for the applications that are most significant to our business and our users.

Feb 4, 2026

Senior Machine Learning Engineer - Model Evaluations for Public Sector

Scale AI

Full-time|$216.3K/yr - $300.3K/yr|On-site|San Francisco, CA; St. Louis, MO; New York, NY; Washington, DC

Senior Machine Learning Engineer - Model Evaluations for the Public Sector

The Public Sector Machine Learning team at Scale AI pioneers the deployment of cutting-edge AI systems, including Large Language Models (LLMs), agentic models, and comprehensive multimodal pipelines, within critical government operations. We establish robust evaluation frameworks that ensure these models function reliably, safely, and effectively in real-world scenarios. As a Senior Machine Learning Engineer, you will architect, implement, and enhance automated evaluation pipelines that empower our clients to trust and effectively utilize advanced AI systems in defense, intelligence, and federal missions.

Your Responsibilities Include:

Creating and maintaining automated evaluation pipelines for machine learning models, focusing on functional, performance, robustness, and safety metrics, including evaluations based on LLM judges.
Designing test datasets and benchmarks to assess generalization, bias, explainability, and potential failure modes.
Building evaluation frameworks for LLM agents, which includes the infrastructure for scenario-based and environment-based testing.
Conducting comparative analyses of model architectures, training procedures, and evaluation results.
Implementing tools for continuous monitoring, regression testing, and quality assurance of machine learning systems.
Designing and executing stress tests and red-teaming workflows to identify vulnerabilities and edge cases.
Collaborating with operations teams and subject matter experts to generate high-quality evaluation datasets.

This position requires an active security clearance or the ability to obtain one.

Mar 26, 2026

Research Scientist, Model Architectures

Zyphra

Full-time|On-site|San Francisco

Zyphra is a cutting-edge artificial intelligence firm headquartered in the vibrant city of San Francisco, California.

Position Overview:

As a Research Scientist specializing in Model Architectures, you will play a pivotal role in Zyphra’s AI Architecture Research Team. Your responsibilities will include the design and thorough evaluation of innovative model architectures and training methodologies aimed at enhancing essential modeling capabilities (e.g., loss per flop or loss per parameter) and tackling core limitations inherent in current models. You will collaborate closely with our pre-training team to ensure that your findings are seamlessly integrated into our next-generation models.

Qualifications:

A strong research acumen and intuition.
Proven ability to navigate research projects from initial conception to execution and final write-up.
Exceptional implementation and prototyping skills, with the capability to swiftly transform ideas into experimental outcomes.
A collaborative spirit and the ability to thrive in a fast-paced research environment.
A deep curiosity and enthusiasm for understanding intelligence.

Requirements:

Experience with long-term memory, RAG/retrieval systems, dynamic/adaptive computation, and alternative credit assignment strategies.
Knowledge of reinforcement learning, control theory, and signal processing techniques.
A passion for exploring and critically evaluating unconventional ideas, with the ability to maintain a unique perspective.
Familiarity with modern training pipelines and the hardware necessities for designing efficient architectures compatible with GPU hardware.
Strong understanding of experimental methodologies for conducting rigorous ablations and hypothesis testing.
High proficiency in PyTorch and Python programming.
Ability to quickly assimilate into large pre-existing codebases and contribute effectively.
Prior publication of machine learning research in reputable venues.
Postgraduate degree in a scientific discipline (e.g., Computer Science, Electrical Engineering, Mathematics, Physics).

Why Join Zyphra?

We emphasize a structured research methodology that systematically addresses ambitious challenges in AI.

Aug 28, 2025

Research Engineer - Brain-Computer Interface Models

Zyphra

Full-time|On-site|San Francisco

Join our innovative team at Zyphra as a Research Engineer specializing in Brain-Computer Interface (BCI) Models. In this pivotal role, you will contribute to groundbreaking research and development initiatives in the field of neuroscience and artificial intelligence. Your expertise will help shape the future of communication between humans and machines, enhancing the quality of life for countless individuals.

As a Research Engineer, you will be responsible for designing, implementing, and testing advanced BCI models, collaborating closely with a diverse team of scientists and engineers. Your work will play a crucial role in advancing our understanding of neural dynamics and their applications in technology.

Mar 16, 2026

Technical Staff Member - Model Evaluations

Reflection AI

Full-time|On-site|SF

Our Mission

At Reflection AI, we are dedicated to creating accessible open superintelligence for everyone.

Our team is composed of top-tier AI researchers and innovators from prestigious organizations like DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and more. We are committed to building open weight models for individuals, enterprises, and even nation states.

About the Role

Perform essential comparative analyses to deepen our insights into model capabilities.
Design and enhance evaluation systems and processes that establish robust feedback loops between data, evaluations, and model behavior.
Create generalizable evaluation frameworks that effectively capture reasoning, alignment, and practical usefulness.
Collaborate closely with pre-training, post-training, and applied teams to translate insights into tangible model improvements.
Expand the boundaries of measurable metrics, utilizing synthetic evaluations, human feedback, and real-world interaction data.

About You

Proficient in statistical analysis and experimental design, with the ability to rigorously measure model advancements.
Knowledgeable in LLM evaluation methodologies, including static benchmarks, human preference evaluations, and agentic tasks.
Possess a high degree of agency and thrive in a fast-paced startup atmosphere, prioritizing impact over rigid processes.
Eager to work in a pioneering lab, shaping how we measure and accelerate the development of more capable models.
Collaborative, detail-oriented, and driven by the desire to create effective feedback loops that enhance model performance.

What We Offer:

We believe in building superintelligence that is genuinely open, starting from the ground up. Joining Reflection means you will be part of a small, talent-dense team where you will help shape our future and push the boundaries of open foundational models.

You will have the opportunity to engage in the most impactful work of your career, knowing that you and your loved ones are well-supported.

Competitive Compensation: Salary and equity structured to attract and retain top global talent.
Health & Wellness: Comprehensive medical, dental, vision, life, and disability insurance.

Dec 17, 2025

Model Architecture Researcher

Cartesia

Full-time|On-site|*HQ - San Francisco, CA

Join Cartesia as a Model Architecture Researcher

At Cartesia, our vision is to revolutionize AI by creating interactive intelligence that is seamlessly integrated into your daily life. Unlike current models, our goal is to develop systems capable of processing extensive streams of audio, video, and text (1 billion text tokens, 10 billion audio tokens, and 1 trillion video tokens) directly on devices.

As pioneers in innovative model architectures, our founding team, which originated from the Stanford AI Lab, has developed State Space Models (SSMs), a groundbreaking foundation for training efficient, large-scale models. Our diverse team merges deep expertise in model innovation with a design-focused engineering approach, allowing us to create and deploy state-of-the-art models and applications.

Backed by leading investors such as Index Ventures, Lightspeed Venture Partners, and many others, including industry veterans and advisors, we are poised to shape the future of AI.

Your Contribution

In this role, you will drive forward-thinking research in neural network architecture, focusing on alternative models like state space models, efficient transformers, and hybrid architectures.
Create innovative architectures that enhance model performance, inference speed, and adaptability in various environments, from cloud infrastructures to on-device implementations.
Develop advanced capabilities for models, including statefulness, long-range memory, and novel conditioning mechanisms to boost expressiveness and generalization.
Analyze architectural decisions and their effects on model characteristics such as scalability, robustness, latency, and energy consumption.
Create frameworks and tools to assess architectural advancements, benchmarking their performance in both research and production contexts.
Collaborate with interdisciplinary teams to translate architectural insights into scalable systems that deliver real-world impact.

Your Qualifications

Extensive experience in architecture design with a focus on advanced models such as state space models, transformers, and RNN/CNN variants.
In-depth understanding of the interplay between architectural designs and system constraints, particularly in cloud and on-device deployments.
Strong proficiency in the design and evaluation of neural network architectures.

Dec 12, 2024

Research Engineer - Language Model Pre-Training

Zyphra

Full-time|On-site|San Francisco

Zyphra is an innovative leader in artificial intelligence, located in the heart of San Francisco, California.

Role Overview:

As a Research Engineer specializing in Language Model Pre-Training, you will play a pivotal role in defining our language model strategy through comprehensive pretraining development. Your close collaboration with our pretraining team will ensure that your insights contribute to the advancement of our next-generation models.

Key Responsibilities:

Conduct large-scale training runs and implement model parallelization techniques.
Optimize the performance of our pretraining stack.
Oversee dataset collection, processing, and evaluation.
Research architecture and methodologies, including optimizer ablations.

Qualifications:

Demonstrated engineering prowess in developing reliable and robust systems.
A quick learner with a passion for implementing innovative ideas.
Exceptional communication and collaboration skills, capable of working effectively on both research and engineering implementations at scale.

Preferred Skills:

Profound expertise in addressing machine learning challenges and training models.
Experience training on large-scale (multi-node) GPU clusters.
In-depth understanding of model training pipelines, including model/data parallelism and distributed optimizers.
Strong methodology for conducting rigorous ablations and hypothesis testing.
Familiarity with large-scale, high-performance data processing pipelines.
High proficiency in PyTorch and Python programming.
Ability to navigate and understand extensive pre-existing codebases swiftly.
Published research in machine learning in reputable venues is an advantage.
Postgraduate degree in a relevant scientific field (Computer Science, Electrical Engineering, Mathematics, Physics).

Why Join Zyphra?

We value a research methodology that emphasizes thoughtful, methodical progress towards ambitious objectives. Both deep research and engineering excellence are given equal importance.

Join us in an environment that fosters innovation, collaboration, and professional growth.

Aug 28, 2025

Staff Machine Learning Research Scientist - LLM Evaluations

Scale AI

Full-time|$280K/yr - $380K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY

At Scale AI, we are the premier partner for data and evaluation in the rapidly evolving field of artificial intelligence. Our commitment to advancing the assessment and benchmarking of large language models (LLMs) positions us at the forefront of AI innovation. We are dedicated to creating leading-edge LLM evaluation methodologies that set new benchmarks for model performance. Our research teams collaborate with the top AI laboratories in the industry to provide high-quality data, accelerate progress in generative AI research, and inform what excellence looks like in this domain. As a Staff Machine Learning Research Scientist on our LLM Evals team, you will spearhead the creation of novel evaluation methodologies, metrics, and benchmarks to assess the strengths and weaknesses of cutting-edge LLMs. Your work will shape our internal strategies and influence the broader AI research community, making this role essential for establishing best practices in data-driven AI development.

Mar 26, 2026

Senior Research Scientist, Reward Models

Anthropic

Remote|Remote|Remote-Friendly (Travel Required) | San Francisco, CA

Join Anthropic as a Senior Research Scientist on our Reward Models team, where you will spearhead groundbreaking research aimed at enhancing our understanding of human preferences at scale. Your innovative contributions will directly influence how our AI models, including Claude, align with human values and optimize for user needs. You will delve into the forefront of reward modeling for large language models, designing novel architectures and training methodologies for Reinforcement Learning from Human Feedback (RLHF). Your research will explore advanced evaluation techniques, including rubric-based grading, and tackle challenges such as reward hacking. Collaboration is key, as you'll work alongside teams in Finetuning, Alignment Science, and our broader research organization to ensure your findings result in tangible advancements in AI capabilities and safety. This role offers you an opportunity to address critical AI alignment challenges, leveraging cutting-edge models and substantial computational resources to further the science of safe and capable AI systems.

Jan 29, 2026

AI Researcher for Multimodal Perception Models

Tavus

Full-time|On-site|San Francisco

About Tavus

Tavus is at the forefront of innovation in human computing. Our mission is to develop AI Humans: an advanced interface that bridges the gap between individuals and machines, eliminating the friction found in current technologies. Our state-of-the-art human simulation models empower machines to see, hear, respond, and even exhibit realistic appearances, facilitating genuine, face-to-face interactions. AI Humans integrate the emotional insight of humans with the scalability and dependability of machines, making them reliable agents accessible 24/7, in any language, on our terms.

Imagine having access to an affordable therapist, a personal trainer that fits your schedule, or a team of medical assistants dedicated to providing personalized care for every patient. With Tavus, individuals, enterprises, and developers have the tools to create AI Humans that connect, comprehend, and act with empathy on a large scale.

We are a Series A company supported by esteemed investors such as Sequoia Capital, Y Combinator, and Scale Venture Partners.

Join us in shaping a future where machines and humans genuinely understand one another.

The Position

We are seeking an AI Researcher to join our core AI team and advance the frontiers of multimodal conversational intelligence. If you excel in dynamic environments, enjoy transforming abstract concepts into functional code, and derive motivation from pushing the boundaries of possibility, this role is designed for you.

Your Responsibilities

Engage in research focusing on Foundational Multimodal Models specifically in the realm of Conversational Avatars (such as Neural Avatars and Talking-Heads).
Develop models for video, audio, and language sequences utilizing Autoregressive and Predictive Architectures (e.g., V-JEPA) and/or Diffusion methodologies, with a focus on temporal and sequential data rather than static images.
Collaborate closely with the Applied ML team to implement your research into production systems.
Remain at the forefront of multimodal learning and assist us in defining what “cutting edge” will mean in the future.

Ideal Candidate Profile

PhD (or nearing completion) in a relevant field, or equivalent practical research experience.
Experience in multimodal machine learning, particularly focused on conversational interfaces.

Oct 8, 2025

Research Scientist in Generative Modeling at World Labs | San Francisco

World Labs

Full-time|$250K/yr - $325K/yr|On-site|San Francisco

About World Labs:

At World Labs, we create foundational world models capable of perceiving, generating, reasoning, and interacting with the 3D environment. Our mission is to unlock the full potential of AI through spatial intelligence, transforming perception into action, reasoning into insight, and imagination into creation. We believe that spatial intelligence will revolutionize storytelling, creativity, design, simulation, and immersive experiences across both virtual and physical realms. Our world-class team is driven by curiosity and passion, boasting diverse backgrounds in technology, from AI research and systems engineering to product design. This synergy fosters a tight feedback loop between our cutting-edge research and user-empowering products.

Role Overview

We are seeking an innovative Research Scientist specializing in generative modeling, especially diffusion models, to join our modeling team. This position is ideal for individuals with extensive expertise in applying diffusion models to images, videos, or 3D assets and scenes. While not mandatory, experience in any of the following areas will be considered a significant advantage:

Large-scale model training
Research in 3D computer vision

In this role, you will work closely with researchers, engineers, and product teams to translate advanced 3D modeling and machine learning techniques into practical applications, ensuring our technology stays at the forefront of visual innovation. This position entails substantial hands-on research and engineering work, taking projects from conception to production deployment.

Key Responsibilities

Design, implement, and train large-scale diffusion models for generating 3D worlds.
Develop and experiment with large-scale diffusion models to introduce novel control signals, align with target aesthetic preferences, or optimize for efficient inference.
Collaborate closely with research and product teams to comprehend and translate product requirements into actionable technical roadmaps.
Contribute actively to all phases of model development, including data curation, experimentation, evaluation, and deployment.
Continuously investigate and integrate the latest research in diffusion and generative AI.
Serve as a key technical resource within the team, mentoring peers and promoting best practices in generative modeling and machine learning engineering.

Feb 18, 2026

Technical Program Manager – Adversarial Model Research

OpenAI

Full-time|Hybrid|San Francisco

Team Overview

The Human Data team at OpenAI is at the forefront of identifying and mitigating risks associated with advanced AI systems. Our mission is to enhance model reliability and public trust by designing thorough evaluations, uncovering vulnerabilities, and collaborating closely with researchers.

Role Overview

As a Technical Program Manager, you will spearhead initiatives aimed at assessing the safety and robustness of OpenAI’s models through innovative experimentation and methodical evaluation. Your role will involve orchestrating efforts across research and engineering teams, translating ambiguous risk signals into actionable research programs that will shape the future of AI model development and deployment.

We seek candidates who possess technical acumen, thrive in uncertain environments, and are passionate about pioneering the future of safe AI.

This position is based in San Francisco, CA, employing a hybrid work model of three days in the office each week, with relocation assistance available for new hires.

Key Responsibilities

Lead programs that investigate unexpected model behaviors and identify potential failure modes.
Convert ambiguous risk signals into clear priorities and actionable research agendas.
Design and execute innovative evaluations, experiments, and red-teaming initiatives.
Collaborate with research, product, and deployment teams to integrate findings into the model training and deployment pipelines.
Establish repeatable systems for monitoring model performance and interpreting emerging behavior patterns.

Ideal Candidate Profile

Proven experience in technical program management with exceptional organizational and communication abilities.
Familiarity with large language models, prompt engineering, or model evaluation methodologies.
Ability to manage fast-paced, high-uncertainty projects, shaping them from inception.
Creative and resourceful in developing novel methods for evaluating model behavior and performance.
Skilled in coordinating effectively across both technical and non-technical stakeholders to ensure alignment and execution.

About OpenAI

OpenAI is a pioneering AI research and deployment company committed to ensuring that general-purpose artificial intelligence benefits all of humanity. We continually push the boundaries of AI capabilities and strive to deploy them safely through our innovative products. Our mission is to harness the extraordinary potential of AI responsibly and equitably for a better future.

Jan 26, 2026

Machine Learning Researcher in Generative Modeling

latentlabs

Full-time|On-site|San Francisco

Join latentlabs, a pioneering company at the forefront of biotechnology, as we seek a talented Machine Learning Researcher specializing in generative modeling. You will become part of a dynamic, interdisciplinary team comprising machine learning experts, protein engineers, and biologists, all committed to revolutionizing biological control and disease treatment. In this role, you will design innovative generative models aimed at creating new proteins that exhibit functionality in wet lab assays.

Feb 19, 2026


Research Scientist, Frontier Risk Evaluations

Scale AI, Inc.

Full-time|$197.4K/yr - $246.8K/yr|On-site|San Francisco, CA; New York, NY

Join Scale AI as a Research Scientist, Frontier Risk Evaluations

At Scale AI, we are at the forefront of data and evaluation services for pioneering AI technologies. Our mission is to ensure the safe and effective deployment of AI systems by bridging the gap between advanced AI research and global policy frameworks. With the launch of Scale Labs, we are assembling a dedicated team focused on policy research to empower governments and industry leaders with scientific insights regarding AI risks and functionalities.

This team addresses complex challenges in agent robustness, AI control mechanisms, and risk assessments to facilitate a comprehensive understanding of AI risks, while promoting its responsible adoption across various sectors. We are eager to welcome skilled researchers who are passionate about shaping the future of AI.

As a Research Scientist specializing in Frontier Risk Evaluations, you will be responsible for designing evaluation metrics, harnesses, and datasets to assess the risks associated with cutting-edge AI systems. Your role may involve:

Developing harnesses to evaluate AI models for potential security vulnerabilities and other high-risk behaviors.
Collaborating with government entities and research labs to design evaluations that mitigate risks posed by advanced AI technologies.
Publishing evaluation methodologies and drafting technical reports aimed at informing policymakers.

Mar 26, 2026


Generative AI Researcher - Atomistic Foundation Models

Achira

Full-time|On-site|San Francisco Office

Join Achira in shaping the future of deep learning with cutting-edge generative, representational, and simulation models for molecules and materials. Our mission is to create foundational models that render the atomistic universe understandable, predictable, and designable.

Why Choose Achira?

Be part of an elite, cross-disciplinary team comprising ML researchers, physicists, chemists, and engineers who are redefining atomistic simulation through expansive foundation models.
Advance the integration of deep learning with the principles of nature, merging generative AI, probabilistic reasoning, and molecular physics.
Engage in projects at an unparalleled scale, tackling extensive datasets, computational challenges, and ambitious goals.
Take full ownership of your research journey, from ideation and architecture to training, evaluation, and deployment.
Flourish in a dynamic culture that values rigor, speed, creativity, and impact over bureaucracy.

Position Overview

As a Generative AI Researcher at Achira, you will contribute to the development of foundation simulation models: large-scale systems designed to learn the structure, dynamics, and energetics of the atomistic realm. These models will unite deep representation learning, generative modeling, and sophisticated simulation techniques.

Your responsibilities will include:

Crafting and training state-of-the-art deep generative models (including diffusion, autoregressive, flow-based, and latent-variable architectures) focused on molecules, materials, and atomic systems.
Creating expressive representations of molecular and atomistic structures and dynamics utilizing equivariant graph neural networks, geometric transformers, and latent encoders that respect physical symmetries and constraints.
Innovating advanced sampling and simulation techniques that blend probabilistic inference, deep learning, and reinforcement learning to facilitate efficient exploration and simulation of learned energy landscapes.
Developing models that comprehend, generate, and simulate the physical world, merging reasoning, simulation, and predictive capabilities.
Working collaboratively with physicists and chemists to validate models against ab initio, molecular dynamics, and experimental datasets.
Rapidly prototyping, benchmarking, and iterating, converting research concepts into reusable, scalable model components across Achira’s foundation model suite.

Oct 24, 2025


Research Engineer, Evals

Intrinsic Safety

Full-time|On-site|San Francisco

Role Overview

At Intrinsic Safety, we are pioneering the development of AI systems capable of making critical decisions in high-stakes environments such as risk investigations, fraud detection, and identity verification. Our dedicated team in San Francisco is at the forefront of tackling complex challenges where traditional AI solutions often fall short.

We are in search of a Research Engineer to play a pivotal role in shaping our model evaluation strategies. You will be responsible for creating benchmarks, datasets, and evaluation frameworks that accurately assess our systems’ performance in real-world scenarios. This position bridges research, product development, and engineering, focusing on rigorous evaluations that reflect actual customer workflows and identify key failure points to propel the next generation of AI advancements.

Mar 31, 2026

