Tech Lead/Manager, Machine Learning Research Scientist - LLM Evaluations
Scale AI, Inc.
San Francisco, CA; Seattle, WA; New York, NY
On-site Full-time $280K/yr - $380K/yr
Experience Level
Manager
Qualifications
Key Responsibilities:
Lead a high-performing team of research scientists and engineers focused on LLM evaluations.
Conduct research on the effectiveness and constraints of current LLM evaluation techniques.
Design and develop innovative evaluation benchmarks for large language models, addressing areas such as instruction adherence, factual accuracy, robustness, and fairness.
Foster communication and collaboration with clients and peer teams to facilitate cross-functional initiatives.
Work with internal teams and external partners to refine metrics and establish standardized evaluation protocols.
Implement scalable and reproducible evaluation pipelines using modern machine learning frameworks.
Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives.
Stay current with ongoing research within the team, assist in overcoming technical challenges, and engage in design decision-making.
Maintain strong involvement in the research community, both understanding trends and influencing them.
Excel in a dynamic, fast-paced startup environment and commit to driving impactful results.
Desired Qualifications:
5+ years of practical experience in large language models, natural language processing, and Transformer modeling, in both research and engineering contexts.
A proven track record of achieving significant research impacts in a fast-paced setting.
Experience in supporting and leading a team of research scientists and engineers.
About the job
As a premier data and evaluation partner for cutting-edge AI firms, Scale AI is committed to enhancing the evaluation and benchmarking of large language models (LLMs). We are developing industry-leading LLM evaluations that set new benchmarks for model performance assessment. Our mission is to create rigorous, scalable, and equitable evaluation methodologies that propel the next evolution of AI capabilities.
Our Research teams collaborate with top AI laboratories to provide high-quality data and expedite advancements in Generative AI research. As the Tech Lead/Manager of the LLM Evaluations Research team, you will guide a skilled team of research scientists and engineers dedicated to crafting and applying innovative evaluation methodologies, metrics, and benchmarks that assess the strengths and weaknesses of our advanced LLMs. This pivotal role involves designing and executing a strategic roadmap that establishes best practices in data-driven AI development, thus accelerating the development of the next generation of generative AI models in collaboration with leading foundational model labs.
About Scale AI, Inc.
Scale AI is the leading evaluation partner for advanced AI companies, focused on enhancing the benchmarking and assessment of large language models through innovative methodologies and collaboration with top research labs.
About Retell AI
Retell AI builds voice AI technology that helps businesses transform their call center operations. In just 18 months, thousands of companies have adopted Retell’s AI voice agents to streamline sales, support, and logistics, work that once required large human teams. Backed by investors including Y Combinator and Alt Capital, Retell has grown annual recurring revenue from $5M to $36M with a focused team of 20. The company’s goal for 2026: a modern customer experience platform where AI powers entire contact centers. Retell is developing AI “workers” that can serve as frontline agents, quality assurance analysts, and managers, handling, evaluating, and improving customer interactions on their own.
Named a top 50 AI app by a16z: https://tinyurl.com/5853dt2x
Ranked #4 on Brex’s Fast-Growing Software Vendors of 2025: https://www.brex.com/journal/brex-benchmark-december-2025
Featured on the Lean AI Leaderboard: https://leanaileaderboard.com/
Role Overview: Research Scientist – LLM
Retell AI is hiring a Research Scientist focused on large language models (LLMs) and audio processing. This role suits machine learning researchers who want to push the boundaries of real-time AI and see their work in production.
What You Will Do
Investigate new approaches in large language models and audio processing for human-like voice agents
Design and implement evaluation methods for complex, real-world conversational systems
Prototype systems to improve reasoning, reduce latency, and enhance conversation quality
Work closely with engineering and product teams to bring research advances into production
Impact
Research at Retell directly shapes the capabilities of voice AI agents for thousands of businesses. The work blends advanced research with practical deployment, improving how customers interact with automated systems across industries.
Location
This position is based in the San Francisco Bay Area.
Full-time|$176K/yr - $304K/yr|Hybrid|Cambridge, MA USA; San Francisco, CA USA
Your Contribution at Lila
As a Machine Learning Research Scientist I/II specializing in LLM Inference, you will spearhead research initiatives focused on the training and deployment of large language models for scientific applications.
Your Responsibilities
Develop and refine post-training strategies for LLMs, including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Reinforcement Learning with verifiers.
Design efficient inference mechanisms and compute strategies for complex tool utilization in various environments.
Create scalable evaluation metrics to assess LLM performance in scientific reasoning tasks.
Investigate the boundaries of cutting-edge LLM methodologies for scientific challenges and analyze their limitations.
Full-time|$280K/yr - $380K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY
At Scale AI, we are the premier partner for data and evaluation in the rapidly evolving field of artificial intelligence. Our commitment to advancing the assessment and benchmarking of large language models (LLMs) positions us at the forefront of AI innovation. We are dedicated to creating leading-edge LLM evaluation methodologies that set new benchmarks for model performance. Our research teams collaborate with the top AI laboratories in the industry to provide high-quality data, accelerate progress in generative AI research, and inform what excellence looks like in this domain. As a Staff Machine Learning Research Scientist on our LLM Evals team, you will spearhead the creation of novel evaluation methodologies, metrics, and benchmarks to assess the strengths and weaknesses of cutting-edge LLMs. Your work will shape our internal strategies and influence the broader AI research community, making this role essential for establishing best practices in data-driven AI development.
Internship|$54K/yr - $60K/yr|On-site|San Francisco, California
Company Overview:
At Databricks, we are dedicated to empowering data teams to tackle some of the world’s most challenging issues, ranging from security threat detection to breakthroughs in cancer drug development. We achieve this by creating and operating the premier data and AI platform, allowing our clients to concentrate on the high-value challenges central to their missions. The Mosaic AI division equips organizations to develop AI models and systems using their own data, with technologies that span from fine-tuning large language models (LLMs) for specific enterprise domains to building complex AI systems that incorporate retrieval and agents. We believe that a company's AI models are as valuable as any other intellectual property and that high-quality AI models should be accessible to all.
Job Summary:
Our research team is focused on advancing the boundaries of “domain adaptation”: discovering how to create LLMs and AI systems that excel in specialized domains. We are investigating open research challenges across a variety of themes, including scaling and automating evaluation, fine-tuning with synthetic data, retrieval augmentation, and optimizing inference speed and efficiency. As a PhD GenAI Research Scientist Intern, you will collaborate with our research team on projects that aim to adapt LLMs and AI systems for enterprise settings.
Your tasks may include:
Enhancing, refining, and assessing methodologies from existing literature.
Designing novel approaches for effective domain adaptation.
Combining various methods to formulate innovative strategies for efficient post-training.
Conducting evaluations of LLMs and AI systems.
Full-time|$252K/yr - $315K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY
At Scale AI, we collaborate with leading AI laboratories to supply high-quality data and foster advancements in Generative AI research. We seek innovative Research Scientists and Research Engineers with a strong focus on post-training techniques for Large Language Models (LLMs), including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and reward modeling. This position emphasizes optimizing data curation and evaluation processes to boost LLM performance across text and multimodal formats. In this pivotal role, you will pioneer new methods to enhance the alignment and generalization of extensive generative models. You will work closely with fellow researchers and engineers to establish best practices in data-driven AI development. Additionally, you will collaborate with top foundation model labs, providing critical technical and strategic insights for the evolution of next-generation generative AI models.
Bland Inc. seeks a Machine Learning Researcher specializing in Multimodal Large Language Models (LLMs) to join the team in San Francisco. The focus is on advancing AI systems that integrate language with other types of data.
Role overview
This position centers on research and development aimed at improving how AI models process and understand information from multiple sources, such as text combined with images or other modalities.
What you will do
Investigate how language interacts with additional data types within multimodal LLMs
Create and evaluate new methods to enhance AI model performance
Work closely with colleagues on projects designed to push the boundaries of machine learning
Location
This role is based in San Francisco.
Full-time|$275K/yr - $350K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY
About Scale AI
At Scale AI, we are dedicated to propelling the advancement of AI applications. Over the past eight years, we have established ourselves as the premier AI data foundry, supporting groundbreaking innovations in fields such as generative AI, defense technologies, and autonomous vehicles. Following our recent Series F funding round, we are intensifying our efforts to harness frontier data, paving the way toward achieving Artificial General Intelligence (AGI). Our work with enterprise clients and governments has enhanced our model evaluation capabilities, allowing us to expand our offerings for both public and private evaluations.
About the ACE Team
The Agent Capabilities & Environments (ACE) team, a vital part of Scale’s Research organization, unites customer-focused Researchers and Applied AI Engineers. Our primary mission is to conduct research on agent environments and reinforcement learning reward signals, benchmark autonomous agent performance in real-world contexts, and develop robust data programs aimed at enhancing the capabilities of Large Language Models (LLMs). We are committed to creating foundational tools and frameworks for evaluating models as agents, focusing on autonomous agents that interact dynamically with a wide range of external environments, including code repositories and GUI interfaces.
About This Role
This position sits at the cutting edge of AI research and its practical applications, concentrating on the data types necessary for the development of state-of-the-art agents, including browser and software engineering agents. The ideal candidate will investigate the data landscape required to propel intelligent and adaptable AI agents, steering the data strategy at Scale to foster innovation. This role demands not only expertise in LLM agents and planning algorithms but also creative problem-solving skills to tackle novel challenges pertaining to data, interaction, and evaluation. You will contribute to influential research publications on agents, collaborate with customer researchers, and partner with the engineering team to transform these advancements into scalable real-world solutions.
Join OpenAI as a Research Scientist and explore cutting-edge machine learning innovations. In this role, you will be at the forefront of developing groundbreaking techniques while advancing our team's research initiatives. Collaborate with talented peers across various teams to discover transformative ideas that scale effectively. We seek individuals who are passionate about pushing the boundaries of AI and want to contribute to our unified research vision.
Merge Labs is an innovative research facility dedicated to merging biological sciences and artificial intelligence to enhance human capability, autonomy, and experience. Our mission is to pioneer revolutionary methodologies in brain-computer interfaces that facilitate high-bandwidth interactions with the brain, seamlessly integrate advanced AI, and maintain safety and accessibility for all users.
About the Team
At Merge, we are addressing some of the most ambitious challenges in molecular engineering, synthetic biology, and neuroscience. Our Research Platform Team is responsible for creating the experimental frameworks necessary to tackle these challenges with exceptional speed and precision. The tools and methodologies developed by our team significantly enhance molecular assembly, protein expression, mammalian cell culture, advanced microscopy, sequencing, and unique custom techniques. We collaborate with program teams to establish and optimize these capabilities, implement automation where beneficial, and integrate with our data science and machine learning pipelines, continuously pushing the boundaries of throughput and innovation.
About the Role
As a Platform Scientist, you will be instrumental in developing high-efficiency and high-throughput experimental pipelines that accelerate research initiatives. You will work closely with program leads, project scientists, data scientists, and engineers, leading your work and potentially recruiting additional team members as necessary.
Key Responsibilities:
Collaborate with program leads and scientists to identify critical experimental requirements and workflows.
Develop processes to facilitate high-throughput and/or high-efficiency experiments, including reagent production and analysis.
Scope, procure, construct, program, and validate instruments to support experimental workflows.
Ensure the quality, reliability, and integrity of data generated from automated pipelines, including defining and implementing suitable quality control checkpoints.
Work alongside data science and machine learning engineers to incorporate metadata tracking, computational design, and analysis into experimental pipelines.
Partner with electrical, mechanical, and software engineers to create custom setups.
Innovate and validate concepts to enhance experimental throughput.
Overview
Become an integral part of our dynamic R&D team dedicated to developing fully automated research systems that push the boundaries of AI. Zochi has achieved a milestone by publishing the first entirely AI-generated A* conference paper. Locus has set a new industry standard as the first AI system to surpass human experts in AI R&D.
Key Responsibilities
Conceptualize and develop innovative architectures for automated research.
Work collaboratively within a specialized team of researchers addressing cutting-edge challenges in long-horizon agentic capabilities, post-training for open-ended objectives, and environment crafting.
Document and publish key internal findings alongside success stories from external collaborations.
Qualifications
PhD or equivalent research experience in Computer Science, Machine Learning, Artificial Intelligence, or a related discipline. Outstanding candidates with significant research contributions are encouraged to apply, regardless of formal qualifications.
Demonstrated history of impactful AI/ML research contributions in academic or corporate environments.
Expertise in developing long-horizon, multi-agent systems and/or model post-training, especially in scientific domains or for open-ended discovery objectives.
A strong passion for advancing problem-solving processes and scientific discovery, thriving in high-autonomy roles and environments.
Our Culture
Competitive compensation and equity options.
Unlimited Paid Time Off (PTO), emphasizing team collaboration and a community-focused workplace.
Opportunities for conference participation and engagement in community initiatives.
Empowered roles with high levels of responsibility.
#1: We are a small, passionate team of leading investors, researchers, and industry experts committed to the mission of accelerating discovery. Join us.
About the Team
Join the innovative Post-Training team at OpenAI, where we focus on refining and elevating pre-trained models for deployment in ChatGPT, our API, and future products. Collaborating closely with various research and product teams, we conduct crucial research that prepares our models for real-world deployment to millions of users, ensuring they are safe, efficient, and reliable.
About the Role
As a Research Engineer / Scientist, you will spearhead the research and development of enhancements to our models. Our work intersects reinforcement learning and product development, aiming to create cutting-edge solutions.
We seek passionate individuals with robust machine learning engineering skills and research experience, particularly with innovative and powerful models. The ideal candidate will be driven by a commitment to product-oriented research.
This position is located in San Francisco, CA, and follows a hybrid work model requiring three days in the office each week. Relocation assistance is available for new employees.
In this role, you will:
Lead and execute a research agenda aimed at enhancing model capabilities and performance.
Work collaboratively with research and product teams to empower customers to optimize their models.
Develop robust evaluation frameworks to monitor and assess modeling advancements.
Design, implement, test, and debug code across our research stack.
You may excel in this role if you:
Possess a deep understanding of machine learning and its applications.
Have experience with relevant models and methodologies for evaluating model improvements.
Are adept at navigating large ML codebases for debugging purposes.
Thrive in a fast-paced and technically intricate environment.
About OpenAI
OpenAI is a pioneering AI research and deployment organization dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We are committed to pushing the boundaries of AI capabilities while prioritizing safety and human-centric values in our products. Our mission is to embrace diverse perspectives, voices, and experiences that represent the full spectrum of humanity, as we strive for a future where AI is a powerful ally for everyone.
About Our Team
Join the forefront of AI innovation with the RL and Reasoning team at OpenAI. Our team is dedicated to advancing reinforcement learning research and has pioneered transformative projects, including o1 and o3. We are committed to pushing the limits of generative models while ensuring their scalable deployment.
About the Role
As a Research Engineer/Research Scientist at OpenAI, you will play a pivotal role in enhancing AI alignment and capabilities through state-of-the-art reinforcement learning techniques. Your contributions will be essential in training intelligent, aligned, and versatile agents that power various AI models.
We seek individuals with a solid foundation in reinforcement learning research, agile coding skills, and a passion for rapid iteration.
This position is located in San Francisco, CA, and follows a hybrid work model of three days in the office per week. We also provide relocation assistance for new hires.
You may excel in this role if:
You are enthusiastic about being at the cutting edge of RL and language model research.
You take initiative, owning ideas and driving them to fruition.
You value principled methodologies, conducting simple experiments in controlled environments to draw trustworthy conclusions.
You thrive in a fast-paced, complex technical environment where rapid iteration is essential.
You are adept at navigating extensive ML codebases to troubleshoot and enhance them.
You possess a profound understanding of machine learning and its applications.
About OpenAI
OpenAI is a pioneering AI research and deployment organization committed to ensuring that general-purpose artificial intelligence serves the greater good for humanity. We strive to push the boundaries of AI system capabilities while prioritizing safe deployment through our innovative products. We recognize AI as a powerful tool that must be developed with safety and human-centric principles, embracing diverse perspectives to reflect the full spectrum of humanity.
We are proud to be an equal opportunity employer, welcoming applicants from all backgrounds without discrimination based on race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or any other legally protected characteristic.
Join Our Team as a Research Scientist
At Parallel, we are at the forefront of web infrastructure innovation, enabling businesses across sectors such as sales, marketing, insurance, and technology to harness the power of AI. Our state-of-the-art products empower users to develop superior AI agents with seamless and flexible access to the web.
With significant backing of $130 million from prominent investors like Kleiner Perkins, Index Ventures, and Spark Capital, we are dedicated to redefining the web for artificial intelligence. As we expand, we're assembling a top-tier team of engineers, designers, marketers, sales experts, researchers, and operational specialists committed to our vision.
Your Role: As a Research Scientist, you will tackle the challenge of training and scaling models designed to enhance web indexing capabilities.
About You: You possess a profound understanding of contemporary models and training methodologies. You enjoy engaging in discussions about the convergence of search, recommendations, and transformer models, and are passionate about translating your research into impactful products and systems utilized by millions.
Zyphra is a pioneering artificial intelligence firm located in the vibrant city of San Francisco, California.
About the Role:
We are seeking a passionate Research Scientist to join our dynamic Agency and Reasoning Team at Zyphra. In this role, you will conduct cutting-edge research in reinforcement learning, post-training methodologies, and human preference learning. Your innovative ideas will be instrumental in shaping our next-generation language models, enabling their application on a large scale.
What We Desire:
A strong sense of research intuition and taste
Capability to navigate a research project from initial concept to execution and documentation
Proficiency in implementation and prototyping
A quick thinker who can rapidly transform ideas into experimental frameworks
Ability to collaborate effectively in a fast-paced research environment
An insatiable curiosity and enthusiasm for the study of intelligence
Qualifications:
Proven experience and skill in reinforcement learning, particularly in the context of language model reasoning or classical RL tasks
Familiarity with language-model-supervised fine-tuning and preference-learning techniques, such as DPO and SimPO
Experience with methods for context-length extension
Strong intuitive understanding of model behaviors, with the ability to refine them through iterative fine-tuning
Interest in engaging deeply with data and dedicating time to data engineering and synthetic data generation
A postgraduate degree in a scientific discipline (Computer Science, Electrical Engineering, Mathematics, Physics)
Published research in reputable machine learning venues
Expertise in PyTorch and Python
Eagerness and aptitude for rapidly acquiring new knowledge and implementing innovative concepts
Exceptional communication and teamwork abilities, capable of contributing to both research and large-scale engineering efforts
Why Join Zyphra?
We champion creative and unconventional ideas and are prepared to invest significantly in innovative concepts.
Our culture fosters collaboration, curiosity, and intellectual growth.
Zyphra is a cutting-edge artificial intelligence firm headquartered in the vibrant city of San Francisco, California.
Position Overview:
As a Research Scientist specializing in Model Architectures, you will play a pivotal role in Zyphra’s AI Architecture Research Team. Your responsibilities will include the design and thorough evaluation of innovative model architectures and training methodologies aimed at enhancing essential modeling capabilities (e.g., loss per flop or loss per parameter) and tackling core limitations inherent in current models. You will collaborate closely with our pre-training team to ensure that your findings are seamlessly integrated into our next-generation models.
Qualifications:
A strong research acumen and intuition.
Proven ability to navigate research projects from initial conception to execution and final write-up.
Exceptional implementation and prototyping skills, with the capability to swiftly transform ideas into experimental outcomes.
A collaborative spirit and the ability to thrive in a fast-paced research environment.
A deep curiosity and enthusiasm for understanding intelligence.
Requirements:
Experience with long-term memory, RAG/retrieval systems, dynamic/adaptive computation, and alternative credit assignment strategies.
Knowledge of reinforcement learning, control theory, and signal processing techniques.
A passion for exploring and critically evaluating unconventional ideas, with the ability to maintain a unique perspective.
Familiarity with modern training pipelines and the hardware necessities for designing efficient architectures compatible with GPU hardware.
Strong understanding of experimental methodologies for conducting rigorous ablations and hypothesis testing.
High proficiency in PyTorch and Python programming.
Ability to quickly assimilate into large pre-existing codebases and contribute effectively.
Prior publication of machine learning research in reputable venues.
Postgraduate degree in a scientific discipline (e.g., Computer Science, Electrical Engineering, Mathematics, Physics).
Why Join Zyphra?
We emphasize a structured research methodology that systematically addresses ambitious challenges in AI.
About AfterQuery
AfterQuery partners with leading AI labs to advance training data and evaluation frameworks. The team builds high-signal datasets and runs thorough evaluations that go beyond standard benchmarks. As a post-Series A, early-stage company in San Francisco, AfterQuery gives each team member room to shape the future of AI models.
Role Overview: Research Scientist - Frontier Data
This role focuses on designing datasets and developing evaluation systems that influence how top AI models are trained and assessed. Working closely with research teams at major AI labs, the scientist explores new data collection techniques, investigates where models fall short, and sets up metrics to track progress. The work is hands-on and experimental, moving quickly from hypothesis to live testing and directly impacting large-scale model training.
Key Responsibilities
Design data slides and analyze data structures to uncover model weaknesses in areas like finance, software development, and enterprise operations.
Build and refine evaluation rubrics and reward signals for RLHF and RLVR training approaches.
Study annotator behavior and run experiments to improve model capabilities across different domains.
Develop quantitative frameworks to measure dataset quality, diversity, and their effect on model alignment and performance.
Work with research teams to turn training objectives into concrete data and evaluation needs.
What We Look For
Experience as an undergraduate or master’s research student (PhD not required).
Background or internships with RL environments or AI safety and benchmarking organizations (e.g., METR, Artificial Analysis) is a strong plus.
Genuine interest in how data structure, selection, and quality affect model outcomes.
Demonstrated skill in designing experiments, acting quickly, and extracting insights from complex data.
Comfort working across sectors such as finance, software engineering, and policy.
Strong quantitative background and familiarity with LLM training pipelines, RLHF/RLVR methods, or evaluation frameworks.
A hands-on mindset focused on building practical solutions.
About Our Team
Join the Foundations Research team, where we tackle ambitious and innovative projects that could redefine the future of AI. Our mission is to enhance the science behind our training and scaling initiatives, focusing on pioneering frontier models. We are dedicated to advancing data utilization, scaling methodologies, optimization strategies, model architectures, and efficiency enhancements to accelerate our scientific breakthroughs.
About the Position
We are on the lookout for a dynamic technical research lead to spearhead our embeddings-focused retrieval initiatives. You will oversee a talented team of research scientists and engineers committed to developing foundational technologies that enable models to access and utilize the right information precisely when needed. This includes crafting innovative embedding training objectives, architecting scalable vector storage, and implementing adaptive indexing techniques.
This pivotal role will contribute to various OpenAI products and internal research initiatives, offering opportunities for scientific publication and significant technical influence.
This position is located in San Francisco, CA, where we embrace a hybrid work model, requiring three days in the office weekly, and we provide relocation assistance for new hires.
Your Responsibilities
Lead cutting-edge research on embedding models and retrieval systems optimized for grounding, relevance, and adaptive reasoning.
Supervise a team of researchers and engineers in building an end-to-end infrastructure for training, evaluating, and integrating embeddings into advanced models.
Drive advancements in dense, sparse, and hybrid representation techniques, metric learning, and retrieval systems.
Work collaboratively with Pretraining, Inference, and other Research teams to seamlessly integrate retrieval throughout the model lifecycle.
Contribute to OpenAI's ambitious vision of developing AI systems with robust memory and knowledge access capabilities rooted in learned representations.
You Will Excel in This Role If You Possess
A proven track record of leading high-performance teams of researchers or engineers within ML infrastructure or foundational research.
In-depth technical knowledge in representation learning, embedding models, or vector retrieval systems.
Familiarity with transformer-based large language models and their interaction with embedding spaces and objectives.
Research experience in areas such as contrastive learning and retrieval-augmented generation.
Join the Center for AI Safety (CAIS), a pioneering research and advocacy organization dedicated to addressing the societal-scale risks posed by artificial intelligence. We tackle the most pressing challenges in AI through rigorous technical research, innovative field-building initiatives, and proactive policy engagement, in collaboration with our sister organization, the Center for AI Safety Action Fund.
As a Research Scientist, you will spearhead and conduct transformative research aimed at enhancing the safety and dependability of cutting-edge AI systems. Your responsibilities will include designing and executing experiments on large language models, developing the necessary tools for training and evaluating models at scale, and converting your findings into publishable research. You will work closely with CAIS researchers and external partners from academia and industry, utilizing our compute cluster for large-scale model training and evaluation. Your research will focus on critical areas such as AI honesty, robustness, transparency, and the detection of trojan/backdoor behaviors, all aimed at mitigating real-world risks associated with advanced AI technologies.
Full-time|$150K/yr - $275K/yr|On-site|San Francisco
AI Research Scientist
At Substrate, we are tackling a critical technological challenge that impacts the United States. Positioned at the crossroads of advanced manufacturing and innovative physics, our mission is to develop transformative technologies that will revolutionize the semiconductor industry and bolster America's technological dominance. Our team comprises top-tier scientists, engineers, and technical specialists dedicated to pushing the boundaries of technology for the benefit of the nation.
As an AI Research Scientist, you will play a key role in enhancing and accelerating research and development processes by harnessing machine learning techniques for scientific simulations and modeling. You will also focus on establishing internal AI capabilities throughout our organization. This position merges cutting-edge physics with artificial intelligence, requiring hands-on development of AI-enhanced tools that facilitate groundbreaking research. You will also contribute to building the infrastructure and expertise required for our technical teams to effectively use AI in their workflows. Whether you are a physicist who has adopted machine learning or an AI expert with a solid scientific background, you will be instrumental in shaping our approach to utilizing AI to expedite our internal R&D efforts.
Oct 28, 2025