Staff Machine Learning Research Scientist - LLM Evaluations
Scale AI
San Francisco, CA; Seattle, WA; New York, NY
On-site Full-time $280K/yr - $380K/yr
Experience Level
Mid to Senior
Qualifications
You will:
Lead investigations into the effectiveness and limitations of current LLM evaluation techniques.
Design and implement innovative evaluation benchmarks for large language models, focusing on instruction adherence, factual accuracy, robustness, and fairness.
Build and maintain strong relationships with clients and cross-functional teams to drive collaborative projects.
Work alongside internal teams and external partners to refine evaluation metrics and develop standardized protocols.
Create scalable and reproducible evaluation pipelines utilizing modern machine learning frameworks.
Publish findings in prestigious AI conferences and contribute to open-source benchmarking efforts.
Mentor and lead research scientists and engineers, providing technical guidance across various projects.
Engage actively with the ML research community to stay updated on emerging developments and contribute to the advancement of LLM evaluation science.
Excel in a dynamic, fast-paced startup environment and commit to achieving impactful results.
About the job
At Scale AI, we are the premier partner for data and evaluation in the rapidly evolving field of artificial intelligence. Our commitment to advancing the assessment and benchmarking of large language models (LLMs) positions us at the forefront of AI innovation. We are dedicated to creating leading-edge LLM evaluation methodologies that set new benchmarks for model performance.
Our research teams collaborate with the top AI laboratories in the industry to provide high-quality data, accelerate progress in generative AI research, and inform what excellence looks like in this domain. As a Staff Machine Learning Research Scientist on our LLM Evals team, you will spearhead the creation of novel evaluation methodologies, metrics, and benchmarks to assess the strengths and weaknesses of cutting-edge LLMs. Your work will shape our internal strategies and influence the broader AI research community, making this role essential for establishing best practices in data-driven AI development.
About Scale AI
Scale AI is recognized as a leader in providing data and evaluation solutions for next-generation AI technologies. Our mission is to enhance the evaluation and benchmarking of large language models, ensuring fairness, scalability, and rigor in assessment methodologies.
Similar jobs
Research Scientist Frontier Risk Evaluations
Full-time|$197.4K/yr - $246.8K/yr|On-site|San Francisco, CA; New York, NY
Join Scale AI as a Research Scientist, Frontier Risk Evaluations
At Scale AI, we are at the forefront of data and evaluation services for pioneering AI technologies. Our mission is to ensure the safe and effective deployment of AI systems by bridging the gap between advanced AI research and global policy frameworks. With the launch of Scale Labs, we are assembling a dedicated team focused on policy research to empower governments and industry leaders with scientific insights regarding AI risks and functionalities.
This team addresses complex challenges in agent robustness, AI control mechanisms, and risk assessments to facilitate a comprehensive understanding of AI risks, while promoting its responsible adoption across various sectors. We are eager to welcome skilled researchers who are passionate about shaping the future of AI.
As a Research Scientist specializing in Frontier Risk Evaluations, you will be responsible for designing evaluation metrics, harnesses, and datasets to assess the risks associated with cutting-edge AI systems. Your role may involve:
Developing harnesses to evaluate AI models for potential security vulnerabilities and other high-risk behaviors.
Collaborating with government entities and research labs to design evaluations that mitigate risks posed by advanced AI technologies.
Publishing evaluation methodologies and drafting technical reports aimed at informing policymakers.
About AfterQuery
AfterQuery partners with leading AI labs to advance training data and evaluation frameworks. The team builds high-signal datasets and runs thorough evaluations that go beyond standard benchmarks. As a post-Series A, early-stage company in San Francisco, AfterQuery gives each team member room to shape the future of AI models.
Role Overview: Research Scientist - Frontier Data
This role focuses on designing datasets and developing evaluation systems that influence how top AI models are trained and assessed. Working closely with research teams at major AI labs, the scientist explores new data collection techniques, investigates where models fall short, and sets up metrics to track progress. The work is hands-on and experimental, moving quickly from hypothesis to live testing and directly impacting large-scale model training.
Key Responsibilities
Design data slices and analyze data structures to uncover model weaknesses in areas like finance, software development, and enterprise operations.
Build and refine evaluation rubrics and reward signals for RLHF and RLVR training approaches.
Study annotator behavior and run experiments to improve model capabilities across different domains.
Develop quantitative frameworks to measure dataset quality, diversity, and their effect on model alignment and performance.
Work with research teams to turn training objectives into concrete data and evaluation needs.
What We Look For
Experience as an undergraduate or master’s research student (PhD not required).
Background or internships with RL environments or AI safety and benchmarking organizations (e.g., METR, Artificial Analysis) is a strong plus.
Genuine interest in how data structure, selection, and quality affect model outcomes.
Demonstrated skill in designing experiments, acting quickly, and extracting insights from complex data.
Comfort working across sectors such as finance, software engineering, and policy.
Strong quantitative background and familiarity with LLM training pipelines, RLHF/RLVR methods, or evaluation frameworks.
A hands-on mindset focused on building practical solutions.
Full-time|$280K/yr - $380K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY
As a premier data and evaluation partner for cutting-edge AI firms, Scale AI is committed to enhancing the evaluation and benchmarking of large language models (LLMs). We are developing industry-leading LLM evaluations that set new benchmarks for model performance assessment. Our mission is to create rigorous, scalable, and equitable evaluation methodologies that propel the next evolution of AI capabilities.
Our Research teams collaborate with top AI laboratories to provide high-quality data and expedite advancements in Generative AI research. As the Tech Lead/Manager of the LLM Evaluations Research team, you will guide a skilled team of research scientists and engineers dedicated to crafting and applying innovative evaluation methodologies, metrics, and benchmarks that assess the strengths and weaknesses of our advanced LLMs. This pivotal role involves designing and executing a strategic roadmap that establishes best practices in data-driven AI development, thus accelerating the development of the next generation of generative AI models in collaboration with leading foundational model labs.
About Our Team
The Safety Systems organization at OpenAI is dedicated to ensuring that our most advanced AI models are developed and deployed in a responsible manner. We engineer evaluations, safeguards, and safety frameworks to help our models operate as intended in real-world applications.
The Preparedness team plays a crucial role within the Safety Systems organization, guided by OpenAI’s Preparedness Framework.
While frontier AI models have the potential to benefit humanity, they also introduce significant risks. The Preparedness team is essential in anticipating and preparing for catastrophic risks associated with advanced AI models to ensure that AI fosters positive change.
Our mission includes:
Monitoring and predicting the evolving capabilities of frontier AI systems, particularly regarding risks that could have catastrophic consequences.
Establishing concrete procedures, infrastructure, and partnerships to effectively mitigate these risks and safely manage the development of powerful AI systems.
Preparedness integrates capability assessment, evaluations, internal red teaming, and mitigations for frontier models, along with overall coordination on AGI preparedness. This fast-paced and impactful work holds significant importance for both the company and society.
About the Role
As models become increasingly capable, transitioning from tools that assist humans to agents that can autonomously plan, execute, and adapt in the real world, cybersecurity emerges as a critical frontier. The same systems that boost productivity can also lead to increased exploitation.
In the role of Researcher focusing on cybersecurity risks, you will be instrumental in designing and implementing a comprehensive mitigation strategy to address severe cyber misuse across OpenAI’s products. This position demands strong technical expertise and extensive collaboration across teams to ensure that safeguards are enforceable, scalable, and effective. You will contribute to the development of robust protections that evolve alongside our products, model capabilities, and attacker behaviors.
Key Responsibilities
Develop and implement mitigation strategies for model-enabled cybersecurity threats.
Collaborate with cross-functional teams to ensure effective risk management.
Continuously assess and iterate on security measures to adapt to new challenges.
About Our Team
The Frontier Systems team at OpenAI is at the forefront of technology, responsible for creating, deploying, and maintaining some of the world's largest supercomputers. These supercomputers are pivotal for training our most advanced AI models, pushing the boundaries of innovation.
We transform sophisticated data center designs into operational systems and develop the software infrastructure necessary for extensive frontier model training. Our goal is to ensure these hyperscale supercomputers operate reliably and efficiently, supporting groundbreaking AI research.
About the Role
As a key member of the Frontier Systems team, you will be instrumental in designing the critical infrastructure that ensures our supercomputers function seamlessly for pioneering AI research. In this role, you'll address system-level challenges and implement automation solutions that minimize disruptions during large-scale training processes.
Your responsibilities will encompass end-to-end ownership of your projects, allowing you to make significant contributions to our mission. This position is ideal for individuals who excel in diagnosing complex system issues and crafting automation strategies to proactively resolve problems across a vast network of machines.
Your Responsibilities Include:
Enhancing system health checks to maintain the stability of our hyperscale supercomputers during model training.
Conducting in-depth investigations into hardware failures and system-level bugs to uncover root causes.
Developing automation tools that monitor and resolve issues across thousands of systems, enabling uninterrupted research progress.
You May Be a Great Fit If You Possess:
7+ years of hands-on experience in software engineering.
Strong proficiency in Python and shell scripting.
Expertise in analyzing complex data sets using SQL, PromQL, Pandas, or other relevant tools.
Experience in creating reproducible analyses.
A solid balance of skills in both building and operationalizing systems.
Prior experience with hardware is not a prerequisite for this role.
Preferred Qualifications:
Familiarity with the intricacies of hardware components, protocols, and Linux tools (e.g., PCIe, Infiniband, networking, power management, kernel performance tuning).
Experience with system optimization and performance tuning.
Join aiedu as a Senior Lead in Research & Evaluation, where you will drive impactful research initiatives that shape educational practices and policies. In this role, you will lead a team of researchers in designing and executing comprehensive evaluations that inform our strategic direction. Your expertise will be critical in analyzing data, generating insights, and communicating findings to stakeholders.
Full-time|Remote|Remote-Friendly (Travel-Required) | San Francisco, CA | New York City, NY
Anthropic is looking for a Research Engineer focused on model evaluations. This position involves research and development to assess and strengthen the performance of AI models. Teams are based in San Francisco and New York City, and the role supports remote work with required travel.
Key responsibilities
Design and implement evaluations for Anthropic's AI models
Collaborate with team members to enhance model performance
Contribute to research that pushes the boundaries of AI systems
Location
Remote-friendly (travel required)
San Francisco, CA
New York City, NY
Join OpenAI as a Research Scientist and explore cutting-edge machine learning innovations. In this role, you will be at the forefront of developing groundbreaking techniques while advancing our team's research initiatives. Collaborate with talented peers across various teams to discover transformative ideas that scale effectively. We seek individuals who are passionate about pushing the boundaries of AI and want to contribute to our unified research vision.
About Our Team
The Frontier Evaluations team is dedicated to developing pioneering model assessments that propel advancements toward safe Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI). This innovative group crafts ambitious evaluations to quantitatively assess and guide our models while establishing self-improvement cycles that influence our training, safety, and deployment strategies. Among our open-source evaluations are SWE-bench Verified, MLE-bench, PaperBench, and SWE-Lancer. The team has also executed frontier evaluations for significant models such as GPT4o, o1, o3, GPT 4.5, ChatGPT Agent, and GPT5. If you are passionate about being at the forefront of AI advancements and guiding their ethical development, this is the ideal team for you.
About You
We are on the lookout for exceptional research engineers who are eager to challenge the boundaries of frontier models in the finance sector. We seek individuals who will contribute to shaping AI evaluations focused on financial reasoning and associated competencies while managing distinct threads of this initiative from conception to execution.
In This Role, You Will:
Identify vital model capabilities, skills, and behaviors essential to financial operations, and develop methods to accurately measure performance in these areas.
Take ownership of a research agenda aimed at uncovering significant model capabilities, particularly related to financial reasoning, and design evaluations to quantify them.
Continuously enhance evaluations of frontier AI models to gauge the extent of cutting-edge capabilities.
We Expect You To:
Demonstrate a strong background in research engineering, particularly in AI and finance.
Exhibit a collaborative spirit, working effectively within a cross-functional team environment.
Showcase exceptional analytical and problem-solving skills.
Merge Labs is an innovative research facility dedicated to merging biological sciences and artificial intelligence to enhance human capability, autonomy, and experience. Our mission is to pioneer revolutionary methodologies in brain-computer interfaces that facilitate high-bandwidth interactions with the brain, seamlessly integrate advanced AI, and maintain safety and accessibility for all users.
About the Team
At Merge, we are addressing some of the most ambitious challenges in molecular engineering, synthetic biology, and neuroscience. Our Research Platform Team is responsible for creating the experimental frameworks necessary to tackle these challenges with exceptional speed and precision. The tools and methodologies developed by our team significantly enhance molecular assembly, protein expression, mammalian cell culture, advanced microscopy, sequencing, and unique custom techniques. We collaborate with program teams to establish and optimize these capabilities, implement automation where beneficial, and integrate with our data science and machine learning pipelines, continuously pushing the boundaries of throughput and innovation.
About the Role
As a Platform Scientist, you will be instrumental in developing high-efficiency and high-throughput experimental pipelines that accelerate research initiatives. You will work closely with program leads, project scientists, data scientists, and engineers, leading your work and potentially recruiting additional team members as necessary.
Key Responsibilities:
Collaborate with program leads and scientists to identify critical experimental requirements and workflows.
Develop processes to facilitate high-throughput and/or high-efficiency experiments, including reagent production and analysis.
Scope, procure, construct, program, and validate instruments to support experimental workflows.
Ensure the quality, reliability, and integrity of data generated from automated pipelines, including defining and implementing suitable quality control checkpoints.
Work alongside data science and machine learning engineers to incorporate metadata tracking, computational design, and analysis into experimental pipelines.
Partner with electrical, mechanical, and software engineers to create custom setups.
Innovate and validate concepts to enhance experimental throughput.
Overview
Become an integral part of our dynamic R&D team dedicated to developing fully automated research systems that push the boundaries of AI. Zochi has achieved a milestone by publishing the first entirely AI-generated A* conference paper. Locus has set a new industry standard as the first AI system to surpass human experts in AI R&D.
Key Responsibilities
Conceptualize and develop innovative architectures for automated research.
Work collaboratively within a specialized team of researchers addressing cutting-edge challenges in long-horizon agentic capabilities, post-training for open-ended objectives, and environment crafting.
Document and publish key internal findings alongside success stories from external collaborations.
Qualifications
PhD or equivalent research experience in Computer Science, Machine Learning, Artificial Intelligence, or a related discipline. Outstanding candidates with significant research contributions are encouraged to apply, regardless of formal qualifications.
Demonstrated history of impactful AI/ML research contributions in academic or corporate environments.
Expertise in developing long-horizon, multi-agent systems and/or model post-training, especially in scientific domains or for open-ended discovery objectives.
A strong passion for advancing problem-solving processes and scientific discovery, thriving in high-autonomy roles and environments.
Our Culture
Competitive compensation and equity options.
Unlimited Paid Time Off (PTO), emphasizing team collaboration and a community-focused workplace.
Opportunities for conference participation and engagement in community initiatives.
Empowered roles with high levels of responsibility.
#1: We are a small, passionate team of leading investors, researchers, and industry experts committed to the mission of accelerating discovery. Join us.
About the Team
Join the innovative Post-Training team at OpenAI, where we focus on refining and elevating pre-trained models for deployment in ChatGPT, our API, and future products. Collaborating closely with various research and product teams, we conduct crucial research that prepares our models for real-world deployment to millions of users, ensuring they are safe, efficient, and reliable.
About the Role
As a Research Engineer / Scientist, you will spearhead the research and development of enhancements to our models. Our work intersects reinforcement learning and product development, aiming to create cutting-edge solutions.
We seek passionate individuals with robust machine learning engineering skills and research experience, particularly with innovative and powerful models. The ideal candidate will be driven by a commitment to product-oriented research.
This position is located in San Francisco, CA, and follows a hybrid work model requiring three days in the office each week. Relocation assistance is available for new employees.
In this role, you will:
Lead and execute a research agenda aimed at enhancing model capabilities and performance.
Work collaboratively with research and product teams to empower customers to optimize their models.
Develop robust evaluation frameworks to monitor and assess modeling advancements.
Design, implement, test, and debug code across our research stack.
You may excel in this role if you:
Possess a deep understanding of machine learning and its applications.
Have experience with relevant models and methodologies for evaluating model improvements.
Are adept at navigating large ML codebases for debugging purposes.
Thrive in a fast-paced and technically intricate environment.
About OpenAI
OpenAI is a pioneering AI research and deployment organization dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We are committed to pushing the boundaries of AI capabilities while prioritizing safety and human-centric values in our products. Our mission is to embrace diverse perspectives, voices, and experiences that represent the full spectrum of humanity, as we strive for a future where AI is a powerful ally for everyone.
About Our Team
Join the forefront of AI innovation with the RL and Reasoning team at OpenAI. Our team is dedicated to advancing reinforcement learning research and has pioneered transformative projects, including o1 and o3. We are committed to pushing the limits of generative models while ensuring their scalable deployment.
About the Role
As a Research Engineer/Research Scientist at OpenAI, you will play a pivotal role in enhancing AI alignment and capabilities through state-of-the-art reinforcement learning techniques. Your contributions will be essential in training intelligent, aligned, and versatile agents that power various AI models.
We seek individuals with a solid foundation in reinforcement learning research, agile coding skills, and a passion for rapid iteration.
This position is located in San Francisco, CA, and follows a hybrid work model of three days in the office per week. We also provide relocation assistance for new hires.
You may excel in this role if:
You are enthusiastic about being at the cutting edge of RL and language model research.
You take initiative, owning ideas and driving them to fruition.
You value principled methodologies, conducting simple experiments in controlled environments to draw trustworthy conclusions.
You thrive in a fast-paced, complex technical environment where rapid iteration is essential.
You are adept at navigating extensive ML codebases to troubleshoot and enhance them.
You possess a profound understanding of machine learning and its applications.
About OpenAI
OpenAI is a pioneering AI research and deployment organization committed to ensuring that general-purpose artificial intelligence serves the greater good for humanity. We strive to push the boundaries of AI system capabilities while prioritizing safe deployment through our innovative products. We recognize AI as a powerful tool that must be developed with safety and human-centric principles, embracing diverse perspectives to reflect the full spectrum of humanity.
We are proud to be an equal opportunity employer, welcoming applicants from all backgrounds without discrimination based on race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or any other legally protected characteristic.
Join Our Team as a Research Scientist
At Parallel, we are at the forefront of web infrastructure innovation, enabling businesses across sectors such as sales, marketing, insurance, and technology to harness the power of AI. Our state-of-the-art products empower users to develop superior AI agents with seamless and flexible access to the web.
With significant backing of $130 million from prominent investors like Kleiner Perkins, Index Ventures, and Spark Capital, we are dedicated to redefining the web for artificial intelligence. As we expand, we're assembling a top-tier team of engineers, designers, marketers, sales experts, researchers, and operational specialists committed to our vision.
Your Role: As a Research Scientist, you will tackle the challenge of training and scaling models designed to enhance web indexing capabilities.
About You: You possess a profound understanding of contemporary models and training methodologies. You enjoy engaging in discussions about the convergence of search, recommendations, and transformer models, and are passionate about translating your research into impactful products and systems utilized by millions.
Zyphra is a pioneering artificial intelligence firm located in the vibrant city of San Francisco, California.
About the Role:
We are seeking a passionate Research Scientist to join our dynamic Agency and Reasoning Team at Zyphra. In this role, you will conduct cutting-edge research in reinforcement learning, post-training methodologies, and human preference learning. Your innovative ideas will be instrumental in shaping our next-generation language models, enabling their application on a large scale.
What We Desire:
A strong sense of research intuition and taste
Capability to navigate a research project from initial concept to execution and documentation
Proficiency in implementation and prototyping
A quick thinker who can rapidly transform ideas into experimental frameworks
Ability to collaborate effectively in a fast-paced research environment
An insatiable curiosity and enthusiasm for the study of intelligence.
Qualifications:
Proven experience and skill in reinforcement learning, particularly in the context of language model reasoning or classical RL tasks
Familiarity with language-model-supervised fine-tuning and preference-learning techniques, such as DPO and simPO.
Experience with methods for context-length extension
Strong intuitive understanding of model behaviors, with the ability to refine them through iterative fine-tuning
Interest in engaging deeply with data and dedicating time to data engineering and synthetic data generation
A postgraduate degree in a scientific discipline (Computer Science, Electrical Engineering, Mathematics, Physics)
Published research in reputable machine learning venues
Expertise in PyTorch and Python
Eagerness and aptitude for rapidly acquiring new knowledge and implementing innovative concepts
Exceptional communication and teamwork abilities, capable of contributing to both research and large-scale engineering efforts
Why Join Zyphra?
We champion creative and unconventional ideas and are prepared to invest significantly in innovative concepts.
Our culture fosters collaboration, curiosity, and intellectual growth.
Zyphra is a cutting-edge artificial intelligence firm headquartered in the vibrant city of San Francisco, California.
Position Overview:
As a Research Scientist specializing in Model Architectures, you will play a pivotal role in Zyphra’s AI Architecture Research Team. Your responsibilities will include the design and thorough evaluation of innovative model architectures and training methodologies aimed at enhancing essential modeling capabilities (e.g., loss per flop or loss per parameter) and tackling core limitations inherent in current models. You will collaborate closely with our pre-training team to ensure that your findings are seamlessly integrated into our next-generation models.
Qualifications:
A strong research acumen and intuition.
Proven ability to navigate research projects from initial conception to execution and final write-up.
Exceptional implementation and prototyping skills, with the capability to swiftly transform ideas into experimental outcomes.
A collaborative spirit and the ability to thrive in a fast-paced research environment.
A deep curiosity and enthusiasm for understanding intelligence.
Requirements:
Experience with long-term memory, RAG/retrieval systems, dynamic/adaptive computation, and alternative credit assignment strategies.
Knowledge of reinforcement learning, control theory, and signal processing techniques.
A passion for exploring and critically evaluating unconventional ideas, with the ability to maintain a unique perspective.
Familiarity with modern training pipelines and the hardware necessities for designing efficient architectures compatible with GPU hardware.
Strong understanding of experimental methodologies for conducting rigorous ablations and hypothesis testing.
High proficiency in PyTorch and Python programming.
Ability to quickly assimilate into large pre-existing codebases and contribute effectively.
Prior publication of machine learning research in reputable venues.
Postgraduate degree in a scientific discipline (e.g., Computer Science, Electrical Engineering, Mathematics, Physics).
Why Join Zyphra?
We emphasize a structured research methodology that systematically addresses ambitious challenges in AI.
About Owner.com
Owner.com is revolutionizing the growth trajectory of local restaurants through advanced AI technology.
Our AI solutions are meticulously designed to enhance SEO, marketing strategies, and online ordering processes, ultimately boosting first-party orders. Unlike other companies that burden small business owners with complex software, Owner.com provides a streamlined, expert-driven system that guarantees results.
Think of us as your dedicated team of engineers and marketers, empowering independent restaurants to compete with larger chains.
Our Vision
Initially focused on helping independent restaurants thrive online, our mission extends to all local businesses facing similar challenges.
In an era where massive tech corporations threaten local business survival, we are committed to leveling the playing field.
Once we perfect our solutions for restaurants, our goal is to expand our services across various local business sectors.
We envision a future where millions of local business owners leverage our technology to excel in the digital landscape.
Our Traction
Since our inception in 2020, we have generated tens of millions in revenue and processed over half a billion dollars in online orders. One in five Americans has interacted with an Owner.com website.
More significantly, we have supported over 20,000 restaurant owners, saving them close to $200 million in fees.
Our Team
Our dynamic team, composed of top talent from renowned companies in the SMB software space, is poised for rapid expansion as we continue to grow alongside our customers.
About Our Team
Join the Foundations Research team, where we tackle ambitious and innovative projects that could redefine the future of AI. Our mission is to enhance the science behind our training and scaling initiatives, focusing on pioneering frontier models. We are dedicated to advancing data utilization, scaling methodologies, optimization strategies, model architectures, and efficiency enhancements to accelerate our scientific breakthroughs.
About the Position
We are on the lookout for a dynamic technical research lead to spearhead our embeddings-focused retrieval initiatives. You will oversee a talented team of research scientists and engineers committed to developing foundational technologies that enable models to access and utilize the right information precisely when needed. This includes crafting innovative embedding training objectives, architecting scalable vector storage, and implementing adaptive indexing techniques.
This pivotal role will contribute to various OpenAI products and internal research initiatives, offering opportunities for scientific publication and significant technical influence.
This position is located in San Francisco, CA, where we embrace a hybrid work model, requiring three days in the office weekly, and we provide relocation assistance for new hires.
Your Responsibilities
Lead cutting-edge research on embedding models and retrieval systems optimized for grounding, relevance, and adaptive reasoning.
Supervise a team of researchers and engineers in building an end-to-end infrastructure for training, evaluating, and integrating embeddings into advanced models.
Drive advancements in dense, sparse, and hybrid representation techniques, metric learning, and retrieval systems.
Work collaboratively with Pretraining, Inference, and other Research teams to seamlessly integrate retrieval throughout the model lifecycle.
Contribute to OpenAI's ambitious vision of developing AI systems with robust memory and knowledge access capabilities rooted in learned representations.
You Will Excel in This Role If You Possess
A proven track record of leading high-performance teams of researchers or engineers within ML infrastructure or foundational research.
In-depth technical knowledge in representation learning, embedding models, or vector retrieval systems.
Familiarity with transformer-based large language models and their interaction with embedding spaces and objectives.
Research experience in areas such as contrastive learning and retrieval-augmented generation.
Join Whatnot as a Data Scientist specializing in Risk & Fraud and leverage your analytical skills to help us combat fraud and mitigate risks in our vibrant online marketplace. You will work collaboratively with cross-functional teams to analyze data, identify trends, and develop strategies that improve our platform's security and user experience.
Apr 1, 2026