Key Responsibilities
- Lead a high-performing team of research scientists and engineers focused on LLM evaluations.
- Conduct research on the effectiveness and constraints of current LLM evaluation techniques.
- Design and develop innovative evaluation benchmarks for large language models, addressing areas such as instruction adherence, factual accuracy, robustness, and fairness.
- Foster communication and collaboration with clients and peer teams to facilitate cross-functional initiatives.
- Work with internal teams and external partners to refine metrics and establish standardized evaluation protocols.
- Implement scalable and reproducible evaluation pipelines using modern machine learning frameworks.
- Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives.
- Stay current with ongoing research within the team, assist in overcoming technical challenges, and engage in design decision-making.
- Maintain strong involvement in the research community, both understanding trends and influencing them.
- Excel in a dynamic, fast-paced startup environment and commit to driving impactful results.

Desired Qualifications
- 5+ years of practical experience in large language models, natural language processing, and Transformer modeling, in both research and engineering contexts.
- A proven track record of achieving significant research impact in a fast-paced setting.
- Experience in supporting and leading a team of research scientists and engineers.
About the job
As a premier data and evaluation partner for cutting-edge AI firms, Scale AI is committed to enhancing the evaluation and benchmarking of large language models (LLMs). We are developing industry-leading LLM evaluations that set new benchmarks for model performance assessment. Our mission is to create rigorous, scalable, and equitable evaluation methodologies that propel the next evolution of AI capabilities.
Our Research teams collaborate with top AI laboratories to provide high-quality data and expedite advancements in Generative AI research. As the Tech Lead/Manager of the LLM Evaluations Research team, you will guide a skilled team of research scientists and engineers dedicated to crafting and applying innovative evaluation methodologies, metrics, and benchmarks that assess the strengths and weaknesses of advanced LLMs. This pivotal role involves designing and executing a strategic roadmap that establishes best practices in data-driven AI development, accelerating the next generation of generative AI models in collaboration with leading foundation model labs.
About Scale AI, Inc.
Scale AI is the leading evaluation partner for advanced AI companies, focused on enhancing the benchmarking and assessment of large language models through innovative methodologies and collaboration with top research labs.