Tech Lead/Manager, Machine Learning Research Scientist - LLM Evaluations

Scale AI, Inc.
San Francisco, CA; Seattle, WA; New York, NY
On-site, Full-time, $280K/yr - $380K/yr

Experience Level

Manager

Qualifications

Key Responsibilities:
- Lead a high-performing team of research scientists and engineers focused on LLM evaluations.
- Conduct research on the effectiveness and constraints of current LLM evaluation techniques.
- Design and develop innovative evaluation benchmarks for large language models, addressing areas such as instruction adherence, factual accuracy, robustness, and fairness.
- Foster communication and collaboration with clients and peer teams to facilitate cross-functional initiatives.
- Work with internal teams and external partners to refine metrics and establish standardized evaluation protocols.
- Implement scalable and reproducible evaluation pipelines using modern machine learning frameworks.
- Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives.
- Stay current with ongoing research within the team, assist in overcoming technical challenges, and engage in design decision-making.
- Maintain strong involvement in the research community, both understanding trends and influencing them.
- Excel in a dynamic, fast-paced startup environment and commit to driving impactful results.

Desired Qualifications:
- 5+ years of practical experience in large language models, natural language processing, and Transformer modeling, in both research and engineering contexts.
- A proven track record of achieving significant research impacts in a fast-paced setting.
- Experience in supporting and leading a team of research scientists and engineers.

About the job

As a premier data and evaluation partner for cutting-edge AI firms, Scale AI is committed to enhancing the evaluation and benchmarking of large language models (LLMs). We are developing industry-leading LLM evaluations that set new benchmarks for model performance assessment. Our mission is to create rigorous, scalable, and equitable evaluation methodologies that propel the next evolution of AI capabilities.

Our Research teams collaborate with top AI laboratories to provide high-quality data and expedite advancements in Generative AI research. As the Tech Lead/Manager of the LLM Evaluations Research team, you will guide a skilled team of research scientists and engineers dedicated to crafting and applying innovative evaluation methodologies, metrics, and benchmarks that assess the strengths and weaknesses of our advanced LLMs. This pivotal role involves designing and executing a strategic roadmap that establishes best practices in data-driven AI development, thus accelerating the development of the next generation of generative AI models in collaboration with leading foundational model labs.

About Scale AI, Inc.

Scale AI is the leading evaluation partner for advanced AI companies, focused on enhancing the benchmarking and assessment of large language models through innovative methodologies and collaboration with top research labs.
