Key Responsibilities
- Lead a high-performing team of research scientists and engineers focused on LLM evaluations.
- Conduct research on the effectiveness and constraints of current LLM evaluation techniques.
- Design and develop innovative evaluation benchmarks for large language models, addressing areas such as instruction adherence, factual accuracy, robustness, and fairness.
- Foster communication and collaboration with clients and peer teams to facilitate cross-functional initiatives.
- Work with internal teams and external partners to refine metrics and establish standardized evaluation protocols.
- Implement scalable and reproducible evaluation pipelines using modern machine learning frameworks.
- Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives.
- Stay current with ongoing research within the team, assist in overcoming technical challenges, and engage in design decision-making.
- Maintain strong involvement in the research community, both understanding trends and influencing them.
- Excel in a dynamic, fast-paced startup environment and commit to driving impactful results.

Desired Qualifications
- 5+ years of practical experience in large language models, natural language processing, and Transformer modeling, in both research and engineering contexts.
- A proven track record of achieving significant research impact in a fast-paced setting.
- Experience in supporting and leading a team of research scientists and engineers.
About the job
As a premier data and evaluation partner for cutting-edge AI firms, Scale AI is committed to enhancing the evaluation and benchmarking of large language models (LLMs). We are developing industry-leading LLM evaluations that set new benchmarks for model performance assessment. Our mission is to create rigorous, scalable, and equitable evaluation methodologies that propel the next evolution of AI capabilities.
Our Research teams collaborate with top AI laboratories to provide high-quality data and expedite advancements in Generative AI research. As the Tech Lead/Manager of the LLM Evaluations Research team, you will guide a skilled team of research scientists and engineers dedicated to crafting and applying innovative evaluation methodologies, metrics, and benchmarks that assess the strengths and weaknesses of advanced LLMs. This pivotal role involves designing and executing a strategic roadmap that establishes best practices in data-driven AI development, accelerating the next generation of generative AI models in collaboration with leading foundation model labs.
About Scale AI, Inc.
Scale AI is the leading evaluation partner for advanced AI companies, focused on enhancing the benchmarking and assessment of large language models through innovative methodologies and collaboration with top research labs.