Distributed Training Engineer

Periodic Labs | Menlo Park, Remote | Full-time

Qualifications

Ideal candidates will have experience with:
- Training on clusters of at least 5,000 GPUs
- 5D parallel LLM training
- Distributed training frameworks such as Megatron-LM, FSDP, DeepSpeed, and TorchTitan
- Optimizing training throughput for large-scale Mixture-of-Experts (MoE) models

About the job

About Periodic Labs

Periodic Labs is an AI and physical sciences laboratory building cutting-edge models to accelerate scientific discovery. We are well funded and growing quickly, and our team members work as owners who identify and solve problems without bureaucratic constraints. We are eager to adopt new tools and scientific insights that advance our mission.

About the Role

As a Distributed Training Engineer, you will optimize, operate, and develop the large-scale distributed LLM training systems that power our AI-driven scientific research. Working closely with researchers, you will support mid-training and reinforcement learning (RL) workflows, troubleshoot training issues, and keep runs operating smoothly. You will build tools and contribute directly to pioneering experiments, helping Periodic Labs remain the premier AI and science lab for physicists, computational materials scientists, AI researchers, and engineers. You will also help advance open-source frameworks for large-scale LLM training.
