About the job
Waymo is a pioneering company in autonomous driving technology, dedicated to becoming the world's most trusted driver. Originating from the Google Self-Driving Car Project in 2009, Waymo has committed to developing the Waymo Driver (The World's Most Experienced Driver™) to enhance mobility access and save the countless lives now lost in traffic accidents. Our technology powers a fully autonomous ride-hail service and can be adapted to various vehicle platforms and applications. With over ten million rider-only trips, more than 100 million miles driven on public roads across 15+ U.S. states, and tens of billions of miles driven in simulation, we are at the forefront of this transformative journey.
The Simulation ML Infrastructure team is focused on creating scalable AI/ML infrastructure that accelerates the Simulator team in developing state-of-the-art realistic simulations for testing and training the Waymo Driver. To enhance the realism and steerability of these simulations, we leverage large foundation models trained on vast datasets to accurately represent the real world, including realistic agents (vehicles, pedestrians, cyclists, motorcyclists), road systems, traffic control measures, and environmental conditions.
We are looking for a seasoned senior individual contributor to spearhead the advancement of sophisticated AI/ML infrastructure for multi-billion parameter foundation models within ML accelerator-friendly simulations. Your expertise in massive model scaling, ML accelerators, and large-scale distributed systems will be essential in designing and scaling our systems.
This position reports to an Engineering Manager.
Your Responsibilities:
- Join a top-tier, high-performing research engineering team to push the boundaries of ultra-realistic multi-agent simulations using foundation models.
- Collaborate closely with the Waymo Realism Modeling team in London and the Waymo Oxford team to use large foundation models to enhance simulation realism.
- Make architectural decisions at the intersection of data engineering, model development, and simulation. Take ownership of large, complex systems, and ensure that architectures and designs align with both technical and business objectives.
- Design and scale extensive distributed systems that encompass the entire ML lifecycle, facilitating planet-scale dataset generation, model training, and evaluation.
- Work cross-functionally to derive performance and system-level requirements for large ML systems. Convert product and business goals into measurable technical deliverables, ensuring alignment of system components.

