About the job
Intrinsic is a groundbreaking venture from Alphabet, dedicated to transforming the industrial robotics landscape. Our passionate team believes that advancements in AI, perception, and simulation will unlock unprecedented possibilities for industrial robotics in the near future, with software and data as the driving force.
We aim to make industrial robotics intelligent, accessible, and practical for millions of businesses, entrepreneurs, and developers. As a dynamic collective of engineers, roboticists, designers, and technologists, we are committed to realizing the creative and economic potential of industrial robotics.
Role
As a Senior Software Engineer specializing in MLOps and deep learning infrastructure, you will architect and develop the core systems that equip our robots with cutting-edge machine learning capabilities. Collaborating with a cross-disciplinary team of engineers and researchers, you will create infrastructure that improves the training, evaluation, and deployment of large-scale AI models. You will build accessible tools for integrating machine learning into the Intrinsic ecosystem, manage computing resources across cloud and on-premises environments, and ensure efficient, reliable, and scalable model lifecycles for real-world industrial applications.
How Your Work Advances Our Mission
- Design and implement scalable infrastructure for training and deploying deep learning models integrated with real-time robotic control systems.
- Optimize data loading and training speeds across 1000+ GPU training jobs.
- Develop distributed data pipelines that process large volumes of robotics data for model training.
- Create APIs and tools that enable internal and external researchers to seamlessly incorporate machine learning techniques into their workflows, and potentially lead efforts to open-source models and engage with the community.
- Improve the allocation of computing resources, such as GPUs and TPUs, to minimize cost and latency during model development, and establish orchestration workflows for reliable job execution on Google Kubernetes Engine (GKE).
- Develop tools for model understanding and analysis to ensure reliability and traceability throughout the machine learning lifecycle.

