About Flexion Robotics
Flexion Robotics is building the intelligence framework for tomorrow’s humanoid robots. The company’s mission is to move from early prototypes to fully functional humanoid systems. Founded by leading scientists in robot reinforcement learning (with backgrounds at Nvidia and ETH Zürich) and backed by top international venture capital, Flexion Robotics has quickly progressed from first lines of code to deploying real humanoid capabilities.
Role Overview
The Machine Learning Infrastructure Engineer will help shape the core computing and data systems that support cognitive development for humanoid robots. This position focuses on building and maintaining the platforms needed to train large foundation models on substantial datasets. The work involves designing training clusters, architecting data pipelines that move data from simulators and robots into model training, and creating tools that enable AI engineers to train, evaluate, and iterate efficiently.
This is a senior, on-site position based in Zürich. The Infrastructure team includes engineers with experience at Google, Meta, and Amazon. The role offers broad responsibility for systems supporting data collection, training, and experimentation workflows, including infrastructure strategy, cluster orchestration, distributed training, data platforms, CI, and experimentation tools.
What You Will Do
- Design, deploy, and maintain GPU compute clusters for large-scale model training across multiple cloud providers, including job scheduling with Slurm and Kubernetes.
- Build data platforms and pipelines: set up storage, processing, and serving layers that carry data from simulator outputs and robot telemetry through to training datasets. This includes infrastructure built on object storage (S3), parallel filesystems (Lustre), and data formats such as Parquet, WebDataset, and LeRobot. Use distributed processing tools like Ray and Spark to transform and validate data at scale.
- Work with AI engineers to optimize distributed training on multi-node GPU clusters, focusing on throughput, device utilization, and communication efficiency. Improve distributed IsaacLab-based sim-to-real training workflows.
- Evaluate and select new platforms: assess cloud providers, GPU-as-a-Service options, and new tools, taking ownership of decisions as computing needs grow.
Location
This role is on-site at Flexion Robotics’ Zürich office.

