About the job
At Rhoda AI, we are pioneering the future of humanoid robotics with a full-stack approach spanning software-defined hardware, foundation models, and video world models. Our robots are engineered to be versatile, capable of navigating complex real-world scenarios beyond their training environments. Our interdisciplinary research team, with experts from institutions such as Stanford, Berkeley, and Harvard, works at the forefront of large-scale learning, robotics, and systems engineering. With over $400 million raised, we are investing heavily in research and development, hardware innovation, and scaling our manufacturing capabilities to bring our vision to life.
We are seeking a motivated Machine Learning Inference Engineer to join our team and help build and operate the inference systems that power our automation stack. You will play a crucial role in ensuring the efficient and reliable execution of large foundation models, working closely with the teams behind our robotic platforms and internal task tools.
Key Responsibilities:
Develop and maintain infrastructure for model inference across both cloud and on-premises environments.
Optimize the latency, throughput, and reliability of deployed machine learning models.
Design and scale services for serving diverse foundation models in both research and production contexts.
Collaborate with research and robotics teams to enhance inference optimization and integration.
Create tools for model deployment, versioning, and observability to enable rapid iteration cycles.
Contribute to the robustness and scalability of the inference stack as model complexity and deployment demands evolve.
Qualifications:
Minimum of 3 years of experience in machine learning infrastructure, MLOps, or backend systems.
Proven experience in deploying and managing machine learning inference workloads in production environments.
Strong working knowledge of Kubernetes and containerized deployment pipelines.
Familiarity with cloud providers such as AWS and GCP, including their GPU orchestration offerings.
Experience with popular ML frameworks including PyTorch and TensorFlow, as well as model serving tools like Triton, TorchServe, and Ray Serve.
Strong debugging capabilities and a proactive ownership mindset, comfortable resolving issues across the technology stack.