Machine Learning Platform Engineer

Foxglove, San Francisco, CA
On-site, Full-time

Experience Level

Mid to Senior

Qualifications

Ideal candidates should possess a deep understanding of machine learning infrastructure, cloud platforms, and data management techniques. Experience with multimodal data processing and operationalizing ML systems in a production environment is highly desirable. Strong problem-solving skills and a proactive approach to tackling complex challenges are essential.

About the job

Join us in creating the backbone of data infrastructure for real-world robotic operations.

As robotics transitions from research labs to real-world applications across factories, warehouses, vehicles, and field deployments, understanding the intricacies of robotic performance becomes critical. When robots encounter failures or unexpected behaviors, data analysis is key to deciphering the underlying issues.

At Foxglove, we are at the forefront of building tools for observability, visualization, and data infrastructure that empower robotics and autonomous systems teams to manage, analyze, and derive insights from vast amounts of multimodal sensor data collected from operational systems and production fleets.

Role Overview

We are seeking a passionate ML Platform Engineer with robust infrastructure expertise to design, deploy, and scale our data platform systems. In this platform-centric role, you will own the infrastructure layer that makes machine learning work in production environments, not just the models themselves.

You will be responsible for the reliability, scalability, and performance of the ML platform, including inference serving, pipeline orchestration, training infrastructure, and evaluation frameworks. This is a hands-on infrastructure role: you will tackle substantial challenges such as managing petabyte-scale multimodal robotics data and optimizing high-throughput retrieval and embedding pipelines.

Key Responsibilities

  • Design and operationalize production inference infrastructure, focusing on model serving, autoscaling, load balancing, and cost efficiency across cloud environments.

  • Own the platform architecture for embedding and retrieval pipelines that enable semantic search across multimodal robotics data (image, video, point cloud, and time series).

  • Develop and sustain the training and evaluation infrastructure that supports rapid model performance iteration, including job orchestration, experiment tracking, and dataset versioning.

  • Lead decisions on cloud infrastructure (AWS/GCP) that affect latency, throughput, reliability, and scalability.

  • Establish platform abstractions and internal tools that empower product engineers to deliver ML-enhanced features without managing infrastructure directly.

  • Assess, integrate, and operationalize third-party ML infrastructure components while establishing clear build vs. buy frameworks for the team.

About Foxglove

Foxglove is pioneering the development of essential tools that enhance the performance of robotics and autonomous systems. Our commitment to innovation drives us to create advanced solutions that allow teams to harness the power of data in real time, enhancing the capabilities of robots in diverse operational settings.
