Site Reliability Engineer (SRE) at Mithril | San Francisco

Mithril | Palo Alto / San Francisco Bay Area
On-site | Full-time | $170K/yr - $230K/yr


Experience Level

Mid to Senior

Qualifications

The ideal candidate holds a Bachelor's degree in Computer Science or a related field and has experience with cloud infrastructure, automation, and monitoring tools. Familiarity with SLOs, SLIs, and incident response processes is essential. Strong programming skills in languages such as Python or Go, and experience with container orchestration tools such as Kubernetes, are advantageous.

About the job

Mithril develops AI infrastructure aimed at making GPU computing more accessible and affordable for enterprises, AI startups, and researchers. Clients include LG AI Research, Saronic, and the Broad Institute. The company was founded by a former Google DeepMind research scientist and a Stanford CS PhD. Mithril has secured $80M in seed and Series A funding from Sequoia Capital and Lightspeed Venture Partners. Over the past year, platform revenue has grown more than sixfold. Fast Company recognized Mithril as the 8th Most Innovative Company in Artificial Intelligence for 2026.

The engineering team at Mithril is small, with each member making a significant impact. This Site Reliability Engineer (SRE) position is a foundational role focused on shaping how the platform scales across a multi-cloud environment.

Role overview

This SRE will play a central role in keeping Mithril's global GPU orchestration platform stable and high-performing. The responsibilities extend beyond day-to-day maintenance. The primary focus is on designing and building automation, observability, and tooling to help manage advanced compute resources across multiple cloud providers. The goal is to ensure customers have fast and dependable access to infrastructure.

Collaboration with Mithril's founding team is central to this job. The SRE will help set service level objectives (SLOs), orchestrate capacity, and make influential infrastructure decisions, gaining visibility into both technical and commercial aspects of the business.

What makes this SRE role unique

This position differs from many early-stage SRE roles that focus mainly on on-call rotations and incident response. Here, the emphasis is on building infrastructure that actively shapes Mithril's marketplace. The systems developed will determine how supply is sourced, allocated, and monitored across providers, directly affecting customer experience and company revenue.

The role offers genuine ownership, a fast feedback loop with leadership, and the opportunity to define how infrastructure engineering evolves as Mithril grows.

Core responsibilities

About 70–75% of the work centers on platform reliability and infrastructure automation.

Reliability & SLOs

  • Implement and manage service level indicators (SLIs) and service level objectives (SLOs) for Mithril's API layer and internal orchestration services to maintain high reliability and performance.

About Mithril

Mithril is an AI infrastructure company focused on making GPU computing accessible to enterprises, AI startups, and researchers. It is backed by Sequoia Capital and Lightspeed Venture Partners and was founded by a former Google DeepMind research scientist and a Stanford CS PhD.
