Site Reliability Engineer (SRE) at Mithril | San Francisco

Mithril | Palo Alto / San Francisco Bay Area

On-site | Full-time | $170K/yr - $230K/yr

Experience Level

Mid to Senior

Qualifications

The ideal candidate will possess a Bachelor's degree in Computer Science or a related field, along with experience in cloud infrastructure, automation, and monitoring tools. Familiarity with SLOs, SLIs, and incident response processes is crucial. Strong programming skills in languages such as Python or Go, as well as experience with container orchestration tools like Kubernetes, will be advantageous.

About the job

The engineering team at Mithril is small, with each member making a significant impact. This Site Reliability Engineer (SRE) position is a foundational role focused on shaping how the platform scales across a multi-cloud environment.

Role overview

This SRE will play a central role in keeping Mithril's global GPU orchestration platform stable and high-performing. The responsibilities extend beyond day-to-day maintenance. The primary focus is on designing and building automation, observability, and tooling to help manage advanced compute resources across multiple cloud providers. The goal is to ensure customers have fast and dependable access to infrastructure.

Collaboration with Mithril's founding team is central to this job. The SRE will help set service level objectives (SLOs), orchestrate capacity, and make influential infrastructure decisions, gaining visibility into both technical and commercial aspects of the business.

What makes this SRE role unique

This position differs from many early-stage SRE roles that focus mainly on on-call rotations and incident response. Here, the emphasis is on building infrastructure that actively shapes Mithril's marketplace. The systems developed will determine how supply is sourced, allocated, and monitored across providers, directly affecting customer experience and company revenue.

The role offers genuine ownership, a fast feedback loop with leadership, and the opportunity to define how infrastructure engineering evolves as Mithril grows.

Core responsibilities

About 70–75% of the work centers on platform reliability and infrastructure automation.

Reliability & SLOs

Implement and manage service level indicators (SLIs) and service level objectives (SLOs) for Mithril's API layer and internal orchestration services to maintain high reliability and performance.
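As a rough illustration of what an availability SLI and its error budget involve, the sketch below computes both from request counts. It is a minimal example under stated assumptions: the function name, the 99.9% target, and the request counts are hypothetical and are not taken from Mithril's systems.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # observed fraction of good requests
    error_budget: float      # allowed fraction of bad requests (1 - target)
    budget_remaining: float  # fraction of the error budget still unspent


def availability_slo(good: int, total: int, target: float = 0.999) -> SloReport:
    """Compare an availability SLI against an SLO target (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")

    sli = good / total                 # SLI: share of requests that succeeded
    budget = 1.0 - target              # SLO allows this share of failures
    burned = (total - good) / total    # share of requests that failed
    remaining = 1.0 - burned / budget  # negative once the SLO is violated
    return SloReport(sli=sli, error_budget=budget, budget_remaining=remaining)


if __name__ == "__main__":
    # Illustrative numbers only: 5 failed requests out of 10,000.
    print(availability_slo(good=9_995, total=10_000, target=0.999))
```

A negative `budget_remaining` means the window has already breached the objective, which is the usual signal to prioritize reliability work over new features.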

About Mithril

Mithril is a cutting-edge AI infrastructure company focused on democratizing GPU computing for enterprises and research communities alike. With strong backing from prominent venture capital firms and a team of experienced professionals, Mithril is setting new standards in the AI landscape.

Similar jobs

Staff/Lead Site Reliability Engineer (SRE)

HeartFlow, Inc.

Full-time|$200.8K/yr - $250.9K/yr|On-site|San Francisco, California

About HeartFlow HeartFlow, Inc. is a medical technology company focused on improving the diagnosis and management of coronary artery disease. Our flagship product, the AI-powered HeartFlow FFRCT Analysis, provides a non-invasive, color-coded 3D view of a patient’s coronary arteries. Clinicians use our platform to identify blockages, assess blood flow, and an…

Apr 14, 2026

Site Reliability Engineer (SRE)

Baseten

Full-time|On-site|San Francisco Office

ABOUT BASETEN

Baseten is at the forefront of powering mission-critical AI inference for some of the most innovative companies globally, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. We integrate cutting-edge applied AI research with a flexible infrastructure and intuitive developer tools to empower companies at the leading edge of AI to deploy sophisticated models effectively. With our recent $300M Series E funding round, supported by prominent investors such as BOND, IVP, Spark Capital, Greylock, and Conviction, we are rapidly expanding. Join our dynamic team and contribute to creating an essential platform for engineers to launch AI products with ease.

THE ROLE

As a Site Reliability Engineer, you will design and implement resilient systems and processes that ensure our infrastructure is scalable, reliable, and efficient. Your responsibilities will encompass everything from automating deployments and monitoring systems to enhancing performance and managing incidents effectively.

Collaboration is key; you will work closely with our users to understand their challenges in operationalizing machine learning, facilitating their onboarding onto our platform, and leveraging these insights to inform improvements to Baseten.

EXAMPLE INITIATIVES

As part of our Infrastructure team, you will engage in exciting projects such as:

Innovative multi-cloud capacity management
Optimizing inference on B200 GPUs
Implementing multi-node inference
Utilizing fractional H100 GPUs for efficient model serving

RESPONSIBILITIES

Design and maintain scalable infrastructures to support the deployment and operational needs of machine learning models.
Establish standards and best practices to enhance reliability and performance across the infrastructure.
Proactively identify and resolve reliability issues using monitoring and alerting systems.
Collaborate with cross-functional teams to apply best practices in infrastructure management and incident response.
Create automation scripts to streamline processes and reduce manual intervention.

Oct 9, 2025

Staff Site Reliability Engineer (SRE) - Agile

Okta, Inc.

Full-time|$162K/yr - $249K/yr|On-site|San Francisco, California

Okta is seeking a Staff Site Reliability Engineer to join the Infrastructure Platform AGILE SRE team in San Francisco. This position centers on supporting and improving the systems that underpin Okta’s identity infrastructure.

Role overview

The Staff SRE will work closely with multiple teams to develop and maintain critical infrastructure. A core part of this role involves enhancing internal tools and operational processes, ensuring that Okta’s systems remain secure and reliable as the company grows.

What you will do

Provide cross-functional support to teams building and maintaining key infrastructure components.
Collaborate with Infrastructure Operations groups to address complex technical challenges.
Diagnose, troubleshoot, and resolve sophisticated infrastructure issues by developing new tools and strategic solutions.

Who we’re looking for

Experienced SREs who are comfortable working on large-scale, impactful projects.
Engineers who enjoy collaborating across teams and disciplines.
Problem-solvers who can tackle intricate technical challenges and deliver reliable solutions.

This role offers the chance to contribute directly to Okta’s mission of building secure, trusted infrastructure for organizations navigating the evolving landscape of AI and identity.

Apr 27, 2026

Software Engineer, Site Reliability (SRE)

Sierra

Full-time|On-site|San Francisco, CA

About Us

At Sierra, we are pioneering a transformative platform that empowers businesses to forge authentic customer experiences through AI technology. Headquartered in the vibrant city of San Francisco, we also boast a dynamic presence in Atlanta, New York, London, France, Singapore, and Japan.

Our operations are anchored in core values that shape our culture: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and are integral to our mission.

Our visionary founders, Bret Taylor and Clay Bavor, bring unparalleled expertise. Bret, currently the Board Chair of OpenAI, previously co-led Salesforce and served as CTO at Facebook, while Clay led numerous initiatives at Google, including AR/VR projects and Google Workspace.

Your Role

In your capacity as a Software Engineer on the Site Reliability team, you will play a crucial role in establishing and enhancing the reliability, observability, and scalability of Sierra’s AI-centric infrastructure. Collaborating closely with our engineering and product teams, your goal is to ensure our systems remain highly available, efficient, and primed for growth.

Lead the development of Sierra’s observability stack (including monitoring, alerting, logging, and tracing) to provide engineers with critical insights into system health and performance.
Collaborate with product and platform engineers to architect systems that prioritize reliability and scalability from the outset, not as an afterthought.
Design and implement robust, scalable, and secure cloud infrastructure on AWS, employing Terraform and cutting-edge DevOps tools.
Enhance the reliability and scalability of our LLM deployments, ensuring they operate efficiently and cost-effectively.
Drive improvements in deployment pipelines, CI/CD tooling, and incident management processes to minimize downtime and accelerate response times.
Define and cultivate SRE practices within Sierra, shaping culture, tooling, and best practices across the engineering organization.

Qualifications

Bachelor's degree in Computer Science or a related field, or equivalent experience.
Proven experience in Site Reliability Engineering or a similar role, with a strong understanding of cloud infrastructure (AWS).
Proficiency in Terraform and modern DevOps practices.
Experience with observability tools and techniques: monitoring, alerting, logging, and tracing.
Strong problem-solving skills with a focus on scalability and performance optimization.
Excellent collaboration and communication skills, with the ability to work effectively in a team environment.

Oct 21, 2025

Staff Site Reliability Engineer & DeFi Scalability Lead

ABC Labs

Full-time|On-site|North / Central / South America (in-person San Francisco Preference)

About ABC Labs:

Reserve is an innovative cryptocurrency project pioneering the asset-backed currency revolution. ABC Labs developed Reserve to empower individuals to launch, mint, and redeem on-chain crypto indexes known as Decentralized Token Folios (DTFs) using robust, safety-first smart contracts. Experience expansive crypto exposure, earn effortless DeFi yield, or help create the next world reserve currency. Currently, only 0.03% of crypto is in indexes, a number we anticipate will grow rapidly as the DTF space expands, and Reserve is at the forefront of this movement.

As we continue to tokenize real-world assets, we envision a protocol that facilitates new asset-backed currencies that are largely or fully independent of fiat currencies. Learn more about our vision and the Reserve team here.

ABC Labs plays a vital role in the development of the Reserve protocol, contributing to the growth and sustainability of the Reserve ecosystem.

Role Summary:

We are seeking a highly skilled engineer to join our expanding protocol engineering team. You will work with the Ethereum mainnet and its Layer 2 solutions, custom APIs, data pipelines, Docker, Cloudflare, metrics software, and other tools to build and maintain a scalable backend infrastructure. Your focus will be on ensuring our frontend UI, backend APIs, and developer operations maintain exceptional reliability and scalability. Users deserve an intuitive, seamless DeFi experience without compromising security or decentralization. The ideal candidate possesses full-stack experience, specializes as an SRE, and has a passion for scaling operations. Leadership capabilities to guide a small team toward achieving our goals will be essential. As a startup, team members often wear multiple hats, but being an outstanding SRE is your primary responsibility.

Our Tech Stack:

Bare metal servers
Linux (Ubuntu)
Docker
Redis
PostgreSQL
Cloudflare
TypeScript
Rust

Responsibilities:

Provision, configure, and secure Linux servers, preferably through automation
Manage blockchain nodes to ensure maximum uptime
Deploy and configure monitoring tools such as Prometheus & Grafana
Conduct load testing to identify and resolve bottlenecks in our API
Oversee fleets of Docker containers using Dokku, Swarm, or Kubernetes

Jun 23, 2025

Senior/Staff Site Reliability Engineer

fal

Full-time|On-site|San Francisco

Join our dynamic team at fal as a Senior/Staff Site Reliability Engineer. In this key role, you will leverage your expertise to enhance our systems' reliability and performance. If you are passionate about building scalable systems and enjoy working in a collaborative environment, we want to hear from you!

Feb 23, 2026

Site Reliability Engineer (SRE) at Mithril | San Francisco

Mithril

Full-time|$170K/yr - $230K/yr|On-site|Palo Alto / San Francisco Bay Area

Mithril develops AI infrastructure aimed at making GPU computing more accessible and affordable for enterprises, AI startups, and researchers. Clients include LG AI Research, Saronic, and the Broad Institute. The company was founded by a former Google DeepMind research scientist and a Stanford CS PhD. Mithril has secured $80M in seed and Series A funding from Sequoia Capital and Lightspeed Venture Partners. Over the past year, platform revenue has grown more than sixfold. Fast Company recognized Mithril as the 8th Most Innovative Company in Artificial Intelligence for 2026.

Apr 22, 2026

Senior Staff Site Reliability Engineer - Observability

Okta, Inc.

Full-time|$194K/yr - $267K/yr|On-site|San Francisco, California

Discover Okta

Okta is recognized as The World’s Identity Company, empowering individuals to securely leverage any technology across various devices and applications. Our versatile Okta Platform and Auth0 Platform provide reliable access, authentication, and automation, placing identity at the forefront of business security and expansion.

At Okta, we value diverse perspectives and experiences. We seek continuous learners and individuals who can enhance our team with their distinct backgrounds.

Join us as we create a world where identity is truly yours.

We are in search of a highly skilled Observability Site Reliability Engineer specializing in Google Cloud, to take charge of and elevate our Observability ecosystem within GCP. In this position, you will progress beyond basic monitoring to develop a world-class, comprehensive, and scalable Observability Platform that supports our SRE teams and business collaborators. You will implement infrastructure as code by employing Terraform and demonstrating strong coding skills in Go, Python, or Ruby to automate the deployment of agents and collectors across intricate distributed systems.

Key Responsibilities

Automated Infrastructure: Design, build, and maintain scalable observability infrastructure utilizing tools such as Terraform.
GCP Observability Engineering: Enhance the collection, processing, and storage of Observability data to guarantee high reliability and low latency for our Splunk and Grafana services.
Incident Response: Engage in on-call rotations and conduct post-incident reviews to foster systemic improvements and promote 'observability-driven development.'
Automation: Minimize 'toil' by automating the deployment and scaling of observability agents and collectors.

Mar 11, 2026

Staff Software Engineer, Site Reliability Engineer (SRE)

Harvey

Full-Time|On-site|San Francisco

Why Join Harvey?

At Harvey, we're not just changing the landscape of legal and professional services; we're revolutionizing it from the ground up. By integrating cutting-edge AI technology with an enterprise-level platform and profound domain knowledge, we're setting new standards for how knowledge work is conducted for generations to come.

This is a unique opportunity to be a part of a transformative journey at a pivotal moment for our company. With over 1,000 clients in more than 58 countries, a robust product-market fit, and exceptional investor backing, we are rapidly scaling and defining a new industry standard. The challenges are ambitious, the expectations are high, and the potential for personal, professional, and financial growth is unparalleled.

Our team is composed of sharp, driven individuals who are deeply aligned with our mission. We operate with agility and intensity, taking ownership of the challenges we face, from initial brainstorming to long-term solutions. We engage closely with our clients, from executive leaders to engineers, collaborating to swiftly address real-world challenges with urgency and diligence. If you excel in uncertain environments, strive for excellence, and want to shape the future of work alongside high achievers, we encourage you to join our mission.

At Harvey, we are writing the future of professional services today, and we’re just getting started.

Role Overview

As a Staff Software Engineer on our Site Reliability Engineering (SRE) team, you will play a crucial role in ensuring the reliability, scalability, and performance of our legal AI platform. You'll be part of a dynamic team that bridges infrastructure and product, taking ownership of systems that guarantee our platform is fast, secure, and consistently operational. Your efforts will be pivotal in scaling our operations across over 50 regions and in automating essential operational tasks. If you are enthusiastic about creating resilient systems and simplifying processes through automation, we would love to have you on board.

This position is based in San Francisco, CA, and we follow an in-person work model, offering relocation assistance to new hires.

Dec 1, 2025

Staff Site Reliability Engineer - Fabric

MongoDB, Inc.

Full-time|$127K/yr - $249K/yr|Hybrid|United States

The Team

Join our dynamic Platform Engineering team within Site Reliability Engineering (SRE), which is tasked with maintaining vital infrastructure and operational functions that empower our engineering organization. We manage multi-cloud Kubernetes infrastructures, deployment systems, and observability frameworks.

The Fabric team specializes in ensuring secure communication between systems and the public internet. We focus on network architecture, service mesh, and edge load balancing, safeguarding customer data during transit. Our work is essential in building and sustaining a reliable, globally-connected multi-cloud network for MongoDB products.

This position is available in our New York City headquarters, smaller offices in Austin, Palo Alto, and San Francisco, or as a fully remote role from anywhere in North America. Our hybrid work model accommodates both in-office and remote work.

Apr 8, 2026

Staff Site Reliability Engineer, Tech Lead

Unify

Full-time|On-site|San Francisco Office

About Unify

At Unify, we are pioneering the first AI-driven system of action for revenue teams, enabling businesses to transform their outbound strategies into high-performing growth engines. Our focus is on making go-to-market execution measurable, repeatable, and scalable. Founded in 2023 by industry veterans from Ramp and Scale AI, our talented team has diverse experience from leading organizations such as Airbnb, Meta, Waymo, and Perplexity.

In 2024, Unify achieved an impressive 8x revenue growth and serves notable clients including Perplexity, Cursor, SoFi, and Justworks. We are a dynamic, high-energy team backed by $58M in funding from Thrive, Emergence, OpenAI, and others. Join us as we shape the future of GTM!

About the Role

As the Staff SRE Tech Lead at Unify, you will be instrumental in enhancing the reliability and scalability of our platform as we handle increasing volumes of data and accommodate customers with stringent uptime requirements. You will define the technical roadmap for reliability engineering, lead a dedicated team of SREs, and collaborate closely with engineering leaders to establish systems and practices that ensure Unify remains both swift and dependable at scale.

Feb 5, 2026

Senior Staff Site Reliability Engineer - Tech Lead

Unify

Full-time|On-site|San Francisco Office

Join Unify as a Senior Staff Site Reliability Engineer and take the lead in transforming our technology landscape. In this pivotal role, you will spearhead initiatives to enhance our system reliability and performance, ensuring seamless operations across our platforms. Your expertise will guide a dynamic team, driving innovation and implementing best practices in site reliability engineering.

Mar 24, 2026

Senior Site Reliability Engineer

alembic

Full-time|On-site|San Francisco HQ

About the Role

Join alembic as a Senior Site Reliability Engineer (SRE) and become an integral part of our mission to enhance platform reliability, observability, and operational excellence. In this pivotal role, you will collaborate with engineers and data scientists to architect, automate, and maintain the robust infrastructure that drives our platform, including data pipelines, machine learning workloads, and real-time analytics systems.

This hands-on position offers significant visibility across the technology stack and provides you with the opportunity to shape the future of our infrastructure and operations.

Dec 22, 2025

Site Reliability Engineer at Mercor | San Francisco

Mercor

Full-time|On-site|San Francisco

Join the Mercor Team

At Mercor, we stand at the dynamic intersection of labor markets and AI research. Collaborating with premier AI labs and enterprises, we empower the human intelligence that is crucial for AI's evolution.

Our expansive talent network plays a vital role in training cutting-edge AI models, akin to the way educators impart knowledge to their students: by sharing insights, experiences, and contextual understanding that code alone cannot convey. Currently, our network of over 30,000 experts generates more than $2 million daily.

We are pioneering a novel category of work where expertise fuels AI progress. Achieving this vision necessitates an ambitious, fast-paced, and deeply dedicated team. You will collaborate with researchers, operators, and AI firms that are at the forefront of transforming societal structures.

Mercor is a thriving Series C company with a valuation of $10 billion. We operate five days a week in-person at our new headquarters in San Francisco.

About the Role

As a Site Reliability Engineer (SRE) at Mercor, you will take ownership of production reliability for our critical systems, working closely with our infrastructure leadership. You will play a pivotal role in establishing our SRE function and defining how Mercor manages large-scale, high-availability systems.

Your Responsibilities

Ensure the reliability and safety of production for key shared services and customer-facing systems.
Collaborate directly with infrastructure leadership to outline SRE priorities, reliability benchmarks, and the production safety roadmap.
Enhance the structure of our production systems to ensure stability, resource efficiency, isolation, and observability.
Advocate for and implement modern SRE methodologies (e.g., incident management, postmortems, SLIs/SLOs) across engineering teams.
Work alongside engineering and applied AI teams to facilitate sustainable growth.
Promote SRE best practices internally, supporting teams in a safe, scalable, and consistent production onboarding process.

Who We Seek

The ideal candidate will have:

Extensive experience in genuine SRE roles (not merely operations) across various positions or organizations.
A deep understanding of SRE methodologies popularized by Google (e.g., error budgets, reliability vs. risk trade-offs, large-scale distributed systems).
5+ years of SRE experience; ideally, 15+ years in total experience for this inaugural SRE position.
A proven track record of managing systems at scale, with a strong grasp of the complexities involved.

Dec 27, 2025

Site Reliability Engineer at Superhuman | San Francisco

Superhuman, Inc.

Full-time|$214K/yr - $260K/yr|Hybrid|Hub - San Francisco

At Superhuman, we embrace a vibrant hybrid work model that offers our team members the ideal blend of focused individual work and collaborative in-person interactions, fostering trust, innovation, and a robust team culture.

About Superhuman

Superhuman, the AI productivity platform, is on a transformative mission to unlock the superhuman potential within everyone. With the integration of Grammarly's writing assistance and innovative tools like Coda’s collaborative workspaces and Go, our proactive AI assistant, we empower over 40 million individuals and 50,000 organizations globally. Founded in 2009, we strive to eliminate busywork and enhance productivity. Discover more at superhuman.com and explore our values here.

The Opportunity

To meet our ambitious goals, we are seeking a Site Reliability Engineer (SRE) to join our infrastructure team. This pivotal role focuses on developing software solutions to maintain the reliability of our back-end systems while collaborating with engineering teams to strategize our future growth. You will also engage with our production engineering teams in Europe as we transition from a “you build it, you own it” approach.

At Superhuman, our engineers and researchers enjoy the autonomy to innovate and drive breakthroughs, directly impacting our product roadmap. As we rapidly scale our interfaces, algorithms, and infrastructure, the complexity of our technical challenges is growing. Learn more about our technical endeavors on our technical blog.

As an SRE, your responsibilities will include:

Scaling our Kubernetes-based control plane that processes billions of events each day.
Enhancing our automation mechanisms to efficiently respond to workload demands.
Deploying machine learning systems across various departments.

Jun 18, 2025

Senior Software Engineer, Site Reliability Engineer (SRE)

Harvey

Full-Time|On-site|San Francisco

Why Join Harvey?

At Harvey, we are revolutionizing the landscape of legal and professional services with a holistic approach. By integrating advanced AI technology, a robust enterprise platform, and extensive industry knowledge, we are redefining how essential knowledge work is conducted for years to come.

This is a unique opportunity to contribute to the foundation of a transformative company at a pivotal moment in its journey. With over 1000 clients across more than 58 countries, a solid product-market fit, and outstanding investor backing, we are rapidly expanding and creating a new category in real-time. The challenges are significant, expectations are high, and the potential for personal, professional, and financial development is unparalleled.

Our team comprises driven, intelligent individuals who are deeply passionate about our mission. We prioritize speed, intensity, and accountability in addressing challenges, from initial ideation to long-term solutions. By maintaining close relationships with our clients, from executives to engineers, we collaboratively address pressing issues with urgency and care. If you excel in uncertain environments, strive for excellence, and wish to shape the future of work alongside a team that raises the bar, we invite you to build alongside us.

At Harvey, we are currently writing the future of professional services, and we are just getting started.

Your Role

As a Senior Software Engineer on the Site Reliability team at Harvey, your mission will be to uphold the reliability, scalability, and performance of our innovative legal AI platform. You will become part of a high-impact team that operates at the crossroads of infrastructure and product, taking ownership of the systems that ensure our platform remains fast, secure, and continuously available. From scaling operations across 50+ regions to automating critical processes, your efforts will fortify Harvey's resilience as we expand. If you are enthusiastic about constructing robust systems and simplifying complexity through automation, we would love to collaborate with you.

This position is situated in San Francisco, CA, and we adhere to an in-person work model, providing relocation assistance to new employees.

Your Responsibilities

Design, implement, and oversee monitoring, alerting, and infrastructure resources (compute, storage, networking) across 50+ global regions.
Lead incident management processes, including postmortems, root cause analyses, and driving actionable enhancements.
Automate operational tasks and workflows by developing tools and processes for capacity planning, seamless rollouts, and secure data access to maintain high reliability and minimize manual intervention.
Collaborate across teams to drive solutions that enhance system performance and reliability.

Dec 1, 2025

Senior Site Reliability Engineer at Carta | San Francisco, CA

Carta

Full-time|On-site|San Francisco, California; Santa Clara, California; Seattle, WA

Join Carta as a Senior Site Reliability Engineer, where you will play a pivotal role in enhancing our infrastructure and ensuring the reliability of our platforms. You will work collaboratively with cross-functional teams to implement innovative solutions that drive operational excellence and scalability.

Apr 3, 2026

Senior Site Reliability Engineer at prosper | San Francisco

prosper

Full-time|On-site|San Francisco, CA

Role overview

The Senior Site Reliability Engineer at prosper plays a key role in maintaining and improving the reliability and performance of the company’s core systems. Collaboration with teams across the organization is essential to ensure services remain stable and efficient.

What you will do

Design and set up monitoring tools to track the health and performance of systems
Automate routine operational tasks to minimize manual intervention and boost efficiency
Diagnose and resolve complex technical problems that impact infrastructure or services
Support projects aimed at strengthening infrastructure stability and preparing for future growth

Location

This role is located in San Francisco, CA.

Apr 27, 2026

Staff Site Reliability Engineer

Fieldguide

Full-time|Remote|San Francisco, CA or Remote (USA)

Join Fieldguide as a Staff Site Reliability Engineer, where you will play a pivotal role in enhancing our operational infrastructure and ensuring the reliability of our services. In this position, you will collaborate with cross-functional teams to design, implement, and maintain scalable systems. Your expertise in automation, monitoring, and incident response will be vital to our mission of delivering exceptional user experiences.

Apr 28, 2026

Senior Site Reliability Engineer - Future Opportunities

Twitter Inc.

Full-time|On-site|San Francisco

Join our innovative technology team at Twitter Inc. as a Senior Site Reliability Engineer. In this role, you will be pivotal in enhancing system reliability and performance, ensuring our services run smoothly and efficiently. We are seeking passionate engineers who thrive in a fast-paced environment and are eager to tackle challenging problems.

Jan 3, 2023
