Software Engineer In Compute Infrastructure jobs in San Francisco – Browse 5,789 openings on RoboApply Jobs

Software Engineer in Compute Infrastructure

OpenAI | San Francisco

On-site Full-time

Qualifications

Proficiency in programming languages such as Python, C++, or Go
Experience with distributed systems and high-performance computing
Strong understanding of Kubernetes and orchestration tools
Ability to optimize systems for performance and reliability
Familiarity with hardware architecture and low-level systems programming
Excellent problem-solving skills and engineering judgment

About the job

Team and Platform Focus

The Compute Infrastructure team at OpenAI designs, builds, and maintains the systems that support AI research at scale. This work brings together accelerators, CPUs, networking, storage, data centers, orchestration software, agent infrastructure, developer tools, and observability. The aim is to create a reliable, unified experience for researchers and product teams across the company.

Projects span the full stack: capacity planning, cluster lifecycle management, bare-metal automation, and distributed systems. The team manages Kubernetes scheduling, system optimization, high-performance networking, storage, fleet health, reliability, workload profiling, benchmarking, and improvements to the developer experience. Even small improvements in communication, scheduling, hardware efficiency, or debugging can significantly accelerate research. OpenAI matches engineers to areas within Compute Infrastructure that align with their skills and interests.

Role Overview

This Software Engineer role centers on building and evolving the compute platform that supports OpenAI’s research and products. Candidates may bring expertise in low-level systems, high-performance computing, distributed infrastructure, reliability, CaaS, agent infrastructure, developer platforms, tooling, or infrastructure user experience. The most important qualities are strong analytical skills, the ability to write resilient code, and a collaborative approach that helps colleagues move faster and with more confidence.

What You Will Work On

Working close to hardware or at the user interaction layer
Developing CaaS and agent infrastructure
Managing control and data planes that connect the system
Bringing new supercomputing capabilities online
Optimizing training workloads through profiler traces and benchmarks
Improving NCCL and collective communication
Analyzing GPUs, NICs, topology, firmware, thermal dynamics, and failure modes
Designing abstractions to unify diverse clusters into a single platform

Areas of Expertise

No one is expected to cover every area listed. Some engineers focus on system performance, kernel or runtime behavior, large-scale networking protocols, RDMA, NCCL, GPU hardware, benchmarking, scheduling, or hardware reliability. Others improve the platform’s usability through APIs, tools, workflows, and developer experience. The team values strong engineering judgment and a drive to advance the field.

About OpenAI

OpenAI is at the forefront of artificial intelligence research, dedicated to developing safe and beneficial AI technologies. Our innovative environment fosters collaboration and creativity, empowering engineers to work on groundbreaking projects that shape the future of AI.

Similar jobs

Software Engineer in Compute Infrastructure

OpenAI

Full-time|On-site|San Francisco

Apr 27, 2026

Senior Software Engineer - Compute Infrastructure

Databricks

Full-time|On-site|San Francisco, California

Databricks is looking for a Senior Software Engineer focused on Compute Infrastructure in San Francisco, California. This position centers on building and improving compute architecture to support greater performance and scalability across Databricks' platform.

What you will do

Develop and optimize compute infrastructure to handle demanding data processing and analytics workloads
Work closely with teams from different disciplines to deliver reliable, high-quality solutions for customers

Impact

Your contributions will help define how data processing and analytics evolve at Databricks. The work directly supports customers’ ability to scale and perform complex tasks in the cloud.

Who we’re looking for

Strong background in cloud technologies and compute systems
Enjoys tackling complex technical challenges
Collaborative approach to problem-solving with cross-functional teams

Apr 28, 2026

Engineering Manager - Compute Infrastructure

Databricks

Full-time|$190K/yr - $253.8K/yr|On-site|Mountain View, California; San Francisco, California

P-931

At Databricks, we are dedicated to empowering data teams to tackle some of the most challenging problems in the world, from revolutionizing transportation to fast-tracking medical innovations. We achieve this by developing and managing the foremost data and AI infrastructure platform, enabling our clients to leverage profound data insights to enhance their enterprises. Founded by engineers with a customer-centric approach, we seize every chance to resolve technical challenges, from crafting next-generation UI/UX for data interactions to scaling our services and infrastructure across millions of virtual machines. And we’re just getting started.

Within Databricks, the Compute Infrastructure organization is responsible for building and operating the essential framework that supports all Data, AI, and stateful workloads across major cloud platforms. Our system launches tens of millions of VMs daily, manages thousands of Kubernetes clusters, and must deliver exceptional elasticity, reliability, and cost-effectiveness. We are in search of an Engineering Manager to lead a team focused on pivotal components of this platform. Your contributions will significantly impact product delivery speed, customer satisfaction, and our company's scalability.

The impact you will have:

Own and enhance the compute platform to support all Databricks workloads, enabling engineers to create top-tier products with high velocity and superior performance
Recruit exceptional engineers and nurture their development through guidance, feedback, and career advancement opportunities
Elevate the technical and operational standards through robust design practices, rigorous testing, and a culture of engineering excellence and platform thinking
Collaborate with engineering and product leadership to establish long-term strategies and roadmaps
Lead cross-functional initiatives encompassing both product and infrastructure domains
Influence architectural decisions that extend beyond your immediate team

Feb 13, 2026

Infrastructure Software Engineer

Sift

Full-time|$150K/yr - $200K/yr|On-site|San Francisco, CA

At Sift, we are revolutionizing the way cutting-edge machines are constructed, tested, and managed. Our innovative platform provides engineers with real-time visibility into high-frequency telemetry, effectively removing bottlenecks and facilitating quicker, more dependable development.

Sift grew out of our experience at SpaceX working on projects like Dragon, Falcon, Starlink, and Starship, where the demands of scaling telemetry, debugging flight systems, and ensuring mission reliability necessitated a new kind of infrastructure. Founded by a talented team from SpaceX, Google, and Palantir, Sift is tailored for mission-critical systems where precision and scalability are imperative.

As one of the pioneering engineers at Sift, your role will extend beyond just coding: you will play a crucial part in defining the architecture, shaping the product, and influencing the culture of a company dedicated to addressing real engineering challenges. If you're eager to take on intricate technical obstacles and build foundational systems that support complex machines from the ground up, we would love to connect with you.

Oct 30, 2025

Staff Software Engineer, Compute

fal

Full-time|$180K/yr - $250K/yr|On-site|San Francisco

Join our innovative team at fal as a Staff Software Engineer specializing in large-scale computation platforms. We are seeking a seasoned software engineer with extensive experience in developing backend systems that efficiently orchestrate workloads and manage resource constraints. Your expertise in foundational cloud infrastructure and Linux provisioning will be crucial as you work towards achieving high reliability and scalability with minimal operational overhead.

Dec 16, 2025

Product Infrastructure Software Engineer

Netic

Full-time|On-site|San Francisco

Netic is revolutionizing the essential services sector with our AI-driven revenue engine, empowering the backbone of the American economy.

With $43M in funding from leading investors such as Founders Fund, Greylock, Hanabi, and Dylan Field, who spearheaded our Series B, we have enabled our clients to secure hundreds of thousands of jobs across various service industries in North America. Today, numerous companies thrive entirely on an AI-first model powered by Netic.

As a member of our team consisting of innovative builders from top organizations such as Scale, Databricks, HRT, Meta, MIT, Stanford, and Harvard, you will be at the forefront of integrating frontier AI into the physical economy, where challenges are complex, data is intricate, and impacts are immediate and substantial.

In the role of a founding Product Infrastructure Engineer, you will design and scale the crucial infrastructure that supports our autonomous AI agents, addressing real-world challenges with significant, tangible outcomes. You will work alongside a passionate team of builders to develop infrastructure and processes from scratch, utilizing state-of-the-art cloud and orchestration technologies. If you excel in dynamic, ambiguous settings and are eager to set new benchmarks in the agentic domain, this is your chance to make a lasting impact.

May 30, 2025

Software Engineer for AI Infrastructure

Eventual Computing

Full-time|On-site|San Francisco

About Eventual

At Eventual, we are reimagining how AI applications process vast amounts of data, from images to complex datasets. Traditional data platforms are not equipped to handle the petabytes of multimodal data essential for AI, leaving teams struggling with inadequate infrastructure. Founded in 2022, Eventual aims to make querying data as intuitive as working with tables while ensuring scalability for production workloads.

Our open-source engine, Daft, is specifically designed for real-world AI systems. It efficiently manages external APIs and GPU clusters, and handles failures that traditional engines cannot. Daft is already integral to operations at leading companies such as Amazon, Mobileye, Together AI, and CloudKitchens.

We pride ourselves on our exceptional team, which includes talent from Databricks, AWS, Nvidia, Pinecone, GitHub Copilot, Tesla, and others. We have quadrupled our team size in just a year, supported by Series A and seed funding from notable investors like Felicis, CRV, Microsoft M12, and Y Combinator. We are now eager to expand further. Join us: Eventual is just getting started.

We are seeking passionate individuals who are excited to collaborate in a close-knit team environment, working together four days a week in our San Francisco Mission district office.

Your Role

As a Software Engineer, you will take charge of developing Eventual's core products and architecture. You’ll deliver features that our customers will use immediately and collaborate with a dedicated team that values open communication and cross-functional teamwork. Our fast-paced environment is focused on solving a variety of complex technical and product challenges. While our experienced team is here to provide guidance and mentorship, we appreciate engineers who can independently identify and tackle challenging technical issues.

Key Responsibilities

Design and develop highly reliable and resilient products and features
Collaborate closely with cross-functional product and customer-facing teams to understand requirements and deliver thoughtful solutions
Write high-quality, extensible, and maintainable code
Create and build scalable applications and components
Architect and manage Kubernetes clusters optimized for our needs

Sep 22, 2025

Software Engineer - Infrastructure (Mid to Senior Level)

Julius

Full-time|On-site|San Francisco, CA

Julius operates as an applied AI lab, developing advanced coding agents for a broad user base. The platform executes about 1 million lines of code every 36 hours, serves over 1 million users, and generates more than 3 million visualizations. All code runs in tightly managed, isolated sandboxes. Julius is a revenue-generating business backed by AI Grant, Y Combinator, Bessemer Venture Partners, and founders from leading technology companies.

Role overview

This mid- to senior-level Software Engineer - Infrastructure role focuses on designing and scaling the code-execution sandboxes that form the backbone of Julius. The infrastructure spans cloud platforms such as AWS and GCP, orchestrating over 500,000 containers each month. The main priorities are reliability, performance, and security in a multi-tenant compute environment.

What you will do

Design and maintain secure, multi-tenant container infrastructure with rapid startup and intelligent autoscaling
Deploy and manage cloud resources using Helm and Terraform, including SSO, network controls, and audit logging
Enhance observability through metrics, traces, and logs
Define SLOs and lead incident response efforts
Optimize container images, scheduling, networking, and costs
Develop and enforce fair-use and rate-limiting policies

Requirements

Hands-on experience with production Kubernetes and container internals (Docker or containerd), as well as strong networking skills
Familiarity with cloud services (AWS, GCP, or Azure) and Infrastructure as Code tools such as Terraform and Helm
Proficiency with monitoring and logging tools like Prometheus, Grafana, OpenTelemetry, ELK, or Vector
Understanding of security best practices for containerized, multi-tenant systems

Preferred qualifications

Experience with technologies such as gVisor, Kata, Firecracker, Cilium, eBPF, GPU scheduling, or serverless autoscaling frameworks (KEDA, Knative, Karpenter)
Interest in AI projects, especially those involving large language models (LLMs)

Benefits and compensation

Competitive base salary
Substantial equity options
Comprehensive health and dental coverage
Gym reimbursement
Daily team meals
Commuter assistance

Julius offers the chance to work in San Francisco, CA, alongside a small and highly skilled team tackling large-scale infrastructure challenges. The systems here operate at significant scale and complexity, providing opportunities to solve demanding technical problems in a collaborative setting.

Apr 23, 2026

Infrastructure Manager - Compute Markets

Andromeda Cluster

Full-time|Remote|Global Remote / San Francisco, CA

Location: North America Remote / San Francisco · Full-Time

About Andromeda

Founded by Nat Friedman and Daniel Gross, Andromeda Cluster is on a mission to democratize access to advanced AI infrastructure for early-stage startups. Starting with a single managed cluster, we rapidly expanded our capabilities to build a robust orchestration layer that enhances global AI infrastructure accessibility.

We collaborate with prominent AI labs, data centers, and cloud providers to ensure compute resources are efficiently delivered where and when they are most required. Our innovative platform optimizes the routing of training and inference jobs globally, enhancing flexibility and operational efficiency in one of the most dynamic markets around.

Our vision is to establish the liquidity layer for global AI compute, and we are continually seeking exceptional talent in AI infrastructure, research, and engineering.

The Opportunity

We are in search of an Infrastructure Manager to improve the alignment of supply and demand on our platform. This role is an Individual Contributor position, reporting directly to the Head of Infrastructure. The Infrastructure team forms the backbone of our operations, focusing on acquiring and managing compute resources in collaboration with our compute providers, sales, and technical teams.

As we scale our operations, we aim to broaden our network and liquidity while deepening our service offerings and accelerating growth.

What You'll Do

• Align incoming leads from the sales team with both internal and external compute capacities.
• Optimize the utilization of our compute resources.
• Identify and onboard new compute suppliers globally.
• Source capacity tailored to customer requirements and market trends.
• Address customer and supplier challenges in a fast-paced, dynamic environment.
• Analyze technical and commercial differences among suppliers to refine our capacity strategies.
• Formulate a proactive compute strategy driven by market insights.
• Negotiate costs with suppliers and vendors.
• Design and implement capacity planning processes.

Mar 25, 2026

Software Engineer, Infrastructure

Sierra

Full-time|On-site|San Francisco, CA

About Us

At Sierra, we are revolutionizing the way businesses engage with their customers by building a cutting-edge platform that harnesses the power of AI. Our headquarters is located in San Francisco, with additional offices in Atlanta, New York, London, France, Singapore, and Japan.

Our company culture is deeply rooted in our core values: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and foster an environment where innovation thrives.

Sierra was co-founded by Bret Taylor, who currently serves as the Board Chair of OpenAI and has a rich history with Salesforce and Facebook, and Clay Bavor, who previously led Google Labs and spearheaded initiatives like Google Lens and Project Starline.

Your Role

As a Software Engineer focusing on Infrastructure at Sierra, you will play a pivotal role in designing, constructing, and maintaining the foundational systems that empower our AI platform. Your expertise will ensure that our infrastructure is not only secure and reliable but also scalable, allowing product teams to execute their work with agility and confidence.

Guarantee the reliability, scalability, and performance of our platform and LLM inference serving in response to increasing traffic demands
Develop and oversee cloud infrastructure using Terraform to create secure, scalable, and reproducible environments
Establish and manage a self-service infrastructure platform to empower engineering teams in deploying and operating services independently
Take ownership of and improve CI/CD pipelines and release management processes, facilitating rapid and reliable deployments across Sierra’s platform
Design and manage distributed systems utilizing distributed databases, retrieval systems, and machine learning models
Develop and sustain core data serving abstractions along with essential authentication and security features (SSO, RBAC, authentication controls)
Effectively navigate and integrate our technology stack with enterprise customer environments in a scalable and maintainable manner

Oct 15, 2025

Infrastructure Software Engineer

Exa

Full-time|On-site|San Francisco, California

At Exa, we are on a mission to create a cutting-edge search engine from the ground up, designed to cater to the diverse needs of AI applications. Our team is building a robust infrastructure that enables us to crawl the internet, train advanced embedding models for indexing, and develop high-performance vector databases using Rust. Additionally, we manage a significant $5M H200 GPU cluster that powers tens of thousands of machines.

The Infrastructure Team at Exa is responsible for developing the essential tools and infrastructure that support our entire system. We are looking for talented infrastructure engineers to help us scale our capabilities rapidly. Your work could involve orchestrating GPU clusters with Kubernetes, implementing map-reduce batch jobs on Ray, or creating top-tier observability tools that set industry standards.

Sep 3, 2025

Staff Software Engineer, Stream Compute

Stripe, Inc.

Full-time|On-site|San Francisco, Seattle, New York, Toronto

Join Stripe as a Staff Software Engineer in our Stream Compute team, where you will play a pivotal role in building scalable solutions that power the financial infrastructure of the internet. As a member of our innovative engineering team, you will leverage your expertise to design and implement robust software solutions that enhance the performance and reliability of our streaming data capabilities.

Apr 1, 2026

Senior Software Engineer, Infrastructure

Serval

Full-time|On-site|San Francisco

Who We Are

Serval is an innovative AI-driven automation platform redefining operational efficiency for enterprises. Our intelligent agents seamlessly comprehend and execute real-world workflows, replacing outdated manual processes with adaptive, self-learning software. Since our inception in early 2024, we have garnered the trust of industry leaders such as General Motors, Notion, Perplexity, Vercel, Mercor, LangChain, and Verkada, streamlining high-volume operational tasks across their organizations.

At the heart of Serval is a cutting-edge agentic AI platform that transforms natural language into actionable workflows. Our agents not only respond to queries but also reason, act across various systems, and continuously enhance their performance. What started as a solution for operational tasks has rapidly expanded into a versatile AI automation layer utilized across IT, HR, Finance, Security, Legal, and Engineering sectors.

Our mission is to eradicate repetitive, manual tasks within enterprises, empowering teams through intelligent automation. In the long run, we aim to establish a universal AI operations layer: a system of agents that integrates across business functions, maintaining the momentum of modern companies.

We are proud to be backed by renowned investors including Sequoia Capital, Redpoint Ventures, Meritech, First Round, General Catalyst, and Elad Gil, and founded by seasoned product and engineering leaders from Verkada.

Role Overview

As a Senior Software Engineer in Infrastructure at Serval, you will be pivotal in developing and scaling the core systems that empower our AI agents and workflow automation platform. A crucial aspect of this role involves enabling and supporting self-hosted deployments for enterprise clients needing on-premises or private cloud environments. We are looking for engineers with profound expertise in distributed systems, infrastructure-as-code, production operations, and customer-facing support, who aspire to influence the technical architecture of a rapidly evolving platform.

What You'll Do

Design, implement, and operate large-scale distributed systems that power Serval's AI agents, workflow orchestration, and data pipelines
Create and maintain Terraform modules to provision and manage cloud infrastructure across AWS, GCP, or Azure environments
Develop and sustain deployment packages, installation scripts, and infrastructure templates, enabling customers to self-host Serval in their own environments
Provide technical support and guidance to enterprise customers during installation and deployment phases

Jan 29, 2026

Software Engineer - Computational Photography

Rylo

Full-time|On-site|San Francisco, CA

At Rylo, we are revolutionizing the way you capture and share your experiences. Our state-of-the-art camera is designed to record your surroundings with breathtaking clarity and stability, eliminating the hassle of traditional video capture. Created by a team of visionary engineers from Instagram and Apple, our innovative stabilization software and user-friendly smartphone app ensure that every shot you take is a masterpiece. With Rylo, you can focus on enjoying the moment while we handle the technicalities of creating stunning videos.

As a Software Engineer specializing in Computational Photography, you will play a crucial role in enhancing the core algorithms that power the Rylo camera and future products. Your work will fundamentally enhance the photography and cinematography experience, focusing on improving image quality and developing groundbreaking computational photography features. You will engage in the complete lifecycle of algorithm development, from design and implementation to quality evaluation and performance optimization, culminating in successful deployment.

Your collaboration with software engineers, hardware engineers, and designers will allow you to push the boundaries of consumer camera technology.

Mar 1, 2026

Software Engineer, Infrastructure

Imprint

Full-time|On-site|San Francisco

About Us

At Imprint, we are revolutionizing the world of co-branded credit cards and innovative financial solutions, focusing on smarter, more rewarding, and brand-first experiences. We collaborate with renowned brands such as Crate & Barrel, Rakuten, Booking.com, H-E-B, Fetch, and Brooks Brothers to establish modern credit programs that enhance customer loyalty, unlock savings, and stimulate growth. Our robust platform integrates advanced payment technologies, intelligent underwriting, and a seamless user experience, enabling brands to offer impactful financial products without the complexities of becoming a bank.

Co-branded credit cards represent over $300 billion in U.S. annual spending, yet many are still managed by outdated banking systems. Imprint stands as the modern alternative: flexible, technology-driven, and tailored for today’s consumers. Supported by notable investors like Kleiner Perkins, Thrive Capital, and Khosla Ventures, we are assembling a world-class team dedicated to reshaping payment methods and driving brand growth. If you thrive in fast-paced environments, enjoy tackling complex challenges, and aspire to make a significant impact, we would be delighted to meet you.

Discover more about us on Imprint's Technology Blog.

The Team

The Tech Platform Engineering Team at Imprint is pioneering the democratization of access to advanced technologies, empowering teams across our organization to innovate and excel. Our commitment to redefining the Fintech landscape drives us to build secure, highly available infrastructure while equipping our engineers with comprehensive development tools, allowing them to rapidly create world-class products.

Your Role

Design, build, and manage cloud and web infrastructure with a strong emphasis on security, reliability, and scalability
Implement and maintain infrastructure components across computing, networking, and data platforms
Adhere to security best practices in cloud infrastructure, ensuring proper access control, network isolation, and secure communication between services
Monitor system health and engage in incident response, root cause analysis, and reliability enhancements
Collaborate with platform, security, and product engineers to deliver safe and efficient infrastructure solutions

Jan 16, 2026

Software Engineer - Infrastructure Team at Anyscale | San Francisco, CA

Anyscale

Full-time|On-site|San Francisco or Palo Alto, CA

About Anyscale

At Anyscale, we are on a mission to democratize distributed computing, making it accessible for software developers across all skill levels. We are actively commercializing Ray, a prominent open-source project that's fostering an ecosystem of libraries designed for scalable machine learning. Leading companies such as OpenAI, Uber, Spotify, Instacart, and Cruise have integrated Ray into their tech stacks to expedite the deployment of AI applications in real-world scenarios.

At Anyscale, we are committed to creating the optimal environment for running Ray, enabling developers and data scientists to effortlessly scale machine learning applications from their laptops to large clusters without requiring expertise in distributed systems.

We are proud to be backed by Andreessen Horowitz, NEA, and Addition, with over $250 million raised to date.

About the Role

Anyscale is seeking a talented Software Engineer to join our Infrastructure team. Our goal is to deliver next-generation tools and infrastructure that simplify the development and execution of distributed AI applications in the cloud, making it as straightforward as local development. As a member of the Infra team, you will contribute to the creation of a scalable, secure, and resilient backbone that supports this vision.

Feb 14, 2025

Backend & Infrastructure Software Engineer

vooma

Full-time|On-site|San Francisco Office

About the Role

Join our pioneering team at vooma as a Backend & Infrastructure Software Engineer, where you will play a critical role in shaping the technical infrastructure of a transformative company.

If you are passionate about creating not only resilient systems but also the foundational architecture of a groundbreaking enterprise from the outset, this position is ideal for you.

We are looking for someone who excels at crafting infrastructure that is elegant, dependable, and secure, even under high-demand scenarios. You thrive on the challenge of scaling systems that enable intelligent agents and take pride in establishing foundations that others can depend on.

Your Key Responsibilities Include:

Design and maintain secure, scalable infrastructure tailored for AI-powered agents in production environments
Deploy and optimize AI-driven services to meet high availability and performance standards
Manage infrastructure as code, alongside cloud environments and CI/CD pipelines
Implement monitoring, observability, and alerting systems to ensure the reliability of our infrastructure
Contribute to infrastructure security and adhere to best practices

You Should Have:

Experience in deploying and productionizing machine learning or AI-centric workloads
Proficiency in developing secure, scalable infrastructure on platforms such as AWS, Azure, or GCP
In-depth knowledge of backend systems, networking, and container orchestration technologies (e.g., Kubernetes)
Understanding of infrastructure security principles and compliance standards (e.g., SOC2)
A proactive and hands-on mindset, with a strong drive to solve challenges from start to finish

Jul 1, 2025

Software Engineer - Infrastructure

Baseten

Full-time|$300K/yr|On-site|San Francisco

ABOUT BASETEN

Join Baseten, where we power mission-critical AI inference for leading companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our blend of applied AI research, robust infrastructure, and intuitive developer tools lets organizations at the forefront of AI deploy state-of-the-art models into production. We recently secured a $300M Series E funding round, backed by investors such as BOND, IVP, Spark Capital, Greylock, and Conviction. Be part of our rapid growth and help shape the platform that engineers trust for launching AI products.

THE ROLE

As an Infrastructure Software Engineer at Baseten, you will play a pivotal role in developing and maintaining the ML inference platform that powers AI applications in production. Your contributions will strengthen the core infrastructure, enabling developers to deploy, scale, and monitor machine learning models with high performance.

EXAMPLE INITIATIVES

You will work on projects within our Infrastructure team, including:
• Multi-cloud capacity management
• Inference on B200 GPUs
• Multi-node inference
• Fractional H100 GPUs for efficient model serving

RESPONSIBILITIES
• Design and develop infrastructure components for our ML inference platform, primarily using Python and Go.
• Implement and maintain Kubernetes deployments for optimal model serving.
• Contribute to the orchestration layer for model deployments.
• Build and enhance monitoring systems to track model performance metrics.
• Develop efficient resource management solutions to optimize performance.

Mar 9, 2025

Infrastructure Software Engineer

Ivo

Full-time|On-site|San Francisco, California

Join Ivo's Engineering Team!

At Ivo, we are pioneers in the tech industry. Our engineers are innovators who have built groundbreaking solutions such as:
• An AI agent that integrates seamlessly with MS Word to enhance document editing [2023]
• Revolutionizing embedding models with agentic RAG technology [2023]
• Advanced LLM-based legal fact extraction [2024]
• A legal assistant designed to search extensive contract databases without compromising accuracy [2024]
• Clustering legal documents from the same lineage [2025]
• Automatic deviation analysis to uncover hidden risks in vast contract databases [2025]
• Merging contracts with their amendments to create a “composite” contract timeline that has moved our clients to tears [2025]

Role Overview

As an Infrastructure Engineer at Ivo, you will lay the groundwork for our platform's future. Your responsibilities will include:
• Designing and owning the future of our infrastructure, with the freedom to innovate.
• Managing multiple customer deployments, ensuring each receives tailored containers, databases, and VPCs.
• Instrumenting our systems to identify performance bottlenecks and errors.
• Aggregating metrics and logs into clear, visually appealing dashboards and setting up pager alerts.
• Leading infrastructure-related incidents and being on call as necessary.
• Enhancing our CI/CD system to reduce deployment time from the current ~12 minutes.

If you're passionate about LLMs, you'll thrive in our engineering team, where you’ll have the opportunity to:
• Develop real-time LLM evaluations to monitor the accuracy of our responses.
• Collaborate with talented engineers to push the boundaries of DevOps.

Nov 20, 2025

Software Engineer - Infrastructure

Astranis

Full-time|On-site|San Francisco

Astranis is seeking a talented and motivated Software Engineer to join our Infrastructure team. In this role, you will be at the forefront of developing and maintaining critical software systems that support our innovative satellite technology. You'll collaborate with cross-functional teams to design, implement, and optimize our infrastructure solutions, ensuring high reliability and performance.

Apr 9, 2026

Software Engineer in Compute Infrastructure

OpenAI

Full-time|On-site|San Francisco

Apr 27, 2026

Senior Software Engineer - Compute Infrastructure

Databricks