Qualifications
Desired Experience
• Proven experience in designing and managing large-scale infrastructure, such as GPU clusters, extensive Kubernetes environments, or cloud-based batch job systems.
• A meticulous approach, consistently focused on reliability, observability, and optimization throughout the entire technology stack.
About the job
At Exa, we are on a mission to create a cutting-edge search engine from the ground up, designed to cater to the diverse needs of AI applications. Our team is building a robust infrastructure that enables us to crawl the internet, train advanced embedding models for indexing, and develop high-performance vector databases using Rust. Additionally, we manage a significant $5M H200 GPU cluster that powers tens of thousands of machines.
The Infrastructure Team at Exa is responsible for developing the essential tools and infrastructure that support our entire system. We are looking for talented infrastructure engineers to help us scale our capabilities rapidly. Your work could involve orchestrating GPU clusters with Kubernetes, implementing map-reduce batch jobs on Ray, or creating top-tier observability tools that set industry standards.
About Exa
Exa is dedicated to innovating the future of AI by developing an unparalleled search engine infrastructure that enhances performance and scalability. Our commitment to building a world-class engineering team is at the forefront of our endeavors.
Similar jobs
Software Engineer, Frontier AI Infrastructure
Full-time|$138K/yr - $259.4K/yr|On-site|San Francisco, CA; St. Louis, MO; New York, NY; Washington, DC
Scale AI is on the lookout for an exceptionally talented and driven Software Engineer, Frontier AI Infrastructure to become an integral part of our innovative Public Sector Engineering team. In this role, you will take charge of the model inference layer, enabling cutting-edge AI models, troubleshooting the latest AI tools, managing networking tasks, addressing latency issues, and monitoring pricing and usage metrics for AI models. You will spearhead technical discussions with cloud vendors and clients to fulfill critical contracts and resolve platform challenges. Additionally, you will collaborate closely with Product teams to anticipate feature requirements, transitioning from reactive 'infra-only debugging' to proactive integration testing.

Your Responsibilities Include:
• Designing and implementing secure, scalable backend systems tailored for Public Sector clients, utilizing Scale's advanced cloud-native AI infrastructure.
• Owning services or systems while defining long-term health objectives and enhancing the health of related components.
• Redesigning the architecture to operate in compliant or restrictive environments, which entails creating swappable components (authentication, storage, logging) to adhere to government and security regulations without compromising product integrity.
• Collaborating with Product teams to develop integration tests that identify issues early, shifting focus from 'infra-only debugging' to preventing upstream failures.
• Actively participating in customer engagements, liaising with stakeholders to comprehend requirements and deliver innovative solutions.
• Contributing to the platform roadmap and product strategy for Scale AI's Public Sector division, playing a vital role in shaping the future trajectory of our offerings.
About Our Team
The Frontier Systems team at OpenAI is at the forefront of technology, responsible for creating, deploying, and maintaining some of the world's largest supercomputers. These supercomputers are pivotal for training our most advanced AI models, pushing the boundaries of innovation. We transform sophisticated data center designs into operational systems and develop the software infrastructure necessary for extensive frontier model training. Our goal is to ensure these hyperscale supercomputers operate reliably and efficiently, supporting groundbreaking AI research.

About the Role
As a key member of the Frontier Systems team, you will be instrumental in designing the critical infrastructure that ensures our supercomputers function seamlessly for pioneering AI research. In this role, you'll address system-level challenges and implement automation solutions that minimize disruptions during large-scale training processes. Your responsibilities will encompass end-to-end ownership of your projects, allowing you to make significant contributions to our mission.
This position is ideal for individuals who excel in diagnosing complex system issues and crafting automation strategies to proactively resolve problems across a vast network of machines.

Your Responsibilities Include:
• Enhancing system health checks to maintain the stability of our hyperscale supercomputers during model training.
• Conducting in-depth investigations into hardware failures and system-level bugs to uncover root causes.
• Developing automation tools that monitor and resolve issues across thousands of systems, enabling uninterrupted research progress.

You May Be a Great Fit If You Possess:
• 7+ years of hands-on experience in software engineering.
• Strong proficiency in Python and shell scripting.
• Expertise in analyzing complex data sets using SQL, PromQL, Pandas, or other relevant tools.
• Experience in creating reproducible analyses.
• A solid balance of skills in both building and operationalizing systems.

Prior experience with hardware is not a prerequisite for this role.

Preferred Qualifications:
• Familiarity with the intricacies of hardware components, protocols, and Linux tools (e.g., PCIe, Infiniband, networking, power management, kernel performance tuning).
• Experience with system optimization and performance tuning.
About the Team
Join the innovative Frontier Systems team at OpenAI, where we design, implement, and maintain the world's largest supercomputers, essential for advancing our most groundbreaking model training initiatives. We transform data center blueprints into operational systems while crafting the software necessary for executing large-scale frontier model trainings. Our mission is to establish, stabilize, and ensure the reliability and efficiency of these hyperscale supercomputers throughout the training of our frontier models.

About the Role
We are seeking passionate engineers to manage the next generation of compute clusters that underpin OpenAI’s frontier research. This position merges distributed systems engineering with practical infrastructure work across our expansive data centers. You will scale Kubernetes clusters to unprecedented levels, automate bare-metal setups, and create the software layer that simplifies the complexity of numerous nodes across various data centers. Your work will be at the crossroads of hardware and software, where speed and reliability are paramount.
Be prepared to oversee dynamic operations, swiftly identify and resolve pressing issues, and constantly elevate the standards for automation and uptime.

In this role, you will:
• Provision and scale extensive Kubernetes clusters, including automation for deployment, bootstrapping, and lifecycle management
• Create software abstractions that integrate multiple clusters and provide a cohesive interface for training workloads
• Oversee node deployment from bare metal to firmware upgrades, ensuring rapid, repeatable setups at scale
• Enhance operational metrics by reducing cluster restart times (e.g., from hours to minutes) and expediting firmware and OS upgrade cycles
• Integrate networking and hardware health systems to ensure end-to-end reliability across servers, switches, and data center infrastructure
• Develop monitoring and observability systems to identify issues early and maintain cluster stability under high loads

You might thrive in this role if you:
• Have extensive experience operating or scaling Kubernetes clusters or similar container orchestration systems in high-growth or hyperscale environments
• Possess strong programming skills in languages relevant to cloud and infrastructure management
Full-time|Remote|North America Remote / San Francisco, CA
Join Our Team as a Software Engineer - AI Infrastructure
Location: North America Remote / San Francisco · Full-Time

At Andromeda Cluster, we are dedicated to democratizing access to advanced AI infrastructure that was once only available to hyperscalers. Founded by industry leaders Nat Friedman and Daniel Gross, we have evolved from a singular managed cluster to a global platform that connects top AI labs, data centers, and cloud providers around the world. Our orchestration layer efficiently manages training and inference tasks globally, enhancing flexibility and efficiency in this rapidly expanding sector. We aim to create a global marketplace for AI computing, empowering AGI with the same fluidity as global financial markets. As we continue to grow, we are on the lookout for talented individuals in the fields of AI infrastructure, research, and engineering.

Your Role
In the position of Infrastructure Product Engineer, you will be integral in constructing the foundational framework of Andromeda’s platform.
Your challenge will be to simplify complex, real-world infrastructure issues into scalable product solutions that our customers will benefit from.

Key Responsibilities
• Architect and develop essential platform components, focusing on infrastructure orchestration, provisioning, and lifecycle management solutions.
• Create robust APIs, services, and control planes that abstract diverse infrastructure types, including VMs, Kubernetes, bare metal, and schedulers.
• Convert customer usage patterns into actionable product requirements, delivering impactful features and enhancements.
• Design automation and internal tools to mitigate manual and ad-hoc operational tasks.
• Improve platform reliability, performance, and observability, focusing on sustainable enhancements rather than quick fixes.
• Collaborate with other teams to establish clear ownership boundaries between platform features and customer-specific solutions.
• Write clean, maintainable, and well-documented code with a focus on long-term sustainability.
• Engage in technical design discussions and contribute to the architectural advancements of our platform.
Join the Revolution at Retell AI
Retell AI is pioneering the future of call centers through innovative voice AI, driven by first principles thinking. In just 18 months since our inception, we have empowered thousands of businesses with our AI voice agents, transforming how sales, support, and logistics calls are managed—previously requiring extensive human teams. Supported by prestigious investors such as Y Combinator and Alt Capital, we've rapidly scaled from $5M ARR to an impressive $36M ARR with a compact yet dynamic team of 20.

Our ambition for 2026 is to create a revolutionary customer experience platform, where entire contact centers are powered by AI. Moving beyond basic automation, we aim to develop intelligent AI “workers” that serve as frontline agents, QA analysts, and managers, continuously enhancing customer interactions without the need for constant human oversight. As we expand, we are seeking passionate engineers who are eager to solve challenging technical problems, act swiftly, and make a significant impact in one of the fastest-growing voice AI startups. Let’s shape the future together.
About Eventual
At Eventual, we are reimagining how AI applications process vast amounts of data, from images to complex datasets. Traditional data platforms are not equipped to handle the petabytes of multimodal data essential for AI, causing teams to struggle with inadequate infrastructure. Founded in 2022, our mission is to simplify data querying, making it as intuitive as working with tables while ensuring scalability for production workloads.

Our open-source engine, Daft, is specifically designed for real-world AI systems. It efficiently manages external APIs and GPU clusters, and addresses failures that traditional engines cannot handle. Daft is already integral to operations at leading companies such as Amazon, Mobileye, Together AI, and CloudKitchens.

We pride ourselves on our exceptional team, which includes talents from Databricks, AWS, Nvidia, Pinecone, GitHub Copilot, Tesla, and others. We have quadrupled our team size in just a year, supported by Series A and seed funding from notable investors like Felicis, CRV, Microsoft M12, and Y Combinator. We are now eager to expand further. Join us—Eventual is just getting started. We are seeking passionate individuals who are excited to collaborate in a close-knit team environment, working together four days a week in our San Francisco Mission district office.

Your Role:
As a Software Engineer, you will take charge of developing Eventual's core products and architecture. You’ll deliver features that our customers will use immediately and collaborate with a dedicated team that values open communication and cross-functional teamwork. Our fast-paced environment is focused on solving a variety of complex technical and product challenges.
While our experienced team is here to provide guidance and mentorship, we appreciate engineers who can independently identify and tackle challenging technical issues.

Key Responsibilities:
• Design and develop highly reliable and resilient products and features.
• Collaborate closely with cross-functional product and customer-facing teams to understand requirements and deliver thoughtful solutions.
• Write high-quality, extensible, and maintainable code.
• Create and build scalable applications and components.
• Architect and manage Kubernetes clusters optimized for our needs.
About Us
At Sierra, we are revolutionizing the way businesses engage with their customers by building a cutting-edge platform that harnesses the power of AI. Our headquarters is located in the vibrant city of San Francisco, with additional offices expanding in Atlanta, New York, London, France, Singapore, and Japan. Our company culture is deeply rooted in our core values: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and foster an environment where innovation thrives. Sierra was co-founded by visionary leaders Bret Taylor, who currently serves as the Board Chair of OpenAI and has a rich history with Salesforce and Facebook, and Clay Bavor, who previously led Google Labs and spearheaded initiatives like Google Lens and Project Starline.

Your Role
As a Software Engineer focusing on Infrastructure at Sierra, you will play a pivotal role in designing, constructing, and maintaining the foundational systems that empower our AI platform. Your expertise will ensure that our infrastructure is not only secure and reliable but also scalable, allowing product teams to execute their work with agility and confidence.
• Guarantee the reliability, scalability, and performance of our platform and LLM inference serving in response to increasing traffic demands.
• Develop and oversee cloud infrastructure using Terraform to create secure, scalable, and reproducible environments.
• Establish and manage a self-service infrastructure platform to empower engineering teams in deploying and operating services independently.
• Take ownership of and improve CI/CD pipelines and release management processes, facilitating rapid and reliable deployments across Sierra’s platform.
• Design and manage distributed systems utilizing distributed databases, retrieval systems, and machine learning models.
• Develop and sustain core data serving abstractions along with essential authentication and security features (SSO, RBAC, authentication controls).
• Effectively navigate and integrate our technology stack with enterprise customer environments in a scalable and maintainable manner.
Full-time|$216.2K/yr - $270.3K/yr|On-site|San Francisco, CA; New York, NY
Join Scale AI's innovative team as an Infrastructure Software Engineer for our Enterprise Generative AI Platform (SGP). In this dynamic role, you will help design and enhance our enterprise-grade AI platform, which offers robust APIs for knowledge retrieval, inference, evaluation, and more. We're seeking an exceptional engineer who thrives in fast-paced environments and is eager to contribute to the scaling of our core infrastructure. The ideal candidate will possess a solid foundation in software engineering principles and extensive experience with large-scale distributed systems. Your role will involve implementing solutions across various cloud providers (GCP, Azure, AWS) for clients in highly regulated sectors, including healthcare, telecommunications, finance, and retail.
Full-time|$300K/yr|On-site|San Francisco
ABOUT BASETEN
Join Baseten, where we drive mission-critical AI inference for leading companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our unique blend of applied AI research, robust infrastructure, and intuitive developer tools empowers organizations at the forefront of AI innovation to deploy state-of-the-art models into production. Recently, we secured a $300M Series E funding round, backed by esteemed investors such as BOND, IVP, Spark Capital, Greylock, and Conviction. Be a part of our rapid growth and help shape the platform that engineers trust for launching AI products.

THE ROLE
As an Infrastructure Software Engineer at Baseten, you will play a pivotal role in developing and maintaining our ML inference platform that powers AI applications in production. Your contributions will enhance the core infrastructure, enabling developers to deploy, scale, and monitor machine learning models with exceptional performance.

EXAMPLE INITIATIVES
You will engage in innovative projects within our Infrastructure team, including:
• Multi-cloud capacity management
• Inference on B200 GPUs
• Multi-node inference
• Fractional H100 GPUs for efficient model serving

RESPONSIBILITIES
• Design and develop infrastructure components for our ML inference platform, primarily using Python and Go.
• Implement and maintain Kubernetes deployments for optimal model serving.
• Contribute to the orchestration layer for model deployments.
• Build and enhance monitoring systems to track model performance metrics effectively.
• Develop efficient resource management solutions to optimize performance.
Who We Are
Serval is an innovative AI-driven automation platform redefining operational efficiency for enterprises. Our intelligent agents seamlessly comprehend and execute real-world workflows, replacing outdated manual processes with adaptive, self-learning software. Since our inception in early 2024, we have garnered the trust of industry leaders such as General Motors, Notion, Perplexity, Vercel, Mercor, LangChain, and Verkada, streamlining high-volume operational tasks across their organizations.

At the heart of Serval is a cutting-edge agentic AI platform that transforms natural language into actionable workflows. Our agents not only respond to queries but also reason, act across various systems, and continuously enhance their performance. What started as a solution for operational tasks has rapidly expanded into a versatile AI automation layer utilized across IT, HR, Finance, Security, Legal, and Engineering sectors.

Our mission is to eradicate repetitive, manual tasks within enterprises, empowering teams through intelligent automation. In the long run, we aim to establish a universal AI operations layer—a system of agents that integrates across business functions, maintaining the momentum of modern companies. We are proud to be backed by renowned investors including Sequoia Capital, Redpoint Ventures, Meritech, First Round, General Catalyst, and Elad Gil, and founded by seasoned product and engineering leaders from Verkada.

Role Overview
As a Senior Software Engineer in Infrastructure at Serval, you will be pivotal in developing and scaling the core systems that empower our AI agents and workflow automation platform. A crucial aspect of this role involves enabling and supporting self-hosted deployments for enterprise clients needing on-premises or private cloud environments.
We are looking for engineers with profound expertise in distributed systems, infrastructure-as-code, production operations, and customer-facing support, who aspire to influence the technical architecture of a rapidly evolving platform.

What You'll Do
• Design, implement, and operate large-scale distributed systems that power Serval's AI agents, workflow orchestration, and data pipelines.
• Create and maintain Terraform modules to provision and manage cloud infrastructure across AWS, GCP, or Azure environments.
• Develop and sustain deployment packages, installation scripts, and infrastructure templates, enabling customers to self-host Serval in their own environments.
• Provide technical support and guidance to enterprise customers during installation and deployment phases.
Join Ivo's Engineering Team!
At Ivo, we are pioneers in the tech industry. Our engineers are innovators who have created groundbreaking solutions such as:
• An AI agent that seamlessly integrates with MS Word to enhance document editing [2023]
• Revolutionizing embedding models with agentic RAG technology [2023]
• Advanced LLM-based legal fact extraction capabilities [2024]
• A legal assistant designed to search extensive contract databases without compromising accuracy [2024]
• Clustering legal documents from the same lineage [2025]
• Automatic deviation analysis to uncover hidden risks in vast contract databases [2025]
• Merging contracts with their amendments to create a “composite” contract timeline that has moved our clients to tears [2025]

Role Overview
As an Infrastructure Engineer at Ivo, you will lay the groundwork for our platform's future. Your responsibilities will include:
• Designing and owning the future of our infrastructure, allowing you the freedom to innovate.
• Managing multiple customer deployments, ensuring each receives tailored containers, databases, and VPCs.
• Instrumenting our systems to identify performance bottlenecks and errors.
• Aggregating metrics and logs into visually appealing dashboards and setting up pager alerts.
• Leading infrastructure-related incidents and being on-call as necessary.
• Enhancing our CI/CD system to reduce deployment time from ~12 minutes.

If you're passionate about LLMs, you'll thrive in our engineering team, where you’ll have the opportunity to:
• Develop real-time LLM evaluations to monitor the accuracy of our responses.
• Collaborate with talented engineers to push the boundaries of DevOps.
About the Role
Join our pioneering team at vooma as a Backend & Infrastructure Software Engineer, where you will play a critical role in shaping the technical infrastructure of a transformative company. If you are passionate about creating not only resilient systems but also the foundational architecture of a groundbreaking enterprise from the outset, this position is ideal for you. We are looking for someone who excels at crafting infrastructure that is elegant, dependable, and secure, even under high-demand scenarios. You thrive on the challenge of scaling systems that enable intelligent agents and take pride in establishing reliable foundations that others can rely on.

Your Key Responsibilities Include:
• Design and maintain secure, scalable infrastructure tailored for AI-powered agents in production environments.
• Deploy and optimize AI-driven services to meet high availability and performance standards.
• Manage infrastructure as code, alongside cloud environments and CI/CD pipelines.
• Implement monitoring, observability, and alerting systems to ensure the reliability of our infrastructure.
• Contribute to infrastructure security and adhere to best practices.

You Should Have:
• Experience in deploying and productionizing machine learning or AI-centric workloads.
• Proficiency in developing secure, scalable infrastructures on platforms such as AWS, Azure, or GCP.
• In-depth knowledge of backend systems, networking, and container orchestration technologies (e.g., Kubernetes).
• Understanding of infrastructure security principles and compliance standards (e.g., SOC2).
• A proactive and hands-on mindset, with a strong drive to solve challenges from start to finish.
About Rad AI
At Rad AI, we are dedicated to revolutionizing the healthcare landscape through the power of artificial intelligence. Established by a radiologist, our innovative AI solutions are transforming radiology, streamlining processes, alleviating physician burnout, and enhancing patient outcomes. With access to one of the largest proprietary datasets of radiology reports globally, our AI has been instrumental in identifying hundreds of new cancer cases and has reduced error rates in millions of radiology reports by nearly 50%.

Having secured over $140 million in funding, including a highly successful Series C round of $68 million led by Transformation Capital, we now boast a valuation of $528 million. Our esteemed investors include Khosla Ventures, World Innovation Lab, Gradient Ventures, and Cone Health Ventures, all united in our vision to empower physicians with state-of-the-art AI technology.

Our latest advancements in generative AI are utilized daily by thousands of radiologists, supporting over one-third of radiology groups and healthcare systems, and nearly 50% of all medical imaging in the U.S., partnering with distinguished institutions such as Cone Health, Jefferson Einstein Health, Geisinger, Guthrie Healthcare System, and Henry Ford Health.

Recognized as one of the most promising healthcare AI companies by CB Insights and AuntMinnie, and ranked by Deloitte as the 19th fastest-growing company in North America, we are committed to building AI-powered solutions that make a significant difference. Recently, Rad AI was honored to be included in CNBC’s Disruptor 50 list, underscoring the innovation and momentum driving our mission. If you’re eager to help shape the future of healthcare, we invite you to join our dynamic team!

Why Join Us
The Platform Engineering team at Rad AI lays the groundwork that sustains all our products—Reporting, Impressions, and Continuity—enabling product teams to deliver reliably, securely, and at scale.
Within this framework, the Infrastructure team is responsible for our core cloud infrastructure, platform stability, and reliability practices. We are on the lookout for multiple Infrastructure Engineers to assist us in designing and operating robust, scalable systems. In this role, you will play a key part in infrastructure architecture, reliability practices, and the thoughtful enhancement of our workflows. If you have a passion for building resilient systems, we want to hear from you!
Netic is revolutionizing the essential services sector with our AI-driven revenue engine, empowering the backbone of the American economy. With $43M in funding from leading investors such as Founders Fund, Greylock, Hanabi, and Dylan Field, who spearheaded our Series B, we have enabled our clients to secure hundreds of thousands of jobs across various service industries in North America. Today, numerous companies thrive entirely on an AI-first model powered by Netic.

As a member of our team consisting of innovative builders from top organizations such as Scale, Databricks, HRT, Meta, MIT, Stanford, and Harvard, you will be at the forefront of integrating frontier AI into the physical economy, where challenges are complex, data is intricate, and impacts are immediate and substantial.

In the role of a founding Product Infrastructure Engineer, you will design and scale the crucial infrastructure that supports our autonomous AI agents, addressing real-world challenges with significant, tangible outcomes. You will work alongside a passionate team of builders to develop infrastructure and processes from scratch, utilizing state-of-the-art cloud and orchestration technologies. If you excel in dynamic, ambiguous settings and are eager to set new benchmarks in the agentic domain, this is your chance to make a lasting impact.
About Chai Discovery
Chai Discovery specializes in developing cutting-edge AI models that revolutionize molecular design and redefine drug discovery processes. Our passionate team is dedicated to transforming the search for new cures and improving lives. Our founding team comprises top researchers and Silicon Valley experts, having achieved significant milestones in AI for biology. With a history of co-inventing protein language modeling and creating advanced folding algorithms, our technology has been embraced by leading pharmaceutical companies. We are proud to be supported by prestigious investors including OpenAI, Thrive Capital, Dimension, Conviction, Lachy Groom, Amplify, and others.

About the Role
We are seeking a dedicated Infrastructure Software Engineer focused on crafting robust, streamlined infrastructure solutions. You will develop the foundational compute and infrastructure systems that support our product offerings, model inference processes, and evaluation frameworks. Collaboration with product engineers, researchers, and our commercial team will be key to your success. You will have experience creating services that developers appreciate, successfully deploying and scaling AI/ML systems in production, and effectively anticipating potential challenges that may hinder the adoption of our platform by leading biopharmaceutical organizations. As Chai's models advance from protein structure prediction into practical therapeutic engineering, this role presents a unique opportunity to bring state-of-the-art AI drug design models to market, working alongside a team that is both detail-oriented and optimistic about the future.

About You
You are motivated by a mission to establish the benchmark for impactful AI technology.
We are looking for candidates who possess:

Software Experience:
• A Bachelor’s degree or equivalent experience in Computer Science or a related field.
• 5+ years of experience in building production systems utilizing contemporary tools, collaborating with platform, security, and product teams.
• A keen ability to foresee infrastructure challenges.
• Comprehensive ownership of 24/7 infrastructure observability, alerting, and incident response.
• Experience in both 0-to-1 buildouts and 1-to-n scale-ups, along with a rich repository of best practices and strategies.

Communication & Collaboration:
• A passion for code pair-reviewing, documentation, and knowledge sharing with peers.
About Blockit
At Blockit, we recognize that time is our most precious resource, yet the art of scheduling often feels antiquated. Our mission is to revolutionize this process through advanced AI technology that acts as an autonomous time agent, adeptly managing the complexities of scheduling—including time zones, group coordination, and logistical considerations—as though it were an ever-vigilant executive assistant.

As pioneers in the AI space, Blockit is at the forefront of developing one of the first multiplayer, stateful AI agents capable of facilitating interactions among multiple users, maintaining contextual continuity across conversations, and executing real-world actions. The more users integrate their calendars, the more robust our network becomes. Join our dynamic team, supported by Sequoia, where we maintain a fast-paced environment, consistently ship innovative solutions, and uphold high standards of excellence. If you’re excited about building groundbreaking technology, we would love to connect with you. To explore our team culture further, please visit our team page.

The Role
In this role, you will ensure that Blockit remains fast, reliable, and primed for scalability. You will take ownership of our core infrastructure, which includes databases, asynchronous job processing, observability, and the systems that drive our AI agents, including the LLM gateway. You will architect solutions as we expand, whether that means integrating new systems or innovating entirely new approaches.
Furthermore, you will be the go-to person for reliability and performance, ensuring our systems remain robust as usage increases.This position is perfect for someone who is passionate about operational excellence and eager to lay the groundwork for a platform that orchestrates millions of calendars.What You’ll DoManage and evolve our core infrastructure, including PostgreSQL, Clickhouse, and asynchronous processing pipelines.Design and optimize our LLM infrastructure, which encompasses the LLM gateway, evaluation pipelines, and observability stack, to guarantee reliability, performance, and cost-effectiveness.Develop comprehensive monitoring, alerting, and dashboard solutions to promptly identify issues.Architect and implement new infrastructure as we scale, such as Redis, Kafka, or similar systems, making informed trade-offs along the way.Enhance deployment pipelines and developer experiences to maintain rapid and safe shipping of updates.
Join xdof, where innovation meets opportunity! Standing at the forefront of robotics and AI technology, we are dedicated to addressing the critical need for high-quality training data. Our mission is to develop sophisticated data collection systems, operational capabilities, and expansive data warehouses that empower our partners to lead the field.

As an Infrastructure Engineer, you will be instrumental in creating a robust platform that supports our growing data collection initiatives. Key projects you may work on include:
- Developing an orchestration system for processing data upon ingestion.
- Designing an internal platform that allows researchers to experiment with our datasets.
- Managing a multi-tenant data lake to enhance data accessibility and collaboration.
Why Join Doppel?
At Doppel, we are dedicated to tackling one of the most significant threats posed by AI: mass-manufactured social engineering. With scams, deepfakes, and social engineering attacks proliferating across websites, social media, advertisements, encrypted messaging apps, and mobile devices, our mission is both simple and ambitious: to make the internet safer by outsmarting the fastest-evolving digital threats.

Backed by renowned investors like a16z and Bessemer, and trusted by industry leaders such as OpenAI, United Airlines, and Coinbase, Doppel is on a rapid growth trajectory. If you are passionate about addressing real-world challenges through innovative technology, we want to hear from you!

What We're Building
We are developing an AI-driven platform to combat social engineering at scale. This involves creating scalable systems that monitor billions of domains, social media accounts, applications, and dark web forums, utilizing AI agents to detect and neutralize digital threats effectively.

What We're Looking For
We are searching for a skilled backend engineer to build out the infrastructure our rapidly expanding engineering team needs.
Recent projects include:
- Built self-hosted Elasticsearch infrastructure on Kubernetes, enabling real-time search across millions of alerts and associated metadata.
- Established core infrastructure using Terraform (Infrastructure as Code), enabling reproducible, version-controlled environments and faster onboarding for new engineers.
- Implemented a dedicated staging environment for safer releases, feature validation, and automated integration testing prior to production deployments.
- Introduced observability and tracing mechanisms (metrics, logging, distributed tracing), significantly improving our capacity to debug performance issues and sustain reliability at scale.

What We Offer
- A mission-driven culture emphasizing low ego, high accountability, deep customer focus, and exceptional talent density.
- Complimentary lunch and dinner in the office.
- Flexible Paid Time Off (PTO).
- Quarterly team offsites.
Join Netic, the cutting-edge AI revenue engine powering the essential services that form the backbone of the American economy. Backed by $43 million in funding from top investors including Founders Fund, Greylock, Hanabi, and Dylan Field, we have helped our clients book hundreds of thousands of jobs across service industries in North America. As a pioneer in AI-driven solutions, we are seeing the first companies emerge that run entirely on our AI-first platform.

As an Agent Infrastructure Engineer, you will be at the forefront of architecting and scaling the core framework that underpins our autonomous AI agents, addressing complex real-world challenges with immediate and significant impact. You will collaborate with a passionate team of innovators from companies and institutions such as Scale, Databricks, HRT, Meta, MIT, Stanford, and Harvard as we bring frontier AI to the physical economy, where the stakes are high and the data is intricate.

If you thrive in dynamic, fast-paced environments and are eager to set new benchmarks in the agentic space, seize this opportunity to make your mark!
Aug 15, 2025