Experience Level
Senior
Qualifications
Key Responsibilities
• Enhance and support our core Python platform responsible for request routing, AI workload orchestration, GPU server capacity management, observability, authentication, rate limiting, and more.
• Manage our infrastructure layer utilizing Terraform, Ansible, and provider APIs to oversee our fleet of GPU workers.
• Take ownership of technologies such as K8s, FluxCD, Nomad, Prometheus, Thanos, Grafana, Loki, distributed networking storage, and other foundational elements of our platform.
• Formulate a vision and strategic roadmap for our infrastructure development over the next 1, 2, and 5 years.
About the job
Join our innovative team at fal as a Staff Software Engineer specializing in large-scale computation platforms. We are seeking a seasoned software engineer with extensive experience in developing backend systems that efficiently orchestrate workloads and manage resource constraints. Your expertise in foundational cloud infrastructure and Linux provisioning will be crucial as you work towards achieving high reliability and scalability with minimal operational overhead.
About fal
fal is at the forefront of computational technology, dedicated to innovating and optimizing large-scale computation platforms. We value creativity and ambition, offering our team members the resources and opportunities to grow and excel in a dynamic work environment.
Similar jobs
Full-time|$200K/yr - $270K/yr|On-site|Denver, CO;San Francisco, CA;New York, NY;Los Angeles, CA;Seattle, WA
About Gusto
At Gusto, we are dedicated to empowering small businesses by managing essential services like payroll, health insurance, 401(k)s, and HR, allowing owners to focus on their passions and customers. With offices in Denver, San Francisco, and New York, we proudly support over 400,000 small businesses nationwide, fostering a workplace that reflects and celebrates the diverse customers we serve. Explore our Total Rewards philosophy.
About the Role:
We are seeking a seasoned engineer with extensive knowledge in distributed data systems to help shape the future of Gusto's storage architecture. In this impactful role, you will oversee intricate migrations, design high-scale systems, and establish benchmarks for automation, resilience, and security. Your work in implementing distributed database solutions will facilitate Gusto's ongoing growth and scalability.
About the Team:
The Datastores Infrastructure Engineering team is responsible for designing, building, and maintaining the data platforms that drive Gusto's products, including MySQL, Postgres, Redis, Kafka, and S3. We are committed to ensuring that our infrastructure is consistent, dependable, and equipped to support Gusto's expanding requirements. As we transition to self-hosted distributed databases, our focus lies in minimizing the blast radius, enhancing operational resilience, and enabling sustainable scalability.
Here’s what you’ll do day-to-day:
• Architect, deploy, and manage the complete lifecycle of distributed database systems (TiDB) on Kubernetes at scale, ensuring high availability, data consistency, and operational excellence.
• Coordinate complex, zero-downtime migrations from monolithic to distributed architectures, including vertical sharding to isolate Product Services.
• Define and implement efficiency enhancements across the storage infrastructure through query optimization, caching strategies, and workload management.
• Establish standards and develop reliable automation to maintain data consistency, integrity, and security across distributed systems.
• Continuously enhance operational excellence by decreasing on-call burdens with sustainable, long-term solutions.
• Collaborate with product engineering teams and technical partners to enable rapid and reliable product development.
Full-time|$160K/yr - $180K/yr|On-site|San Francisco Bay Area
Join Discord, a platform embraced by over 200 million users monthly, where gaming is at the heart of our community. With more than 90% of our users engaging in gaming, they collectively spend 1.5 billion hours exploring thousands of unique titles each month. Our mission is to enhance the gaming experience by facilitating seamless communication and interaction among players.
The Database Infrastructure team is responsible for the development and management of all database systems and data services at Discord. These systems are critical for supporting our vast user base, which includes trillions of messages exchanged each month. Our small yet impactful team works across various domains, including databases, disk storage, and Rust-based data access services, playing a pivotal role in the company's growth and success!
Explore our team's insights through our blog posts:
• How Discord Indexes Trillions of Messages
• How Discord Stores Trillions of Messages
• How Discord Supercharges Network Disks for Extreme Low-Latency
Full-time|$180K/yr - $247.5K/yr|Remote|San Francisco or Remote
Join the Revolution at Check
At Check, we are transforming the payroll landscape. Our mission goes beyond just building a successful business; we collaborate with our partners to innovate payroll solutions. As pioneers of embedded payroll, we are reshaping the payment process, enabling payroll businesses to launch, expand, and succeed with ease. Discover our journey | Listen in.
Check is more than an API; we are the catalyst for developing and scaling payroll operations.
Our Team
The payroll system is in dire need of innovation. We invite you to join a passionate team dedicated to making an impactful change! At Check, you will leverage creative problem-solving and critical thinking to influence every business we partner with. We view challenges as opportunities for improvement, valuing the unique contributions of each team member in our collective mission.
If you're ready to dive in and transform payroll, let's collaborate to simplify complexity and enhance the future for businesses of all sizes.
Your Role
At Check, engineering is our foundation. We believe that payroll should resemble modern financial software; achieving this requires a comprehensive understanding of systems and reliable infrastructure that our partners can trust. Every product we deliver relies on scalable and secure systems that ensure timely payments and payroll processing.
We are seeking a Staff Software Engineer who possesses strong software design capabilities coupled with hands-on infrastructure experience. In this position, you will focus on the essential systems that drive payroll operations, enhancing our service scalability, production operations, and empowering engineers with the tools to deliver software confidently and securely.
You will collaborate across product and platform areas to enhance our cloud infrastructure, fortify our deployment and monitoring strategies, and streamline the architecture that supports embedded payroll services. The challenges you will address often intersect infrastructure, product, and operational domains.
This opportunity is perfect for someone who has managed complex systems end-to-end in a dynamic environment and takes pride in developing resilient, comprehensible infrastructure that is vital to our operations.
Full-time|$192K/yr - $260K/yr|On-site|San Francisco, California
P-188
At Databricks, we are on a mission to revolutionize the data lifecycle, from ingestion and ETL to business intelligence and advanced machine learning. Our vision is centered around a unified platform that replaces the conventional data warehouse architecture with a cutting-edge Lakehouse model (CIDR 2021 paper). This innovative architecture aims to tackle significant challenges such as data staleness, reliability, total cost of ownership, data lock-in, and limited support for diverse use cases. A pivotal element of achieving this vision is the development of the next generation of decoupled query engines and structured storage systems that can surpass the performance of specialized data warehouses while retaining the versatility of general-purpose systems like Apache Spark™. This capability is essential for supporting a wide range of workloads, from ETL processes to complex data science applications.
As a key member of this team, you will engage in one or more of the following areas to design and implement systems that set new standards in the industry:
• Query compilation and optimization
• Distributed query execution and scheduling
• Vectorized execution engine
• Data security
• Resource management
• Transaction coordination
• Efficient storage structures (encodings, indexes)
• Automatic physical data optimization
Full-time|Remote|San Francisco, CA | New York City, NY | Seattle, WA
Join Anthropic as a Staff+ Software Engineer focusing on Databases, where you'll be at the forefront of our innovative technology solutions. You'll work closely with a collaborative team to design, implement, and maintain robust database systems that empower our AI models and enhance user experience. Your expertise will contribute significantly to our mission of advancing AI safety and usability.
Full-time|On-site|San Francisco, CA | New York City, NY | Seattle, WA
Anthropic is hiring a Staff Software Engineer to focus on Node Infrastructure. This position is based in San Francisco, New York City, or Seattle.
Role overview
This role centers on designing, building, and maintaining the core systems that support Anthropic’s services. The work directly affects the reliability and scalability of the company’s AI offerings.
Collaboration
Work closely with a skilled engineering team to develop infrastructure that supports high-quality AI solutions. The team values input and hands-on problem solving from every member.
Impact
Efforts in this role help ensure Anthropic’s services remain stable and can grow as demand increases. The systems you help create will play a key part in the company’s ability to deliver dependable AI products.
Full-time|$200K/yr - $275K/yr|On-site|San Francisco
About Watney Robotics
At Watney Robotics, we are pioneers in developing autonomous robotic solutions aimed at enhancing critical infrastructure. Recently securing $21 million in seed funding from leading investors such as Conviction, Abstract, and A*, we are collaborating with the world’s largest hyperscalers to propel the expansion of data centers and streamline maintenance processes.
This is an extraordinary opportunity to join our team at a pivotal stage as we transition from prototype to large-scale production. Be part of a team that not only ships cutting-edge systems but also plays a crucial role in shaping the operational framework of an innovative robotics company.
Full-time|$236K/yr - $290K/yr|On-site|San Francisco
Harvey builds a secure, enterprise-grade platform for legal and professional services, powered by advanced agentic AI. The company serves more than 1,000 clients in over 60 countries and is backed by top investors. Harvey’s team emphasizes speed, ownership, and high standards, working closely with customers to address real-world needs. This Staff Software Engineer role is based in San Francisco and requires in-person work. Relocation support is available for those moving to the area.
Role overview
The Core Infrastructure team at Harvey designs and maintains the systems that support every user interaction on the company’s global legal AI platform. These systems process billions of prompt tokens and millions of daily requests for leading law firms and professional service providers around the world. The position combines new infrastructure development with a focus on operational reliability. The work has a direct effect on the platform’s scalability, security, and resilience as Harvey grows into new regions and serves more customers.
Key responsibilities
• Design and implement scalable, fault-tolerant infrastructure systems for Harvey’s AI platform across multiple cloud regions.
• Own and enhance multi-cloud infrastructure (Azure, GCP), with emphasis on Kubernetes orchestration, networking, and container management.
• Lead technical initiatives in observability, incident response, and performance tuning.
Join Decagon as a Staff Software Engineer specializing in Machine Learning Infrastructure. In this role, you will play a crucial part in enhancing and optimizing our machine learning systems. You will collaborate with a talented team of engineers to build scalable and efficient infrastructure that supports our AI-driven initiatives.
As a key contributor, you will leverage your expertise in software engineering and machine learning to solve complex challenges and drive innovation. Your work will impact various projects and help shape the future of our technology.
Who are we?
At Cohere, our mission is to elevate intelligence to benefit humanity. We specialize in training and deploying cutting-edge models for developers and enterprises focused on creating AI systems that deliver extraordinary experiences such as content generation, semantic search, retrieval-augmented generation, and intelligent agents. We view our work as pivotal to the broad acceptance of AI technologies.
We are passionate about our creations. Every team member plays a vital role in enhancing our models' capabilities and the value they provide to our customers. We thrive on hard work and speed, always prioritizing our clients' needs.
Cohere is a diverse team of researchers, engineers, designers, and more, all dedicated to their craft. Each individual is a leading expert in their field, and we recognize that a variety of perspectives is essential to developing exceptional products.
Join us in our mission and help shape the future of AI!
Why this role?
Are you excited about architecting high-performance, scalable, and reliable machine learning systems? Do you aspire to shape and construct the next generation of AI platforms that enhance advanced NLP applications? We are seeking talented Members of Technical Staff to join our Model Serving team at Cohere. This team is responsible for the development, deployment, and operation of our AI platform, which delivers Cohere's large language models via user-friendly API endpoints. In this role, you will collaborate with multiple teams to deploy optimized NLP models in production settings characterized by low latency, high throughput, and robust availability. Additionally, you will have the opportunity to work directly with customers to create tailored deployments that fulfill their unique requirements.
Full-time|$180K/yr - $250K/yr|On-site|San Francisco
Join our innovative team at fal as a Staff Software Engineer specializing in large-scale computation platforms. We are seeking a seasoned software engineer with extensive experience in developing backend systems that efficiently orchestrate workloads and manage resource constraints. Your expertise in foundational cloud infrastructure and Linux provisioning will be crucial as you work towards achieving high reliability and scalability with minimal operational overhead.
Join the Crew of Ivo!
At Ivo, we are more than just engineers; we are the pioneers of the digital seas! Our crew has set sail with groundbreaking innovations that have reshaped the landscape of legal tech:
• An AI agent that seamlessly integrates with MS Word to enhance your documents [2023]
• Transitioning from traditional embedding models to agentic RAG for superior performance [2023]
• Advancing large-scale LLM-driven legal fact extraction [2024]
• A legal assistant capable of accurately searching vast contract databases [2024]
• Clustering legal documents from the same lineage [2025]
• Implementing automatic deviation analysis to uncover hidden risks in extensive contract databases [2025]
• Merging contracts with amendments to create comprehensive “composite” contracts (one of our clients shed tears of joy upon seeing this) [2025]
The Role of an Infrastructure Engineer
As an Infrastructure Engineer, you will be the architect of Ivo's platform, ensuring its robustness and scalability.
Your mission includes:
• Taking ownership of our environment's future, with ample room for creative system design.
• Managing numerous customer deployments: every client deserves a unique setup, from containers to databases.
• Instrumenting our systems to identify performance bottlenecks and errors.
• Aggregating metrics, logs, and health checks into user-friendly dashboards and alerts.
• Leading the charge during infrastructure incidents.
• Accelerating our CI/CD system (currently a sluggish ~12 minutes; let's speed that up!).
If you share our passion for LLMs and thrive in a dynamic environment, we want you to help us push the boundaries of DevOps:
• Innovating real-time LLM evaluations to ensure the accuracy of our outputs.
• Building upon our existing infrastructure to enhance performance and reliability.
Set sail with us at Ivo, where your technical skills will help chart the course for the future of legal technology!
Full-time|$325K/yr - $405K/yr|On-site|San Francisco
About Ivo, Inc.
Ivo, Inc. is based in San Francisco and builds advanced tools for the legal and document management space. The team has delivered recent projects such as:
• An AI agent for MS Word that streamlines document editing (2023)
• Agentic RAG for improved embedding model precision (2023)
• Large-scale LLMs for legal fact extraction (2024)
• A legal assistant for searching extensive contract databases with accuracy (2024)
• Clustering techniques for related legal documents (2025)
• Automatic deviation analysis to uncover risks in large contract sets (2025)
• Innovative contract merging to create composite contract series for clients (2025)
Role Overview: Infrastructure Staff Software Engineer
This role shapes the foundation of Ivo’s platform. The Infrastructure Engineer will design, build, and maintain the systems that power our products and support our engineering team.
What You Will Do
• Design and build scalable infrastructure for Ivo’s platform
• Manage multiple customer deployments, ensuring each client has dedicated containers, databases, and VPCs
• Instrument systems to identify and resolve performance bottlenecks and errors
• Aggregate metrics, logs, and health checks into dashboards and alerting systems
• Lead response to infrastructure incidents and participate in on-call rotations as needed
• Optimize CI/CD pipelines to reduce deployment times from approximately 12 minutes
DevOps and LLM Innovation
Ivo values engineers who are eager to experiment and improve. Areas of exploration include:
• Building real-time LLM evaluation tools to monitor output accuracy
• Developing autonomous agents to detect and fix production issues before they escalate
• Contributing new ideas that advance our mission and platform reliability
Full-time|$240K/yr - $310K/yr|On-site|San Francisco, CA - US
At Crusoe, we are dedicated to accelerating the abundance of energy and intelligence. As a pioneering AI infrastructure company, we control every aspect of our operations, from energy generation to the digital tokens that power the world’s most ambitious AI workloads. Joining Crusoe means being part of a team that is shaping the future at an unprecedented pace.
We are amid a transformative industrial revolution. The endless demand for AI computing power poses significant challenges, particularly concerning energy supply. Our energy-first strategy not only enhances AI infrastructure but also contributes positively to the environment, empowering innovators in the AI sector.
We seek proactive, problem-solving team members who recognize the scale of our mission and are eager to navigate uncharted territories. If you aspire to advance your career alongside experts in energy, manufacturing, data center construction, and cloud services, we invite you to become part of our dynamic team.
If you are ready to engage in the most impactful work of your career, assist our customers and partners in elevating their AI strategies, and contribute to a high-performing, supportive team, we welcome you to build the future with us at Crusoe.
About This Role
The Cloud Storage team at Crusoe is searching for a Senior Staff Software Engineer to act as the principal architect for our storage strategy. Unlike a Staff Engineer who leads feature development, a Senior Staff Engineer will define the long-term technical roadmap essential for our AI-scale infrastructure. You will play a crucial role in establishing the architectural strategy, ensuring the integrity and global scalability of our specialized storage services. Your work will focus on the underlying physics of the stack, bridging high-performance NVMe hardware with globally distributed object storage solutions that compete with S3.
Your Responsibilities
• Architectural Vision & Strategy: Lead the development and execution of the long-term technical strategy for Crusoe's storage engine, while identifying and integrating industry trends such as CXL and NVMe-oF into a unified roadmap.
• System Programming Expertise: Utilize your extensive experience in system programming with languages such as C, C++, Go, and Rust to lay the groundwork for our V2 storage re-architecture.
• Storage Protocols: Design and implement solutions employing industry-standard storage protocols, including NFS, SMB, iSCSI, and NVMe/TCP.
Who We Are:
TwelveLabs is at the forefront of developing innovative multimodal foundation models that enable video comprehension akin to human understanding. Our groundbreaking models have set new benchmarks in video-language modeling, enhancing our capabilities and revolutionizing how we engage with and analyze diverse media formats.
With an impressive $107 million in Seed and Series A funding, we're supported by premier venture capital firms including NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, alongside influential AI pioneers like Fei-Fei Li, Silvio Savarese, and Alexandr Wang. Our headquarters in San Francisco, complemented by a significant presence in Seoul, highlights our dedication to fostering global innovation.
We celebrate the individuality of every team member’s journey, believing that the diverse cultural, educational, and life experiences of our employees fuel our ability to challenge the status quo. We seek passionate individuals who resonate with our mission and are eager to make a significant impact as we advance technology to reshape the world. Join us in redefining video understanding and multimodal AI.
About the Role
As a Senior Staff Infrastructure Engineer at TwelveLabs, you will leverage your technical expertise and leadership skills to construct the systems that drive our multimodal foundation models. Your focus will be on designing and enhancing a scalable, secure, and high-performance infrastructure that accommodates extensive AI workloads across both cloud-based and on-premises environments.
This position demands strong technical acumen, an eagerness to delve into low-level systems when necessary, and the capability to influence infrastructure strategy through hands-on contributions and operational improvements. Your impact will be felt through your technical expertise and the results you deliver, rather than through hierarchical status, in a dynamic and fast-paced environment.
In this role, you will:
• Architect and advance cloud and hybrid infrastructure, blending hands-on execution with technical leadership.
• Guide the development of AI/ML infrastructure components, engaging directly in critical tasks when necessary.
• Define infrastructure standards and abstractions while maintaining close interaction with production systems.
• Collaborate closely with Machine Learning Engineers, Data Scientists, Backend Developers, and other key stakeholders to ensure system alignment and efficiency.
Full-time|$160K/yr - $300K/yr|On-site|San Francisco
About Apiphany
Apiphany is a trailblazing AI company focused on revolutionizing physical product development. We empower innovators across automotive, aerospace, medtech, and energy sectors to convert vast unstructured technical data into real-time, actionable insights. Supported by elite investors including Markforged, Databricks, GM, and Character, our mission is to transform engineering decision-making, turning complexity into simplicity for leading manufacturers worldwide.
Our advanced models are designed to address the intricacies of engineering and manufacturing, comprehending physics principles, design specifications, and program constraints. Our small, elite team consists of builders hailing from prestigious institutions such as Stanford, Berkeley, MIT, UW, and CMU, along with industry veterans from GM, Ford, and Genesis Therapeutics. We are committed to advancing hard-tech and establishing a market-leading company together.
About the Role
In the role of Senior / Staff Infrastructure Engineer at Apiphany, you will architect, build, and manage the infrastructure that underpins our intelligence platform. Your responsibilities will encompass secure, reliable, and scalable cloud deployments, including the unique challenge of deploying across both internal and customer-managed cloud environments.
You will ensure our systems adhere to stringent requirements for latency, availability, and compliance within data-intensive environments. Additionally, you will shape our security strategy, implement infrastructure-as-code practices, and establish a solid foundation enabling engineering teams to deliver with assurance.
About Our Team:
Join the innovative Database Systems team at OpenAI, where we specialize in high-performance distributed databases. We are the architects behind Rockset, a cutting-edge real-time search, analytics, and vector database that powers all vector search and retrieval augmented generation (RAG) at OpenAI. Rockset underpins core functionalities across all OpenAI product lines and supports various critical internal applications.
About the Role:
We are in search of engineers who are passionate about distributed systems, performance optimization at a low level (with our core engine developed in C++), and constructing scalable database infrastructures from scratch. As a member of the Database Systems team, you will play a key role in enhancing the core database engine, making significant contributions to ingestion, query execution, indexing, and storage improvements. You will collaborate with multiple teams across OpenAI to unlock new product capabilities and ensure the reliability and scalability of our online database as usage expands exponentially.
Your Responsibilities Will Include:
• Design, develop, and maintain high-performance distributed systems.
• Identify and address performance bottlenecks to elevate infrastructure capabilities.
• Define and guide the long-term technical vision and evolution of the system.
• Collaborate with product, engineering, and research teams to deliver robust and scalable infrastructure.
• Investigate complex production issues across the entire technology stack.
• Contribute to incident response, retrospective analyses, and establishing best practices for system reliability.
You Will Excel In This Role If You:
• Possess substantial experience in building, scaling, and optimizing distributed systems.
• Exhibit a keen interest in database internals, storage engines, or low-latency query systems.
• Enjoy tackling complex performance challenges in high-throughput systems.
• Have experience managing and operating production clusters at scale (e.g., Kubernetes or similar orchestration tools).
• Approach scalability, correctness, and reliability with a rigorous mindset.
• Thrive in a fast-paced environment where you can make a significant impact.
Qualifications:
• 4+ years of relevant industry experience with a focus on distributed systems.
• Proficiency in C++ or similar low-level programming languages.
• Strong problem-solving skills and attention to detail.
• Experience with performance monitoring and optimization tools.
• Excellent collaboration and communication skills.
Role overview
Scale AI seeks a Database Engineer to strengthen and refine its data infrastructure. The position centers on designing, building, and maintaining database systems that deliver high availability and dependable performance.
What you will do
• Design and implement database solutions that align with business requirements
• Maintain and tune database systems to ensure reliability and speed
• Collaborate with engineering, product, and operations teams to improve data processing and management
Location
This role is based in San Francisco, CA or New York, NY.
About Us
At Sierra, we are revolutionizing the way businesses engage with their customers by building a cutting-edge platform that harnesses the power of AI. Our headquarters is located in the vibrant city of San Francisco, with additional offices expanding in Atlanta, New York, London, France, Singapore, and Japan.
Our company culture is deeply rooted in our core values: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and foster an environment where innovation thrives.
Sierra was co-founded by visionary leaders Bret Taylor, who currently serves as the Board Chair of OpenAI and has a rich history with Salesforce and Facebook, and Clay Bavor, who previously led Google Labs and spearheaded initiatives like Google Lens and Project Starline.
Your Role
As a Software Engineer focusing on Infrastructure at Sierra, you will play a pivotal role in designing, constructing, and maintaining the foundational systems that empower our AI platform. Your expertise will ensure that our infrastructure is not only secure and reliable but also scalable, allowing product teams to execute their work with agility and confidence.
• Guarantee the reliability, scalability, and performance of our platform and LLM inference serving in response to increasing traffic demands.
• Develop and oversee cloud infrastructure using Terraform to create secure, scalable, and reproducible environments.
• Establish and manage a self-service infrastructure platform to empower engineering teams in deploying and operating services independently.
• Take ownership of and improve CI/CD pipelines and release management processes, facilitating rapid and reliable deployments across Sierra’s platform.
• Design and manage distributed systems utilizing distributed databases, retrieval systems, and machine learning models.
• Develop and sustain core data serving abstractions along with essential authentication and security features (SSO, RBAC, authentication controls).
• Effectively navigate and integrate our technology stack with enterprise customer environments in a scalable and maintainable manner.
At Exa, we are on a mission to create a cutting-edge search engine from the ground up, designed to cater to the diverse needs of AI applications. Our team is building a robust infrastructure that enables us to crawl the internet, train advanced embedding models for indexing, and develop high-performance vector databases using Rust. Additionally, we manage a significant $5M H200 GPU cluster that powers tens of thousands of machines.
The Infrastructure Team at Exa is responsible for developing the essential tools and infrastructure that support our entire system. We are looking for talented infrastructure engineers to help us scale our capabilities rapidly. Your work could involve orchestrating GPU clusters with Kubernetes, implementing map-reduce batch jobs on Ray, or creating top-tier observability tools that set industry standards.
Sep 3, 2025