Software Engineer Distributed Systems Infrastructure jobs in San Francisco – Browse 5,904 openings on RoboApply Jobs

Software Engineer Distributed Systems Infrastructure jobs in San Francisco

Open roles matching “Software Engineer Distributed Systems Infrastructure” in San Francisco. 5,904 active listings on RoboApply Jobs.

5,904 jobs found

1 - 20 of 5,904 Jobs
Cloudflare, Inc.
Full-time|Hybrid

Join Cloudflare as a Software Engineer specializing in Distributed Systems and Infrastructure. In this role, you will be responsible for designing, implementing, and optimizing scalable systems that enhance the performance and reliability of our services. You will collaborate closely with cross-functional teams to develop innovative solutions that support our mission to help build a better Internet.

Mar 4, 2026
Tubi, Inc.
Full-time|$227.2K/yr - $417K/yr|Hybrid|San Francisco, CA; Los Angeles, CA; New York, NY (Hybrid); USA - Remote

About the Role: Join our dynamic ML Infrastructure team as a Software Engineer, where you'll collaborate closely with the Machine Learning and Product teams to build top-tier machine learning inference platforms. These platforms power vital services such as personalized recommendations, search, and content understanding at Tubi.

Your primary focus will be the development and maintenance of low-latency ML model serving systems for Deep Learning, LLM, and Search models. This includes building self-service infrastructure and critical components such as the inference engine, feature store, vector store, and experimentation engine.

In this role, you'll improve our service deployment and operational processes, with opportunities to contribute to open-source projects. Enjoy the architectural freedom to explore innovative frameworks, spearhead significant cross-functional projects, and elevate the capabilities of our ML and Product teams.

We are currently hiring for two positions:
- Staff Software Engineer
- Principal Software Engineer

Additional Details: As a Principal Engineer, you will serve as a technical leader and visionary, guiding the advancement of our machine learning platform. You'll tackle complex technical challenges, shape architectural decisions, and mentor senior engineers, fostering a culture of excellence and continuous improvement. Your contributions will impact millions of users.

Mar 23, 2026
Achira
Full-time|On-site|San Francisco Office

Why Join Achira?
- Become part of an exceptional team of scientists, ML researchers, and engineers dedicated to transforming the landscape of drug discovery.
- Engage with cutting-edge machine learning infrastructure at unprecedented scale, leveraging extensive computing resources, vast datasets, and ambitious goals.
- Take ownership of significant projects from conception through architecture and deployment on large-scale infrastructure.
- Thrive in a culture that values thoroughness, speed, and a proactive, builder-oriented mindset.

About the Role
At Achira, we are developing state-of-the-art foundation models that address the most complex challenges in simulation for drug discovery and beyond. Our atomistic foundation simulation models (FSMs) serve as comprehensive representations of the physical microcosm, encompassing machine learning interaction potentials (MLIPs), neural network potentials (NNPs), and various generative model classes.

We are looking for a Software Engineer who is enthusiastic about distributed computing and its applications in machine learning. You will play a pivotal role in designing and building the infrastructure for our ML data generation pipelines, model training, and fine-tuning workflows across large-scale distributed systems. Your expertise will be crucial in keeping our compute clusters efficient, observable, cost-effective, and dependable, enabling us to advance the frontiers of ML development. If you are passionate about distributed systems, performance optimization, and cloud cost efficiency, we encourage you to apply.

You will be empowered to design and manage complex workloads across multiple vendors worldwide. Achira's mission revolves around computation, and providing seamless access to our uniquely tailored workloads at the lowest possible cost is critical to our success.

Oct 7, 2025
Databricks
Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California

At Databricks, we are driven by a passion for empowering data teams to tackle the world’s most challenging problems — from transforming transportation to accelerating medical innovations. We achieve this by creating and maintaining the leading data and AI infrastructure platform, enabling our clients to leverage profound data insights for business enhancement. Founded by engineers with a customer-first mentality, we eagerly embrace every opportunity to tackle complex technical challenges, from designing next-generation UI/UX for data interactions to scaling our services across millions of virtual machines. Our journey has just begun.

As a member of the Runtime team at Databricks, you will help develop the next generation of distributed data storage and processing systems. These systems will surpass specialized SQL query engines in relational query performance while offering the programming abstractions necessary to support a variety of workloads, from ETL to data science.

Example projects include:
- Apache Spark™: Contribute to the de facto open-source standard framework for big data.
- Data Plane Storage: Develop reliable, high-performance services and client libraries for managing vast amounts of data within cloud storage backends like AWS S3 and Azure Blob Store.
- Delta Lake: Design a storage management system that merges the scalability and cost-effectiveness of data lakes with the performance and reliability of data warehouses, providing features like ACID transactions and time travel.
- Delta Pipelines: Simplify the orchestration and operation of numerous data pipelines, enabling clients to deploy, test, and upgrade pipelines effortlessly.
- Performance Engineering: Create a next-generation query optimizer and execution engine that is fast, scalable, and robust.

Jan 30, 2026
Ambience Healthcare
Full-time|$250K/yr - $300K/yr|Hybrid|San Francisco

About Us:
At Ambience Healthcare, we are not just another scribe; we are pioneering an AI intelligence platform that reintegrates humanity into healthcare, delivering significant ROI for health systems nationwide. Our technology empowers providers to concentrate on delivering exceptional care by alleviating the administrative burdens that distract them from their patients and essential duties. Ambience offers real-time, coding-aware documentation and clinical workflow support across various healthcare settings at the leading health systems in North America.

Our teams operate with unwavering dedication and extreme ownership to develop optimal solutions for our healthcare partners. We value transparency, positivity, and deep contemplation, holding each other to high standards because we recognize that the challenges we tackle are of utmost importance.

Ambience was recognized as the leader in enhancing clinician experience by KLAS Research in their Emerging Solutions Top 20 Report, honored by Fast Company as one of the Next Big Things in Tech, acknowledged by Inc. as one of the best AI companies in healthcare, and selected as a LinkedIn Top Startup in 2024 and 2025. We're proudly supported by Oak HC/FT, Andreessen Horowitz (a16z), the OpenAI Startup Fund, and Kleiner Perkins — and we're just beginning our journey.

The Role:
Ambience processes millions of patient encounters across the largest health systems in the country. These organizations rely on us for real-time clinical workflows where latency and reliability directly influence patient care. A delay during a patient visit is not merely a bad metric; it can lead to a physician abandoning the tool.

In this position, you will own the core systems that enable Ambience to scale reliably: database architecture, caching, multi-tenancy, and performance optimization that shapes the user experience for clinicians. You will design database architectures that accommodate our growth, build caching systems that keep EHR API latency from affecting critical processes, and develop multi-tenant infrastructure that protects customer data while enhancing performance. Your ultimate goal is to create infrastructure that other teams rely on effortlessly.

Our engineering roles are hybrid, requiring presence in our San Francisco office three days a week.

Feb 2, 2026
OpenAI
Full-time|On-site|San Francisco

About the Team
At OpenAI, we are on a mission to develop safe and beneficial artificial general intelligence. Our models are integrated into products such as ChatGPT and various APIs. To ensure these systems are fast, reliable, and economically viable, we require top-tier infrastructure that stands out in the industry.

The Caching Infrastructure team plays a pivotal role by building a robust caching layer that supports numerous critical applications at OpenAI. Our goal is to deliver a high-availability, multi-tenant caching platform capable of auto-scaling with workload demands, reducing tail latency, and accommodating a wide array of use cases.

We seek an experienced engineer who can design and scale this essential infrastructure. The ideal candidate will possess extensive experience with distributed caching systems (e.g., Redis, Memcached), a solid understanding of networking fundamentals, and expertise in Kubernetes-based service orchestration.

Jul 18, 2025
OpenAI
Full-time|On-site|San Francisco

Team and Platform Focus
The Compute Infrastructure team at OpenAI designs, builds, and maintains the systems that support AI research at scale. This work brings together accelerators, CPUs, networking, storage, data centers, orchestration software, agent infrastructure, developer tools, and observability. The aim is to create a reliable, unified experience for researchers and product teams across the company. Projects span the full stack: capacity planning, cluster lifecycle management, bare-metal automation, and distributed systems. The team manages Kubernetes scheduling, system optimization, high-performance networking, storage, fleet health, reliability, workload profiling, benchmarking, and improvements to the developer experience. Even small improvements in communication, scheduling, hardware efficiency, or debugging can significantly accelerate research. OpenAI matches engineers to areas within Compute Infrastructure that align with their skills and interests.

Role Overview
This Software Engineer role centers on building and evolving the compute platform that supports OpenAI’s research and products. Candidates may bring expertise in low-level systems, high-performance computing, distributed infrastructure, reliability, CaaS, agent infrastructure, developer platforms, tooling, or infrastructure user experience. The most important qualities are strong analytical skills, the ability to write resilient code, and a collaborative approach that helps colleagues move faster and with more confidence.

What You Will Work On
- Working close to hardware or at the user interaction layer
- Developing CaaS and agent infrastructure
- Managing control and data planes that connect the system
- Bringing new supercomputing capabilities online
- Optimizing training workloads through profiler traces and benchmarks
- Improving NCCL and collective communication
- Analyzing GPUs, NICs, topology, firmware, thermal dynamics, and failure modes
- Designing abstractions to unify diverse clusters into a single platform

Areas of Expertise
No one is expected to cover every area listed. Some engineers focus on system performance, kernel or runtime behavior, large-scale networking protocols, RDMA, NCCL, GPU hardware, benchmarking, scheduling, or hardware reliability. Others improve the platform’s usability through APIs, tools, workflows, and developer experience. The team values strong engineering judgment and a drive to advance the field.

Apr 27, 2026
Browserbase
Full-time|On-site|San Francisco

At Browserbase, we revolutionize web browsing for AI agents and applications. Our headless browser infrastructure automates interactions with websites, simplifies form filling, and replicates user actions seamlessly.

Having raised a $40M Series B last year, we are on an accelerated growth trajectory. Supported by esteemed investors such as Kleiner Perkins, CRV, and Notable Capital, our dynamic team is committed to realizing our CEO's vision of powering the best AI tools and transforming web automation.

Our Core Infrastructure team is essential to keeping our operations efficient. This group tackles significant distributed systems challenges, ensuring our platform's speed, reliability, and scalability.

Mar 25, 2025
Krea
Full-time|On-site|San Francisco

Join Krea's Innovative Team
At Krea, we are at the forefront of developing next-generation AI creative tools. Our commitment lies in making AI an intuitive and controllable medium for creatives, and we aspire to create tools that enhance human creativity rather than replace it. We view AI as a transformative medium that enables expression across diverse formats: text, images, video, sound, and even 3D. Our focus is on creating smarter, more adaptable tools that leverage this medium effectively.

Supercomputing and AI Infrastructure at Krea
Our team builds and manages the foundational infrastructure that supports Krea's research and inference: distributed training systems, 1000+ GPU Kubernetes clusters, and petabyte-scale data pipelines. Much of our work involves creating bespoke solutions, such as custom distributed datastores, job orchestration systems, and advanced streaming pipelines, designed to handle modern AI workloads efficiently.

Key Projects You Will Contribute To:
- Distributed Data Systems: Design and implement multi-stage pipelines that transform petabytes of raw data into clean, annotated datasets; run classification models across billions of images; deploy and integrate large language models to caption extensive multimedia data.
- GPU Infrastructure: Manage distributed training and inference across 1000+ GPU Kubernetes clusters; address orchestration and scaling challenges for large-scale GPU job processing; optimize research workflows across multiple datacenters.
- Distributed Training: Profile and enhance dataloaders streaming thousands of images per second; troubleshoot InfiniBand networking during extensive training runs; develop fault-tolerance systems for large-scale pretraining; collaborate with researchers to refine reinforcement learning infrastructure.
- Applied ML Pipelines: Identify clean scenes in millions of videos using distributed shot-boundary detection; tailor and train models to sift through billions of images for specific queries; build systems that link raw cluster capacity to research outcomes.

Apr 3, 2026
Baseten
Full-time|On-site|San Francisco

Join Baseten as a Software Engineer focusing on GPU Networking and Distributed Systems. In this pivotal role, you'll collaborate with talented engineers and researchers to develop cutting-edge solutions that leverage GPU technology for high-performance networking operations. Your contributions will be instrumental in shaping the future of distributed systems, enhancing performance, scalability, and reliability.

Feb 23, 2026
OpenAI
Full-time|Hybrid|San Francisco

About Our Team
Join the Sora team at OpenAI, where we are at the forefront of developing multimodal capabilities for our foundation models. As a hybrid of research and product development, we focus on seamlessly integrating advanced multimodal functionality into our AI offerings, ensuring it is reliable, user-friendly, and aligned with our mission to foster broad societal benefit.

About the Position
We are seeking a dedicated Software Engineer specializing in Distributed Data Systems to architect and enhance the infrastructure that supports large-scale multimodal training and evaluation at OpenAI. In this role, you will oversee distributed data pipelines and collaborate closely with our researchers to translate their requirements into robust, high-performance systems. You will play a crucial role in fortifying the pipelines that underpin Sora’s rapid innovation cycles. We are looking for engineers with a keen eye for detail, substantial experience with distributed systems, and a proven track record of building reliable infrastructure in high-stakes environments.

This position is based in San Francisco, CA, and follows a hybrid work model requiring three days in the office each week. We also provide relocation assistance to new team members.

Key Responsibilities:
- Design, build, and maintain data infrastructure systems (distributed computing, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure), ensuring they are scalable, reliable, and secure.
- Ensure our data platform can scale dramatically while maintaining high levels of reliability and efficiency.
- Collaborate with researchers to deeply understand their needs and translate them into production-ready systems.
- Harden, optimize, and maintain vital data infrastructure systems that drive multimodal training and evaluation.

Ideal Candidates Will Have:
- Extensive experience with distributed systems and large-scale infrastructure, coupled with a strong passion for data.
- A detail-oriented mindset and a commitment to building and maintaining dependable systems.
- Solid software engineering fundamentals and exceptional organizational skills.
- Comfort with ambiguity and rapid change in a fast-paced environment.

About OpenAI
OpenAI is a pioneering AI research and deployment organization dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We strive to advance digital intelligence in a way that is safe and beneficial, pushing the boundaries of innovation and technology.

Nov 14, 2025
Scale AI
Full-time|$216.2K/yr - $270.3K/yr|On-site|San Francisco, CA; New York, NY

Join Scale AI's innovative team as an Infrastructure Software Engineer for our Enterprise Generative AI Platform (SGP). In this dynamic role, you will help design and enhance our enterprise-grade AI platform, which offers robust APIs for knowledge retrieval, inference, evaluation, and more. We're seeking an exceptional engineer who thrives in fast-paced environments and is eager to contribute to the scaling of our core infrastructure. The ideal candidate will possess a solid foundation in software engineering principles and extensive experience with large-scale distributed systems. Your role will involve implementing solutions across various cloud providers (GCP, Azure, AWS) for clients in highly regulated sectors, including healthcare, telecommunications, finance, and retail.

Mar 26, 2026
Scribd, Inc.
Full-time|$120K/yr - $228K/yr|Hybrid|San Francisco

At Scribd, Inc., we are dedicated to enhancing human understanding through our suite of products: Scribd®, Slideshare®, Everand™, and Fable. Our mission revolves around transforming access into deeper insights and expertise for billions globally.

Our Culture
We foster a culture where authenticity and boldness are encouraged; where constructive debates lead to commitment; and where every team member is empowered to prioritize customer needs. We believe exceptional work emerges from harmonizing individual flexibility with a strong sense of community. Our Scribd Flex program allows employees to select their preferred work style and location, while emphasizing intentional in-person interactions that enhance collaboration and culture. All employees are expected to participate in occasional in-person meetings, regardless of their location.

We look for team members who embody “GRIT”—the intersection of passion and perseverance towards long-term goals. GRIT serves as a framework for our operations: setting and achieving Goals, delivering impactful Results, contributing Innovative ideas, and building a strong Team through collaboration.

Join us at Scribd (pronounced “scribbed”) as we ignite human curiosity and create a world filled with stories and knowledge, democratizing the exchange of ideas and empowering collective expertise.

The Team
Our ML Data Engineering team powers metadata extraction, enrichment, and content understanding across our platforms.

Nov 17, 2025
Databricks
Full-time|$192K/yr - $260K/yr|On-site|San Francisco, California

P-186

At Databricks, we are passionate about empowering data teams to tackle some of the world’s most challenging problems, from security threat detection to cancer drug development. Our mission is to build and operate the leading data and AI infrastructure platform, enabling our customers to concentrate on the high-value challenges that are integral to their own objectives. Founded in 2013 by the original creators of Apache Spark™, Databricks has rapidly evolved from a small office in Berkeley, California, to a global powerhouse with over 1000 employees. Trusted by thousands of organizations, from startups to Fortune 100 companies, we are recognized as one of the fastest-growing SaaS companies worldwide.

Our engineering teams create highly sophisticated products that address significant needs in the industry. We continuously push the limits of data and AI technology while maintaining the resilience, security, and scalability essential for our customers' success on our platform. We manage one of the largest-scale software platforms: millions of virtual machines that generate terabytes of logs and process exabytes of data daily. At this scale, we frequently encounter cloud hardware, network, and operating system faults, and our software must effectively shield our customers from these failures. Modern data analysis leverages advanced techniques, such as machine learning, that far exceed the capabilities of traditional SQL query engines.

As a Software Engineer on the Runtime team at Databricks, you will help develop the next generation of distributed data storage and processing systems, which outshine specialized SQL query engines in relational query performance while providing the flexibility and programming abstractions to support a variety of workloads, from ETL to data science.

Examples of projects you may work on include:
- Apache Spark™: Contributing to the de facto open-source framework for big data.
- Data Plane Storage: Developing reliable, high-performance services and client libraries for storing and accessing vast amounts of data on cloud storage backends like AWS S3 and Azure Blob Store.
- Delta Lake: A storage management system that merges the scalability and cost-effectiveness of data lakes with the performance and reliability of data warehouses, featuring low-latency streaming. Its higher-level abstractions and guarantees, including ACID transactions and time travel, significantly reduce the complexity of real-world data engineering architectures.
- Delta Pipelines: Aiming to simplify the management of data engineering pipelines.

Jan 30, 2026
Inngest
Full-time|Remote|US/Remote

At Inngest, our Systems Engineers are the architects behind the backbone of our platform, crafting a robust execution layer, an efficient queueing system, and scalable state stores that interlink seamlessly. This role presents an exciting opportunity to tackle complex technical challenges while deriving immense satisfaction from your contributions.

About Us: Inngest is pioneering solutions to long-standing challenges faced by developers. Our mission is to create first-of-their-kind tools that enhance developers' daily workflow, prioritizing user experience and performance. A strong product-centric mindset and a passion for developer tools are essential for success in this role.

The Role: A successful Systems Engineer at Inngest must blend generalist and specialist skills. You will collaborate with our team to enhance our queueing system (including debounce and concurrency mechanisms), manage vast amounts of data in our state store, and refine the API layers that facilitate user interactions. Your work will directly impact millions of developers, and you will engage closely with designers, engineers, and founders to optimize user experience.

Note: This position requires overlapping working hours with US PST. While residing in the San Francisco Bay Area is preferred, exceptional candidates from anywhere in the United States will be considered. Our engineering team works in person several days a week in San Francisco.

Apr 15, 2025
Cohere
Full-time|On-site|San Francisco

Who are we?
At Cohere, our mission is to elevate intelligence to benefit humanity. We specialize in training and deploying cutting-edge models for developers and enterprises building AI systems that deliver extraordinary experiences: content generation, semantic search, retrieval-augmented generation, and intelligent agents. We view our work as pivotal to the broad acceptance of AI technologies.

We are passionate about our creations. Every team member plays a vital role in enhancing our models' capabilities and the value they provide to our customers. We thrive on hard work and speed, always prioritizing our clients' needs. Cohere is a diverse team of researchers, engineers, designers, and more, all dedicated to their craft. Each individual is a leading expert in their field, and we recognize that a variety of perspectives is essential to developing exceptional products. Join us in our mission and help shape the future of AI!

Why this role?
Are you excited about architecting high-performance, scalable, and reliable machine learning systems? Do you aspire to shape and build the next generation of AI platforms that power advanced NLP applications? We are seeking talented Members of Technical Staff to join our Model Serving team at Cohere. This team develops, deploys, and operates our AI platform, which delivers Cohere's large language models via user-friendly API endpoints. In this role, you will collaborate with multiple teams to deploy optimized NLP models in production settings with low latency, high throughput, and robust availability. You will also have the opportunity to work directly with customers to create tailored deployments that fulfill their unique requirements.

Jan 12, 2026
Scribd Inc.
Full-time|$120K/yr - $228K/yr|On-site|San Francisco

About Scribd:
At Scribd Inc. (pronounced “scribbed”), we ignite human curiosity by fostering a world rich in stories and knowledge. Join our team as we democratize the flow of ideas and information, empowering collective expertise through our diverse product offerings: Everand, Scribd, Slideshare, and Fable. This job posting represents an open and approved position within our organization.

We cultivate a culture where authenticity and boldness thrive; where we engage in vibrant discussions and embrace the unexpected; and where every employee is empowered to take meaningful action with a firm customer focus. Our work structure balances individual flexibility with community engagement. Through our Scribd Flex program, employees can work with their managers to determine the work style that best meets their personal needs. A core principle of Scribd Flex is prioritizing intentional in-person interactions that foster collaboration, culture, and connection; occasional in-person attendance is therefore required for all Scribd employees, irrespective of their location.

What do we seek in new team members? We hire for “GRIT.” GRIT embodies the blend of passion and perseverance towards long-term goals. At Scribd Inc., we believe in the possibilities this can unlock and encourage our employees to adopt a GRIT-driven approach to their work. In practical terms, GRIT represents our standards: we look for individuals who can set and accomplish Goals, deliver Results in their roles, contribute Innovative ideas and solutions, and positively impact the Team through collaboration and attitude.

Nov 17, 2025
Anthropic
Full-time|$405K/yr - $485K/yr|On-site|San Francisco, CA | New York City, NY | Seattle, WA

About Anthropic
At Anthropic, our mission is to develop AI systems that are reliable, interpretable, and steerable. We are committed to ensuring that AI technology is safe and beneficial for our users and society at large. Our rapidly expanding team comprises dedicated researchers, engineers, policy experts, and business leaders, all working collaboratively to create beneficial AI solutions.

About the Role
The Infrastructure organization at Anthropic plays a critical role in our mission to create reliable AI systems. The systems we develop are essential for accelerating the training of new models, conducting safety experiments effectively, and scaling our AI technology, Claude, to serve millions of users. We strive to demonstrate that robust infrastructure and cutting-edge capabilities can work together harmoniously. The Systems engineering team is responsible for compute uptime and resilience at scale, building the clusters, automation, and observability that enable safe and effective frontier AI research and deployment.

Team Matching: After the interview process, team assignments are based on interview performance, individual interests, and business needs. Candidates may be considered for various Infrastructure teams.

Feb 13, 2026
OpenAI
Full-time|On-site|San Francisco

About Our Team
At OpenAI, our Storage Infrastructure team enables data accessibility, placement, and lifecycle management through advanced APIs. We prioritize scalability, reliability, security, and usability to meet the demands of our pioneering AI research.

Role Overview
We are seeking a talented Software Engineer to join our Storage Infrastructure team, where you will architect and maintain exascale systems designed to manage research data efficiently and reliably across multiple regions. The ideal candidate will have extensive experience in distributed systems, particularly in developing exascale data management solutions or distributed filesystems.

Your Responsibilities
- Design and develop software solutions to manage exascale data, ensuring accessibility for researchers.
- Enhance the reliability, predictability, and cost efficiency of our storage systems.
- Collaborate with researchers to understand and address diverse data use cases.
- Implement robust security measures to protect our critical datasets.

Ideal Candidate Profile
- A strong foundation in distributed systems principles, with a proven ability to design and implement scalable, reliable, and secure storage architectures.
- Proficiency in programming languages relevant to storage systems development.
- Experience with cloud platforms, particularly Azure.
- Familiarity with AI/ML data access patterns.
- A proactive approach and adaptability in a fast-paced, dynamic environment.

About OpenAI
OpenAI is an AI research and deployment organization committed to ensuring that general-purpose artificial intelligence benefits all of humanity. We strive to push the boundaries of AI capabilities while ensuring safety and human-centric development. We embrace diverse perspectives, voices, and experiences that reflect the full spectrum of humanity, and we are proud to be an equal opportunity employer, committed to fostering an inclusive workplace where all individuals are respected and valued.

Dec 10, 2024
OpenAI
Full-time|On-site|San Francisco

About Our Team:
Join the Database Systems team at OpenAI, where we specialize in high-performance distributed databases. We are the architects behind Rockset, a real-time search, analytics, and vector database that powers all vector search and retrieval-augmented generation (RAG) at OpenAI. Rockset underpins core functionality across all OpenAI product lines and supports various critical internal applications.

About the Role:
We are looking for engineers who are passionate about distributed systems, low-level performance optimization (our core engine is developed in C++), and building scalable database infrastructure from scratch. As a member of the Database Systems team, you will help enhance the core database engine, making significant contributions to ingestion, query execution, indexing, and storage. You will collaborate with teams across OpenAI to unlock new product capabilities and ensure the reliability and scalability of our online database as usage grows exponentially.

Your Responsibilities Will Include:
- Designing, developing, and maintaining high-performance distributed systems.
- Identifying and addressing performance bottlenecks to elevate infrastructure capabilities.
- Defining and guiding the long-term technical vision and evolution of the system.
- Collaborating with product, engineering, and research teams to deliver robust and scalable infrastructure.
- Investigating complex production issues across the entire technology stack.
- Contributing to incident response, retrospective analyses, and best practices for system reliability.

You Will Excel In This Role If You:
- Possess substantial experience building, scaling, and optimizing distributed systems.
- Have a keen interest in database internals, storage engines, or low-latency query systems.
- Enjoy tackling complex performance challenges in high-throughput systems.
- Have experience managing and operating production clusters at scale (e.g., Kubernetes or similar orchestration tools).
- Approach scalability, correctness, and reliability with a rigorous mindset.
- Thrive in a fast-paced environment where you can make a significant impact.

Qualifications:
- 4+ years of relevant industry experience with a focus on distributed systems.
- Proficiency in C++ or similar low-level programming languages.
- Strong problem-solving skills and attention to detail.
- Experience with performance monitoring and optimization tools.
- Excellent collaboration and communication skills.

Jul 29, 2025
