Experience Level
Senior
Qualifications
You will:
Lead essential infrastructure projects at Discord.
Define and implement a multi-year technical vision and roadmap for Infrastructure, ensuring alignment across teams on architectural principles.
Collaborate with leadership to guarantee that our infrastructure capabilities support innovative product development.
Design systems that prioritize reliability, cost-effectiveness, and scalability.
Mentor and develop senior engineers across various teams, enhancing the overall technical proficiency of the organization.
Promote Discord's engineering culture through public engagements such as blog posts, conference presentations, and open-source contributions.
You have:
A minimum of 10 years of experience in building and managing large-scale distributed systems.
Proven experience in architecting complex infrastructures serving hundreds of millions of users with stringent uptime demands.
In-depth knowledge of backend systems and databases, along with practical experience in operating systems at scale.
Adeptness at navigating significant ambiguity while making sound technical decisions.
Demonstrable ability to influence technical direction across multiple teams without direct authority.
Strong communication skills, with the ability to convey complex technical concepts to diverse audiences.
About the job
Join Discord, a platform that connects over 200 million users every month primarily through gaming. With over 90% of our users engaged in gaming activities, we facilitate over 1.5 billion hours of gaming conversations, enhancing the experience before, during, and after gameplay.
The Infrastructure organization at Discord is fundamental to our user experience. We handle the real-time delivery of over 40 million events per second and manage the storage of trillions of messages, ensuring robust connections among our vast user base. As a Principal Engineer, you will play a pivotal role in guiding our infrastructure teams, shaping our technical vision, and maintaining the reliability of Discord at a massive scale.
This position is ideal for a professional who excels at the intersection of advanced technical skills and organizational leadership. You will contribute to our infrastructure roadmap, address our most challenging technical dilemmas, and ensure our systems can efficiently scale to accommodate the next wave of users.
About Discord Inc.
Discord is at the forefront of the gaming community, providing a platform where users can connect, communicate, and share experiences. With a focus on enhancing gaming interactions, we strive to make conversations seamless and enjoyable for our users.
Full-time|$400K/yr - $450K/yr|On-site|San Francisco Bay Area
Full-time|$2K/yr - $2K/yr|On-site|San Francisco, CA
Role Overview
Nextdata is hiring a Lead Principal Infrastructure Engineer in San Francisco, CA. This position focuses on building the foundation for a decentralized data mesh platform, supporting data ownership and enabling AI, machine learning, and analytics at scale.
What You Will Do
Develop automation solutions for provisioning and managing the Nextdata OS across multiple cloud platforms.
Work closely with the founding engineering team to design and implement a secure, self-service infrastructure for future data product developers.
Own the architecture and deployment of the OS, using infrastructure-as-code to ensure high code quality and scalability.
Engage directly with customers to understand their requirements and translate feedback into technical improvements.
Collaborate with product teams to align infrastructure capabilities with business needs.
What You Bring
Expertise in large-scale distributed systems and data infrastructure.
Experience designing, deploying, and maintaining cloud-based platforms.
Strong background in infrastructure-as-code and automation.
Ability to work collaboratively with engineering and product teams.
Comfort engaging with customers to gather feedback and requirements.
Full-time|$250K/yr - $340K/yr|On-site|San Francisco, CA, USA
The Opportunity
Join the Ads Infrastructure team at Unity, where we design and manage the foundational distributed systems that drive one of the largest real-time advertising platforms globally. Our infrastructure is integral to every aspect of Unity Ads, enabling segmentation, optimization, bidding, traffic routing, experimentation, and analytics on a worldwide scale. We are committed to developing resilient, scalable, and cost-effective systems capable of handling immense traffic volumes across multiple regions while adhering to strict latency and availability standards.
Utilizing advanced technologies including Kubernetes, Kafka, Flink, StarRocks, Valkey, and other cloud-native components, our platform supports engineers, data scientists, and product teams in advancing Unity Ads.
This senior individual contributor role will have a significant technical impact across the organization. You will be responsible for making essential architectural decisions, guiding the long-term evolution of our platform, and collaborating with senior managers and directors to shape the technical vision of Ads Infrastructure while remaining actively involved in hands-on development.
About Us
At Sierra, we are revolutionizing the way businesses engage with their customers by building a cutting-edge platform that harnesses the power of AI. Our headquarters is located in the vibrant city of San Francisco, with additional offices expanding in Atlanta, New York, London, France, Singapore, and Japan.
Our company culture is deeply rooted in our core values: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and foster an environment where innovation thrives.
Sierra was co-founded by visionary leaders Bret Taylor, who currently serves as the Board Chair of OpenAI and has a rich history with Salesforce and Facebook, and Clay Bavor, who previously led Google Labs and spearheaded initiatives like Google Lens and Project Starline.
Your Role
As a Software Engineer focusing on Infrastructure at Sierra, you will play a pivotal role in designing, constructing, and maintaining the foundational systems that empower our AI platform. Your expertise will ensure that our infrastructure is not only secure and reliable but also scalable, allowing product teams to execute their work with agility and confidence.
Guarantee the reliability, scalability, and performance of our platform and LLM inference serving in response to increasing traffic demands.
Develop and oversee cloud infrastructure using Terraform to create secure, scalable, and reproducible environments.
Establish and manage a self-service infrastructure platform to empower engineering teams in deploying and operating services independently.
Take ownership of and improve CI/CD pipelines and release management processes, facilitating rapid and reliable deployments across Sierra’s platform.
Design and manage distributed systems utilizing distributed databases, retrieval systems, and machine learning models.
Develop and sustain core data serving abstractions along with essential authentication and security features (SSO, RBAC, authentication controls).
Effectively navigate and integrate our technology stack with enterprise customer environments in a scalable and maintainable manner.
At Exa, we are on a mission to create a cutting-edge search engine from the ground up, designed to cater to the diverse needs of AI applications. Our team is building a robust infrastructure that enables us to crawl the internet, train advanced embedding models for indexing, and develop high-performance vector databases using Rust. Additionally, we manage a significant $5M H200 GPU cluster that powers tens of thousands of machines.
The Infrastructure Team at Exa is responsible for developing the essential tools and infrastructure that support our entire system. We are looking for talented infrastructure engineers to help us scale our capabilities rapidly. Your work could involve orchestrating GPU clusters with Kubernetes, implementing map-reduce batch jobs on Ray, or creating top-tier observability tools that set industry standards.
Join Cloudflare as a Principal Software Engineer specializing in Resiliency, where you will play a pivotal role in enhancing our systems' robustness and availability. Your expertise will contribute to building and maintaining resilient infrastructure that supports our global network, ensuring our customers receive uninterrupted service.
In this role, you will work alongside a talented team of engineers to identify vulnerabilities, implement solutions, and innovate new strategies that enhance system performance and reliability. If you are passionate about software engineering and system resiliency, we invite you to apply!
Full-time|$245K/yr - $290K/yr|On-site|San Francisco, CA
Redpanda Data is building the Agentic Data Plane (ADP), a platform that connects AI agents with enterprise data and systems. The ADP supports real-time, autonomous reasoning and action by agentic applications, powered by Redpanda's multi-modal data streaming engine. Major organizations across industries, including Activision Blizzard, Cisco, Moody's, Texas Instruments, Vodafone, and two of the top five U.S. banks, rely on Redpanda to process hundreds of terabytes of data every day. Backed by investors such as Lightspeed, GV, and Haystack VC, Redpanda operates as a globally distributed, people-first company.
Role overview
The Principal Software Engineer will architect and develop the Agentic Data Plane, which serves as the control and execution layer for AI agents interacting with enterprise data. This system enables agents to access, analyze, and act on data in real time, while providing human operators with oversight and control for secure operations. The ADP brings together Redpanda's low-latency streaming technology, a distributed query engine for real-time context, a library of over 300 data connectors, and a global policy and observability framework. This framework enforces access controls, records agent actions, and supports replayable audits.
What you will do
Design and build the core architecture of the Agentic Data Plane, focusing on secure and efficient data interaction for AI agents.
Integrate streaming, query, and policy enforcement components to support real-time, autonomous agent operations.
Monitor developments in the agentic AI field and translate research into engineering proposals and product strategies.
Work closely with Engineering, Product, and Go-To-Market teams, as well as key customers, to shape the direction of the ADP.
Join Cloudflare as a Principal Software Engineer specializing in billing systems, where you will play a pivotal role in shaping our payment and invoicing solutions. You will collaborate with cross-functional teams to implement innovative solutions that enhance user experiences and streamline processes. If you're passionate about building scalable software and want to contribute to a fast-paced, innovative environment, we want to hear from you!
Who We Are
Serval is an innovative AI-driven automation platform redefining operational efficiency for enterprises. Our intelligent agents seamlessly comprehend and execute real-world workflows, replacing outdated manual processes with adaptive, self-learning software. Since our inception in early 2024, we have garnered the trust of industry leaders such as General Motors, Notion, Perplexity, Vercel, Mercor, LangChain, and Verkada, streamlining high-volume operational tasks across their organizations.
At the heart of Serval is a cutting-edge agentic AI platform that transforms natural language into actionable workflows. Our agents not only respond to queries but also reason, act across various systems, and continuously enhance their performance. What started as a solution for operational tasks has rapidly expanded into a versatile AI automation layer utilized across IT, HR, Finance, Security, Legal, and Engineering sectors.
Our mission is to eradicate repetitive, manual tasks within enterprises, empowering teams through intelligent automation. In the long run, we aim to establish a universal AI operations layer, a system of agents that integrates across business functions, maintaining the momentum of modern companies.
We are proud to be backed by renowned investors including Sequoia Capital, Redpoint Ventures, Meritech, First Round, General Catalyst, and Elad Gil, and founded by seasoned product and engineering leaders from Verkada.
Role Overview
As a Senior Software Engineer in Infrastructure at Serval, you will be pivotal in developing and scaling the core systems that empower our AI agents and workflow automation platform. A crucial aspect of this role involves enabling and supporting self-hosted deployments for enterprise clients needing on-premises or private cloud environments. We are looking for engineers with profound expertise in distributed systems, infrastructure-as-code, production operations, and customer-facing support, who aspire to influence the technical architecture of a rapidly evolving platform.
What You'll Do
Design, implement, and operate large-scale distributed systems that power Serval's AI agents, workflow orchestration, and data pipelines.
Create and maintain Terraform modules to provision and manage cloud infrastructure across AWS, GCP, or Azure environments.
Develop and sustain deployment packages, installation scripts, and infrastructure templates, enabling customers to self-host Serval in their own environments.
Provide technical support and guidance to enterprise customers during installation and deployment phases.
About Us
At Imprint, we are revolutionizing the world of co-branded credit cards and innovative financial solutions, focusing on smarter, more rewarding, and brand-first experiences. We collaborate with renowned brands such as Crate & Barrel, Rakuten, Booking.com, H-E-B, Fetch, and Brooks Brothers to establish modern credit programs that enhance customer loyalty, unlock savings, and stimulate growth. Our robust platform integrates advanced payment technologies, intelligent underwriting, and a seamless user experience, enabling brands to offer impactful financial products without the complexities of becoming a bank.
Co-branded credit cards represent over $300 billion in U.S. annual spending, yet many are still managed by outdated banking systems. Imprint stands as the modern alternative: flexible, technology-driven, and tailored for today’s consumers. Supported by notable investors like Kleiner Perkins, Thrive Capital, and Khosla Ventures, we are assembling a world-class team dedicated to reshaping payment methods and driving brand growth. If you thrive in fast-paced environments, enjoy tackling complex challenges, and aspire to make a significant impact, we would be delighted to meet you.
Discover more about us on Imprint's Technology Blog.
The Team
The Tech Platform Engineering Team at Imprint is pioneering the democratization of access to advanced technologies, empowering teams across our organization to innovate and excel. Our commitment to redefining the Fintech landscape drives us to build secure, highly available infrastructures while equipping our engineers with comprehensive development tools, allowing them to rapidly create world-class products.
Your Role
Design, build, and manage cloud and web infrastructure with a strong emphasis on security, reliability, and scalability.
Implement and maintain infrastructure components across computing, networking, and data platforms.
Adhere to security best practices in cloud infrastructure, ensuring proper access control, network isolation, and secure communication between services.
Monitor system health and engage in incident response, root cause analysis, and reliability enhancements.
Collaborate with platform, security, and product engineers to deliver safe and efficient infrastructure solutions.
About the Role
Join our pioneering team at vooma as a Backend & Infrastructure Software Engineer, where you will play a critical role in shaping the technical infrastructure of a transformative company.
If you are passionate about creating not only resilient systems but also the foundational architecture of a groundbreaking enterprise from the outset, this position is ideal for you.
We are looking for someone who excels at crafting infrastructure that is elegant, dependable, and secure, even under high-demand scenarios. You thrive on the challenge of scaling systems that enable intelligent agents and take pride in establishing reliable foundations that others can rely on.
Your Key Responsibilities Include:
Design and maintain secure, scalable infrastructure tailored for AI-powered agents in production environments.
Deploy and optimize AI-driven services to meet high availability and performance standards.
Manage infrastructure as code, alongside cloud environments and CI/CD pipelines.
Implement monitoring, observability, and alerting systems to ensure the reliability of our infrastructure.
Contribute to infrastructure security and adhere to best practices.
You Should Have:
Experience in deploying and productionizing machine learning or AI-centric workloads.
Proficiency in developing secure, scalable infrastructures on platforms such as AWS, Azure, or GCP.
In-depth knowledge of backend systems, networking, and container orchestration technologies (e.g., Kubernetes).
Understanding of infrastructure security principles and compliance standards (e.g., SOC2).
A proactive and hands-on mindset, with a strong drive to solve challenges from start to finish.
Full-time|$300K/yr - $300K/yr|On-site|San Francisco
ABOUT BASETEN
Join Baseten, where we drive mission-critical AI inference for leading companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our unique blend of applied AI research, robust infrastructure, and intuitive developer tools empowers organizations at the forefront of AI innovation to deploy state-of-the-art models into production. Recently, we secured a $300M Series E funding round, backed by esteemed investors such as BOND, IVP, Spark Capital, Greylock, and Conviction. Be a part of our rapid growth and help shape the platform that engineers trust for launching AI products.
THE ROLE
As an Infrastructure Software Engineer at Baseten, you will play a pivotal role in developing and maintaining our ML inference platform that powers AI applications in production. Your contributions will enhance the core infrastructure, enabling developers to deploy, scale, and monitor machine learning models with exceptional performance.
EXAMPLE INITIATIVES
You will engage in innovative projects within our Infrastructure team, including:
Multi-cloud capacity management
Inference on B200 GPUs
Multi-node inference
Fractional H100 GPUs for efficient model serving
RESPONSIBILITIES
Design and develop infrastructure components for our ML inference platform, primarily using Python and Go.
Implement and maintain Kubernetes deployments for optimal model serving.
Contribute to the orchestration layer for model deployments.
Build and enhance monitoring systems to track model performance metrics effectively.
Develop efficient resource management solutions to optimize performance.
Full-time|$150K/yr - $200K/yr|On-site|San Francisco, CA
At Sift, we are revolutionizing the way cutting-edge machines are constructed, tested, and managed. Our innovative platform provides engineers with real-time visibility into high-frequency telemetry, effectively removing bottlenecks and facilitating quicker, more dependable development.
Sift originated from our experience at SpaceX, contributing to projects like Dragon, Falcon, Starlink, and Starship, where the demands of scaling telemetry, debugging flight systems, and ensuring mission reliability necessitated a new kind of infrastructure. Founded by a talented team from SpaceX, Google, and Palantir, Sift is tailored for mission-critical systems where precision and scalability are imperative.
As one of the pioneering engineers at Sift, your role will extend beyond just coding: you will play a crucial part in defining the architecture, shaping the product, and influencing the culture of a company dedicated to addressing real engineering challenges. If you're eager to take on intricate technical obstacles and build foundational systems that support complex machines from the ground up, we would love to connect with you.
Principal Software Engineer
Saviynt offers an AI-driven identity platform that effectively manages and governs access permissions for both human and non-human entities across all organizational applications, data, and processes. Our clients rely on Saviynt to protect their digital assets, enhance operational efficiency, and lower compliance expenses. Designed for the age of AI, Saviynt is at the forefront of helping organizations safely advance their AI deployments and utilization. As a recognized leader in identity security, we provide solutions that empower and protect some of the world’s leading brands, Fortune 500 companies, and government institutions. For more details, please visit www.saviynt.com.
Role Summary
In this pivotal role, you will provide technical leadership and extensive knowledge in complex engineering domains, guiding architectural decisions while ensuring scalability, reliability, and quality across key platform components. As a Principal Engineer, you will act as a technical authority, mentor senior engineers, and tackle the most intricate technical challenges.
The Connectors team plays a crucial role in facilitating seamless integrations between Saviynt's Identity Governance platform and a multitude of enterprise applications by developing and maintaining robust, scalable connector frameworks. We are committed to ensuring reliable data synchronization, provisioning, and lifecycle management across diverse external systems, forming a vital foundation for the entire platform.
What You Will Be Doing
● Design and architect scalable, high-performance connector frameworks for enterprise application integrations.
● Define technical standards, best practices, and design patterns for connector development.
● Drive architectural decisions for complex integration scenarios involving over 200 enterprise applications.
● Evaluate and recommend new technologies, tools, and frameworks to enhance connector reliability and performance.
● Lead technical design reviews and provide guidance on system architecture and design trade-offs.
Join Ivo's Engineering Team!
At Ivo, we are pioneers in the tech industry. Our engineers are innovators who have created groundbreaking solutions such as:
• An AI agent that seamlessly integrates with MS Word to enhance document editing [2023]
• Revolutionizing embedding models with agentic RAG technology [2023]
• Advanced LLM-based legal fact extraction capabilities [2024]
• A legal assistant designed to search extensive contract databases without compromising accuracy [2024]
• Clustering legal documents from the same lineage [2025]
• Automatic deviation analysis to uncover hidden risks in vast contract databases [2025]
• Merging contracts with their amendments to create a “composite” contract timeline that has moved our clients to tears [2025]
Role Overview
As an Infrastructure Engineer at Ivo, you will lay the groundwork for our platform's future. Your responsibilities will include:
• Designing and owning the future of our infrastructure, allowing you the freedom to innovate.
• Managing multiple customer deployments, ensuring each receives tailored containers, databases, and VPCs.
• Instrumenting our systems to identify performance bottlenecks and errors.
• Aggregating metrics and logs into visually appealing dashboards and setting up pager alerts.
• Leading infrastructure-related incidents and being on-call as necessary.
• Enhancing our CI/CD system to reduce deployment time from ~12 minutes.
If you're passionate about LLMs, you'll thrive in our engineering team, where you’ll have the opportunity to:
• Develop real-time LLM evaluations to monitor the accuracy of our responses.
• Collaborate with talented engineers to push the boundaries of DevOps.
Astranis is seeking a talented and motivated Software Engineer to join our Infrastructure team. In this role, you will be at the forefront of developing and maintaining critical software systems that support our innovative satellite technology. You'll collaborate with cross-functional teams to design, implement, and optimize our infrastructure solutions, ensuring high reliability and performance.
Join Cloudflare as a Principal Software Engineer, where you will play a pivotal role in designing and implementing innovative software solutions. You will collaborate with cross-functional teams to enhance our platform's scalability, security, and performance, making a significant impact on our global user base.
About Engineering at Ivo Inc.
Ivo Inc. builds advanced legal technology from its San Francisco base. The engineering team has delivered several notable products, including:
An AI agent for Microsoft Word that edits documents automatically (2023)
Migration from traditional embedding models to agentic RAG methods (2023)
Large-scale legal fact extraction powered by LLMs (2024)
A legal assistant designed to search large contract databases with precision (2024)
Clustering related legal documents to improve organization (2025)
Automated deviation analysis to surface hidden risks in contract data (2025)
Combining contracts and amendments to create comprehensive contract time series (2025)
Role Overview: Infrastructure Software Engineer
The Infrastructure Software Engineer will help shape the core systems that power Ivo's platform. This role offers the chance to architect, optimize, and maintain the infrastructure supporting sensitive client data and high-performance legal applications.
What You Will Do
Own and influence the evolution of Ivo's infrastructure, with significant freedom to design systems due to a lean operational footprint.
Orchestrate customer deployments, managing containers, databases, and VPCs for each client to ensure data isolation and security.
Implement instrumentation to surface performance bottlenecks and errors across the stack.
Aggregate metrics, logs, and health checks into dashboards and alerting systems for clear visibility.
Participate in on-call rotations to lead and resolve infrastructure incidents.
Optimize CI/CD pipelines to reduce deployment times (current average: 12 minutes).
Opportunities to Advance DevOps and LLM Integration
Develop real-time LLM evaluations to track output accuracy.
Create autonomous agents that identify and troubleshoot production issues proactively.
Bring forward new ideas to improve infrastructure and operations.
Mission
Ivo's mission is to empower clients with advanced legal technology that boosts efficiency and accuracy.
About the Team
OpenAI’s B2B Engineering team is committed to delivering our advanced technology to the world through our developer platform and enterprise products. We develop robust backend systems, APIs, and infrastructure that empower developers and organizations to leverage OpenAI's capabilities in production environments.
Our expertise encompasses distributed systems, data infrastructure, platform services, and enterprise-grade features such as security, compliance, authentication, and reliability. We collaborate closely with product, research, design, infrastructure, and forward-deployed teams to transform pioneering AI functionalities into scalable and dependable products.
About the Role
We are seeking a Principal Software Engineer to architect and scale the systems that drive our developer and enterprise-facing products. You will take charge of the design for backend services and platform capabilities that safely and reliably integrate new AI functionalities into production at a global scale.
This position covers a wide technical landscape, including distributed systems, APIs, databases, data pipelines, and secure enterprise infrastructure. You will play a pivotal role in shaping both the technical architecture and the product experience of our platform, maintaining high standards for performance, safety, reliability, and API design.
Responsibilities
Design, implement, and scale backend services, APIs, and infrastructure supporting OpenAI’s developer and enterprise products.
Lead the architectural design of distributed systems, databases, and data pipelines that handle large-scale, high-reliability production workloads.
Own key platform capabilities from initial technical strategy and design through implementation, launch, and ongoing operation.
Carefully shape API design, treating interfaces as core product touchpoints while ensuring a top-notch developer experience.
Create secure, reliable, and compliant systems that cater to both enterprise and developer needs.
Work closely with product, research, design, infrastructure, and forward-deployed engineering teams to deploy new capabilities into production.
Steer technical direction across complex challenges, making sound architectural trade-offs to balance speed, quality, and maintainability.
Enhance engineering productivity by developing internal tools, platform abstractions, and systems that amplify efficiency across the organization.
Join Our Innovative Team
The Applied Engineering team at OpenAI is dedicated to bridging the gap between research, engineering, product, and design, delivering cutting-edge AI technology to consumers and businesses alike.
As a pivotal member of our team, you will manage the core infrastructure that underpins products such as ChatGPT and our API. This includes overseeing our Kubernetes clusters, infrastructure deployment, networking stack, cloud abstractions, and more.
Our mission is to learn from our deployments and ensure the responsible and safe use of AI technology. We place a higher priority on safety than on unchecked growth.
About Your Role
As a vital contributor to the cloud infrastructure team, you'll be responsible for constructing and maintaining infrastructure abstractions that facilitate swift and scalable product delivery.
This position is based in our San Francisco, CA office.
Your Responsibilities:
Architect and develop robust development and production platforms that ensure reliability and security at scale.
Optimize our infrastructure for scalability to meet future demands.
Foster a diverse, equitable, and inclusive work culture that encourages open communication and challenges conventional thinking.
Participate in an on-call rotation to maintain the reliability of the systems we build and respond to critical incidents as necessary.
You Will Excel in This Position If You:
Possess over 5 years of experience in building core infrastructure.
Have extensive experience with orchestration systems such as Kubernetes at scale.
Are skilled in creating abstractions over cloud platforms.
Take pride in developing and managing scalable, reliable, and secure systems.
Thrive in environments characterized by ambiguity and rapid change.
This role is exclusively located at our San Francisco headquarters. We offer relocation assistance to qualified candidates.
Full-time|$400K/yr - $450K/yr|On-site|San Francisco Bay Area
Join Discord, a platform that connects over 200 million users every month primarily through gaming. With over 90% of our users engaged in gaming activities, we facilitate over 1.5 billion hours of gaming conversations, enhancing the experience before, during, and after gameplay.
The Infrastructure organization at Discord is fundamental to our user experience. We handle the real-time delivery of over 40 million events per second and manage the storage of trillions of messages, ensuring robust connections among our vast user base. As a Principal Engineer, you will play a pivotal role in guiding our infrastructure teams, shaping our technical vision, and maintaining the reliability of Discord at a massive scale.
This position is ideal for a professional who excels at the intersection of advanced technical skills and organizational leadership. You will contribute to our infrastructure roadmap, address our most challenging technical dilemmas, and ensure our systems can efficiently scale to accommodate the next wave of users.
Full-time|$2K/yr - $2K/yr|On-site|San Francisco, CA
Role Overview
Nextdata is hiring a Lead Principal Infrastructure Engineer in San Francisco, CA. This position focuses on building the foundation for a decentralized data mesh platform, supporting data ownership and enabling AI, machine learning, and analytics at scale.
What You Will Do
• Develop automation solutions for provisioning and managing the Nextdata OS across multiple cloud platforms.
• Work closely with the founding engineering team to design and implement a secure, self-service infrastructure for future data product developers.
• Own the architecture and deployment of the OS, using infrastructure-as-code to ensure high code quality and scalability.
• Engage directly with customers to understand their requirements and translate feedback into technical improvements.
• Collaborate with product teams to align infrastructure capabilities with business needs.
What You Bring
• Expertise in large-scale distributed systems and data infrastructure.
• Experience designing, deploying, and maintaining cloud-based platforms.
• Strong background in infrastructure-as-code and automation.
• Ability to work collaboratively with engineering and product teams.
• Comfort engaging with customers to gather feedback and requirements.
Full-time|$250K/yr - $340K/yr|On-site|San Francisco, CA, USA
The Opportunity
Join the Ads Infrastructure team at Unity, where we design and manage the foundational distributed systems that drive one of the largest real-time advertising platforms globally. Our infrastructure is integral to every aspect of Unity Ads, enabling segmentation, optimization, bidding, traffic routing, experimentation, and analytics on a worldwide scale. We are committed to developing resilient, scalable, and cost-effective systems capable of handling immense traffic volumes across multiple regions while adhering to strict latency and availability standards. Utilizing advanced technologies including Kubernetes, Kafka, Flink, Starrocks, Valkey, and other cloud-native components, our platform supports engineers, data scientists, and product teams in advancing Unity Ads.
This senior individual contributor role will have a significant technical impact across the organization. You will be responsible for making essential architectural decisions, guiding the long-term evolution of our platform, and collaborating with senior managers and directors to shape the technical vision of Ads Infrastructure while remaining actively involved in hands-on development.
About Us
At Sierra, we are revolutionizing the way businesses engage with their customers by building a cutting-edge platform that harnesses the power of AI. Our headquarters is located in the vibrant city of San Francisco, with additional offices expanding in Atlanta, New York, London, France, Singapore, and Japan.
Our company culture is deeply rooted in our core values: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and foster an environment where innovation thrives.
Sierra was co-founded by visionary leaders Bret Taylor, who currently serves as the Board Chair of OpenAI and has a rich history with Salesforce and Facebook, and Clay Bavor, who previously led Google Labs and spearheaded initiatives like Google Lens and Project Starline.
Your Role
As a Software Engineer focusing on Infrastructure at Sierra, you will play a pivotal role in designing, constructing, and maintaining the foundational systems that empower our AI platform. Your expertise will ensure that our infrastructure is not only secure and reliable but also scalable, allowing product teams to execute their work with agility and confidence.
• Guarantee the reliability, scalability, and performance of our platform and LLM inference serving in response to increasing traffic demands.
• Develop and oversee cloud infrastructure using Terraform to create secure, scalable, and reproducible environments.
• Establish and manage a self-service infrastructure platform to empower engineering teams in deploying and operating services independently.
• Take ownership of and improve CI/CD pipelines and release management processes, facilitating rapid and reliable deployments across Sierra’s platform.
• Design and manage distributed systems utilizing distributed databases, retrieval systems, and machine learning models.
• Develop and sustain core data serving abstractions along with essential authentication and security features (SSO, RBAC, authentication controls).
• Effectively navigate and integrate our technology stack with enterprise customer environments in a scalable and maintainable manner.
At Exa, we are on a mission to create a cutting-edge search engine from the ground up, designed to cater to the diverse needs of AI applications. Our team is building a robust infrastructure that enables us to crawl the internet, train advanced embedding models for indexing, and develop high-performance vector databases using Rust. Additionally, we manage a significant $5M H200 GPU cluster that powers tens of thousands of machines.
The Infrastructure Team at Exa is responsible for developing the essential tools and infrastructure that support our entire system. We are looking for talented infrastructure engineers to help us scale our capabilities rapidly. Your work could involve orchestrating GPU clusters with Kubernetes, implementing map-reduce batch jobs on Ray, or creating top-tier observability tools that set industry standards.
Join Cloudflare as a Principal Software Engineer specializing in Resiliency, where you will play a pivotal role in enhancing our systems' robustness and availability. Your expertise will contribute to building and maintaining resilient infrastructure that supports our global network, ensuring our customers receive uninterrupted service.
In this role, you will work alongside a talented team of engineers to identify vulnerabilities, implement solutions, and innovate new strategies that enhance system performance and reliability. If you are passionate about software engineering and system resiliency, we invite you to apply!
Full-time|$245K/yr - $290K/yr|On-site|San Francisco, CA
Redpanda Data is building the Agentic Data Plane (ADP), a platform that connects AI agents with enterprise data and systems. The ADP supports real-time, autonomous reasoning and action by agentic applications, powered by Redpanda's multi-modal data streaming engine. Major organizations across industries, including Activision Blizzard, Cisco, Moody's, Texas Instruments, Vodafone, and two of the top five U.S. banks, rely on Redpanda to process hundreds of terabytes of data every day. Backed by investors such as Lightspeed, GV, and Haystack VC, Redpanda operates as a globally distributed, people-first company.
Role overview
The Principal Software Engineer will architect and develop the Agentic Data Plane, which serves as the control and execution layer for AI agents interacting with enterprise data. This system enables agents to access, analyze, and act on data in real time, while providing human operators with oversight and control for secure operations. The ADP brings together Redpanda's low-latency streaming technology, a distributed query engine for real-time context, a library of over 300 data connectors, and a global policy and observability framework. This framework enforces access controls, records agent actions, and supports replayable audits.
What you will do
• Design and build the core architecture of the Agentic Data Plane, focusing on secure and efficient data interaction for AI agents.
• Integrate streaming, query, and policy enforcement components to support real-time, autonomous agent operations.
• Monitor developments in the agentic AI field and translate research into engineering proposals and product strategies.
• Work closely with Engineering, Product, and Go-To-Market teams, as well as key customers, to shape the direction of the ADP.
Join Cloudflare as a Principal Software Engineer specializing in billing systems, where you will play a pivotal role in shaping our payment and invoicing solutions. You will collaborate with cross-functional teams to implement innovative solutions that enhance user experiences and streamline processes. If you're passionate about building scalable software and want to contribute to a fast-paced, innovative environment, we want to hear from you!
Who We Are
Serval is an innovative AI-driven automation platform redefining operational efficiency for enterprises. Our intelligent agents seamlessly comprehend and execute real-world workflows, replacing outdated manual processes with adaptive, self-learning software. Since our inception in early 2024, we have garnered the trust of industry leaders such as General Motors, Notion, Perplexity, Vercel, Mercor, LangChain, and Verkada, streamlining high-volume operational tasks across their organizations.
At the heart of Serval is a cutting-edge agentic AI platform that transforms natural language into actionable workflows. Our agents not only respond to queries but also reason, act across various systems, and continuously enhance their performance. What started as a solution for operational tasks has rapidly expanded into a versatile AI automation layer utilized across IT, HR, Finance, Security, Legal, and Engineering sectors.
Our mission is to eradicate repetitive, manual tasks within enterprises, empowering teams through intelligent automation. In the long run, we aim to establish a universal AI operations layer: a system of agents that integrates across business functions, maintaining the momentum of modern companies.
We are proud to be backed by renowned investors including Sequoia Capital, Redpoint Ventures, Meritech, First Round, General Catalyst, and Elad Gil, and founded by seasoned product and engineering leaders from Verkada.
Role Overview
As a Senior Software Engineer in Infrastructure at Serval, you will be pivotal in developing and scaling the core systems that empower our AI agents and workflow automation platform. A crucial aspect of this role involves enabling and supporting self-hosted deployments for enterprise clients needing on-premises or private cloud environments. We are looking for engineers with profound expertise in distributed systems, infrastructure-as-code, production operations, and customer-facing support, who aspire to influence the technical architecture of a rapidly evolving platform.
What You'll Do
• Design, implement, and operate large-scale distributed systems that power Serval's AI agents, workflow orchestration, and data pipelines.
• Create and maintain Terraform modules to provision and manage cloud infrastructure across AWS, GCP, or Azure environments.
• Develop and sustain deployment packages, installation scripts, and infrastructure templates, enabling customers to self-host Serval in their own environments.
• Provide technical support and guidance to enterprise customers during installation and deployment phases.
About Us
At Imprint, we are revolutionizing the world of co-branded credit cards and innovative financial solutions, focusing on smarter, more rewarding, and brand-first experiences. We collaborate with renowned brands such as Crate & Barrel, Rakuten, Booking.com, H-E-B, Fetch, and Brooks Brothers to establish modern credit programs that enhance customer loyalty, unlock savings, and stimulate growth. Our robust platform integrates advanced payment technologies, intelligent underwriting, and a seamless user experience, enabling brands to offer impactful financial products without the complexities of becoming a bank.
Co-branded credit cards represent over $300 billion in U.S. annual spending, yet many are still managed by outdated banking systems. Imprint stands as the modern alternative: flexible, technology-driven, and tailored for today’s consumers. Supported by notable investors like Kleiner Perkins, Thrive Capital, and Khosla Ventures, we are assembling a world-class team dedicated to reshaping payment methods and driving brand growth. If you thrive in fast-paced environments, enjoy tackling complex challenges, and aspire to make a significant impact, we would be delighted to meet you.
Discover more about us on Imprint's Technology Blog.
The Team
The Tech Platform Engineering Team at Imprint is pioneering the democratization of access to advanced technologies, empowering teams across our organization to innovate and excel. Our commitment to redefining the Fintech landscape drives us to build secure, highly available infrastructures while equipping our engineers with comprehensive development tools, allowing them to rapidly create world-class products.
Your Role
• Design, build, and manage cloud and web infrastructure with a strong emphasis on security, reliability, and scalability.
• Implement and maintain infrastructure components across computing, networking, and data platforms.
• Adhere to security best practices in cloud infrastructure, ensuring proper access control, network isolation, and secure communication between services.
• Monitor system health and engage in incident response, root cause analysis, and reliability enhancements.
• Collaborate with platform, security, and product engineers to deliver safe and efficient infrastructure solutions.
About the Role
Join our pioneering team at vooma as a Backend & Infrastructure Software Engineer, where you will play a critical role in shaping the technical infrastructure of a transformative company.
If you are passionate about creating not only resilient systems but also the foundational architecture of a groundbreaking enterprise from the outset, this position is ideal for you.
We are looking for someone who excels at crafting infrastructure that is elegant, dependable, and secure, even under high-demand scenarios. You thrive on the challenge of scaling systems that enable intelligent agents and take pride in establishing reliable foundations that others can rely on.
Your Key Responsibilities Include:
• Design and maintain secure, scalable infrastructure tailored for AI-powered agents in production environments.
• Deploy and optimize AI-driven services to meet high availability and performance standards.
• Manage infrastructure as code, alongside cloud environments and CI/CD pipelines.
• Implement monitoring, observability, and alerting systems to ensure the reliability of our infrastructure.
• Contribute to infrastructure security and adhere to best practices.
You Should Have:
• Experience in deploying and productionizing machine learning or AI-centric workloads.
• Proficiency in developing secure, scalable infrastructures on platforms such as AWS, Azure, or GCP.
• In-depth knowledge of backend systems, networking, and container orchestration technologies (e.g., Kubernetes).
• Understanding of infrastructure security principles and compliance standards (e.g., SOC2).
• A proactive and hands-on mindset, with a strong drive to solve challenges from start to finish.
Full-time|$300K/yr - $300K/yr|On-site|San Francisco
ABOUT BASETEN
Join Baseten, where we drive mission-critical AI inference for leading companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our unique blend of applied AI research, robust infrastructure, and intuitive developer tools empowers organizations at the forefront of AI innovation to deploy state-of-the-art models into production. Recently, we secured a $300M Series E funding round, backed by esteemed investors such as BOND, IVP, Spark Capital, Greylock, and Conviction. Be a part of our rapid growth and help shape the platform that engineers trust for launching AI products.
THE ROLE
As an Infrastructure Software Engineer at Baseten, you will play a pivotal role in developing and maintaining our ML inference platform that powers AI applications in production. Your contributions will enhance the core infrastructure, enabling developers to deploy, scale, and monitor machine learning models with exceptional performance.
EXAMPLE INITIATIVES
You will engage in innovative projects within our Infrastructure team, including:
• Multi-cloud capacity management
• Inference on B200 GPUs
• Multi-node inference
• Fractional H100 GPUs for efficient model serving
RESPONSIBILITIES
• Design and develop infrastructure components for our ML inference platform, primarily using Python and Go.
• Implement and maintain Kubernetes deployments for optimal model serving.
• Contribute to the orchestration layer for model deployments.
• Build and enhance monitoring systems to track model performance metrics effectively.
• Develop efficient resource management solutions to optimize performance.
Full-time|$150K/yr - $200K/yr|On-site|San Francisco, CA
At Sift, we are revolutionizing the way cutting-edge machines are constructed, tested, and managed. Our innovative platform provides engineers with real-time visibility into high-frequency telemetry, effectively removing bottlenecks and facilitating quicker, more dependable development.
Sift originated from our experience at SpaceX, contributing to projects like Dragon, Falcon, Starlink, and Starship, where the demands of scaling telemetry, debugging flight systems, and ensuring mission reliability necessitated a new kind of infrastructure. Founded by a talented team from SpaceX, Google, and Palantir, Sift is tailored for mission-critical systems where precision and scalability are imperative.
As one of the pioneering engineers at Sift, your role will extend beyond just coding: you will play a crucial part in defining the architecture, shaping the product, and influencing the culture of a company dedicated to addressing real engineering challenges. If you're eager to take on intricate technical obstacles and build foundational systems that support complex machines from the ground up, we would love to connect with you.
Principal Software Engineer
Saviynt offers an AI-driven identity platform that effectively manages and governs access permissions for both human and non-human entities across all organizational applications, data, and processes. Our clients rely on Saviynt to protect their digital assets, enhance operational efficiency, and lower compliance expenses. Designed for the age of AI, Saviynt is at the forefront of helping organizations safely advance their AI deployments and utilization. As a recognized leader in identity security, we provide solutions that empower and protect some of the world’s leading brands, Fortune 500 companies, and government institutions. For more details, please visit www.saviynt.com.
Role Summary
In this pivotal role, you will provide technical leadership and extensive knowledge in complex engineering domains, guiding architectural decisions while ensuring scalability, reliability, and quality across key platform components. As a Principal Engineer, you will act as a technical authority, mentor senior engineers, and tackle the most intricate technical challenges.
The Connectors team plays a crucial role in facilitating seamless integrations between Saviynt's Identity Governance platform and a multitude of enterprise applications by developing and maintaining robust, scalable connector frameworks. We are committed to ensuring reliable data synchronization, provisioning, and lifecycle management across diverse external systems, forming a vital foundation for the entire platform.
What You Will Be Doing
● Design and architect scalable, high-performance connector frameworks for enterprise application integrations.
● Define technical standards, best practices, and design patterns for connector development.
● Drive architectural decisions for complex integration scenarios involving over 200 enterprise applications.
● Evaluate and recommend new technologies, tools, and frameworks to enhance connector reliability and performance.
● Lead technical design reviews and provide guidance on system architecture and design trade-offs.
Join Ivo's Engineering Team!
At Ivo, we are pioneers in the tech industry. Our engineers are innovators who have created groundbreaking solutions such as:
• An AI agent that seamlessly integrates with MS Word to enhance document editing [2023]
• Revolutionizing embedding models with agentic RAG technology [2023]
• Advanced LLM-based legal fact extraction capabilities [2024]
• A legal assistant designed to search extensive contract databases without compromising accuracy [2024]
• Clustering legal documents from the same lineage [2025]
• Automatic deviation analysis to uncover hidden risks in vast contract databases [2025]
• Merging contracts with their amendments to create a “composite” contract timeline that has moved our clients to tears [2025]
Role Overview
As an Infrastructure Engineer at Ivo, you will lay the groundwork for our platform's future. Your responsibilities will include:
• Designing and owning the future of our infrastructure, allowing you the freedom to innovate.
• Managing multiple customer deployments, ensuring each receives tailored containers, databases, and VPCs.
• Instrumenting our systems to identify performance bottlenecks and errors.
• Aggregating metrics and logs into visually appealing dashboards and setting up pager alerts.
• Leading infrastructure-related incidents and being on-call as necessary.
• Enhancing our CI/CD system to reduce deployment time from ~12 minutes.
If you're passionate about LLMs, you'll thrive in our engineering team, where you’ll have the opportunity to:
• Develop real-time LLM evaluations to monitor the accuracy of our responses.
• Collaborate with talented engineers to push the boundaries of DevOps.
Astranis is seeking a talented and motivated Software Engineer to join our Infrastructure team. In this role, you will be at the forefront of developing and maintaining critical software systems that support our innovative satellite technology. You'll collaborate with cross-functional teams to design, implement, and optimize our infrastructure solutions, ensuring high reliability and performance.
Join Cloudflare as a Principal Software Engineer, where you will play a pivotal role in designing and implementing innovative software solutions. You will collaborate with cross-functional teams to enhance our platform's scalability, security, and performance, making a significant impact on our global user base.
About Engineering at Ivo Inc.
Ivo Inc. builds advanced legal technology from its San Francisco base. The engineering team has delivered several notable products, including:
• An AI agent for Microsoft Word that edits documents automatically (2023)
• Migration from traditional embedding models to agentic RAG methods (2023)
• Large-scale legal fact extraction powered by LLMs (2024)
• A legal assistant designed to search large contract databases with precision (2024)
• Clustering related legal documents to improve organization (2025)
• Automated deviation analysis to surface hidden risks in contract data (2025)
• Combining contracts and amendments to create comprehensive contract time series (2025)
Role Overview: Infrastructure Software Engineer
The Infrastructure Software Engineer will help shape the core systems that power Ivo's platform. This role offers the chance to architect, optimize, and maintain the infrastructure supporting sensitive client data and high-performance legal applications.
What You Will Do
• Own and influence the evolution of Ivo's infrastructure, with significant freedom to design systems due to a lean operational footprint.
• Orchestrate customer deployments, managing containers, databases, and VPCs for each client to ensure data isolation and security.
• Implement instrumentation to surface performance bottlenecks and errors across the stack.
• Aggregate metrics, logs, and health checks into dashboards and alerting systems for clear visibility.
• Participate in on-call rotations to lead and resolve infrastructure incidents.
• Optimize CI/CD pipelines to reduce deployment times (current average: 12 minutes).
Opportunities to Advance DevOps and LLM Integration
• Develop real-time LLM evaluations to track output accuracy.
• Create autonomous agents that identify and troubleshoot production issues proactively.
• Bring forward new ideas to improve infrastructure and operations.
Mission
Ivo's mission is to empower clients with advanced legal technology that boosts efficiency and accuracy.
About the Team
OpenAI’s B2B Engineering team is committed to delivering our advanced technology to the world through our developer platform and enterprise products. We develop robust backend systems, APIs, and infrastructure that empower developers and organizations to leverage OpenAI's capabilities in production environments.
Our expertise encompasses distributed systems, data infrastructure, platform services, and enterprise-grade features such as security, compliance, authentication, and reliability. We collaborate closely with product, research, design, infrastructure, and forward-deployed teams to transform pioneering AI functionalities into scalable and dependable products.
About the Role
We are seeking a Principal Software Engineer to architect and scale the systems that drive our developer and enterprise-facing products. You will take charge of the design for backend services and platform capabilities that safely and reliably integrate new AI functionalities into production at a global scale.
This position covers a wide technical landscape, including distributed systems, APIs, databases, data pipelines, and secure enterprise infrastructure.
Aug 4, 2025