About You
Essential Skills:
• A Bachelor's degree or equivalent experience in Computer Science or a related field.
• Proficiency in the Python programming language.
You could be a great fit if you:
• Have experience with large-scale data processing and distributed systems.
• Possess a background in AI or robotics.
• Thrive in dynamic, 0-to-1 environments.
• Are passionate about robotics and driven by a mission to innovate.
About the job
Join xdof, where innovation meets opportunity! As we stand at the forefront of robotics and AI technology, we are dedicated to addressing the critical need for high-quality training data. Our mission is to develop sophisticated data collection systems, operational capabilities, and expansive data warehouses that empower our partners to lead the field.
As an Infrastructure Engineer, you will be instrumental in creating a robust platform that supports our growing data collection initiatives.
Key projects you may work on include:
Developing an orchestration system for processing data upon ingestion.
Designing an internal platform that allows researchers to experiment with our datasets.
Managing a multi-tenant data lake to enhance data accessibility and collaboration.
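The ingestion-orchestration work described above can be pictured as a minimal staged pipeline. The sketch below is purely illustrative (the stage names, record shape, and rules are invented, not xdof's actual system): records flow through an ordered list of stages, and any stage may reject a record.

```python
# Illustrative ingestion pipeline: each record passes through ordered stages.
# Stage names and the record shape are invented for this sketch.
def validate(record):
    # Reject records missing required fields.
    return record if "id" in record and "payload" in record else None

def normalize(record):
    # Normalize the payload before storage.
    record["payload"] = record["payload"].strip().lower()
    return record

STAGES = [validate, normalize]

def process(records):
    """Run every record through the stage list, skipping rejects."""
    out = []
    for record in records:
        for stage in STAGES:
            record = stage(record)
            if record is None:
                break  # a stage rejected this record
        else:
            out.append(record)
    return out
```

A real orchestration system would add queues, retries, and persistence around this core loop.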
About xdof
At xdof, we are pioneering the future of robotics by tackling the critical challenges of data collection and processing. Our team is dedicated to building cutting-edge technology that empowers our partners to push the boundaries of what is possible in AI and robotics.
About Our Team
The Agent Infrastructure team at OpenAI is dedicated to developing advanced systems that facilitate the training and deployment of innovative AI agents, both for internal use and global accessibility. We collaborate closely with researchers to architect and scale environments that empower agentic models, allowing them to execute code, troubleshoot issues, and evolve as software engineers do. Our training environment operates at an unprecedented scale, offering the flexibility to simulate any operational context an agent might encounter.

In addition, we are responsible for the core platform that powers the deployment and execution of agents in production. Our infrastructure supports groundbreaking products like Codex, Operator, the tool functionalities in ChatGPT, and future agentic innovations.

Our team tackles some of the most intricate challenges in scaling agent capabilities, focusing on the infrastructure layer to ensure that OpenAI can train the most advanced models globally while enhancing the utility of our agentic products for users everywhere.

About Your Role
As a Software Engineer on the Agent Infrastructure team, you will engage directly with both research and product teams at OpenAI. Your role will involve constructing and scaling systems to train sophisticated agentic models and developing the platforms and integrations needed to deploy new agents to millions of users worldwide.

You will be responsible for building new capabilities (establishing the infrastructure and integrations required to train increasingly complex agentic models) and rapidly scaling these innovations across some of the largest computing clusters in the world. Furthermore, you will play a pivotal role in launching agentic products by building, maintaining, and scaling the production platform that supports all agents.

We seek individuals with extensive experience in AI infrastructure who excel at collaborating with researchers to create high-performance systems at massive scale for unique applications.

This position is available in San Francisco, CA or New York City, NY. We employ a hybrid work model, requiring three days in the office each week, and we offer relocation assistance for new hires.
Join Netic, the cutting-edge AI revenue engine powering essential services that form the backbone of the American economy.

Backed by $43 million in funding from top investors like Founders Fund, Greylock, Hanabi, and Dylan Field, we have empowered our clients to secure hundreds of thousands of jobs across various service industries in North America. As a pioneer in AI-driven solutions, we are witnessing the emergence of companies operating entirely on our AI-first platform.

As an Agent Infrastructure Engineer, you will be at the forefront of architecting and scaling the core framework that underpins our autonomous AI agents, addressing complex real-world challenges with immediate and significant impacts. Collaborate with a passionate team of innovators from renowned companies such as Scale, Databricks, HRT, Meta, MIT, Stanford, and Harvard, as we bring frontier AI to the physical economy, where the stakes are high and the data is intricate.

If you thrive in dynamic, fast-paced environments and are eager to set new benchmarks in the agentic space, seize this opportunity to make your mark!
About the Team
The Codex Core Agent team is at the forefront of developing the foundational elements of Codex. Our mission is to enhance the agent's capabilities, expedite research efforts, and ensure these advancements are implemented effectively for our users.

This involves collaborating across various systems that empower Codex to operate seamlessly in the real world. We focus on optimizing production performance metrics such as token management, latency, reliability, cost efficiency, and capacity. Our work encompasses the core execution loop and interfaces that translate models into actionable behaviors, as well as the shared infrastructure that supports other teams in leveraging Codex. Additionally, we establish feedback mechanisms that refine models and agent behaviors based on real-world usage over time.

About the Role
We are seeking passionate engineers to develop the infrastructure that fuels Codex agents in production environments. This role centers on the systems that ensure models can execute code securely, interact with various tools, complete complex, multi-step tasks, and maintain reliability and efficiency at scale.

You will be responsible for designing and managing the infrastructure that supports sandboxed execution, orchestration, stateful workflows, application server and SDK boundaries, and model rollouts. Working at the intersection of distributed systems, developer tools, and AI, you will create the core components that enhance Codex's performance, safety, and reliability, making it easier for teams across the organization to build on its capabilities.

What You'll Do
• Design and implement execution environments tailored for AI agents, incorporating features like sandboxing, isolation, and reproducibility.
• Develop orchestration systems for agents that handle multi-step processes and tool utilization.
• Create infrastructure for the execution, testing, and debugging of code generated by models.
• Establish state and memory systems that enable agents to maintain context during extended tasks.
• Optimize production metrics including tokens, latency, reliability, and cost across the Codex deployment.
• Assist in model rollouts, capacity planning, and managing the essential trade-offs between quality, speed, and cost to effectively handle a fleet of advanced agents at scale.
• Develop shared platform capabilities that facilitate the work of product teams, partner teams, and the open-source community contributing to Codex.

You Might Be a Good Fit If You
• Possess substantial experience in distributed systems or infrastructure engineering.
• Have experience building systems involving containers, sandboxing, or virtualization.
• Are adept at working across backend systems and collaborating with diverse teams to drive project success.
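The sandboxed-execution requirement above can be illustrated with a minimal sketch: run model-generated code in a separate process with a hard timeout. This shows only the execution boundary; production agent sandboxes add isolation layers such as containers, namespaces, or microVMs that are not shown here, and the function below is an invented example, not OpenAI's actual API.

```python
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 2.0) -> dict:
    """Run untrusted code in a child process with a hard timeout.

    A real agent sandbox would layer on isolation (namespaces, seccomp,
    microVMs); this sketch only demonstrates the process boundary.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return {"ok": proc.returncode == 0, "stdout": proc.stdout}
    except subprocess.TimeoutExpired:
        # Runaway code is killed instead of blocking the agent loop.
        return {"ok": False, "stdout": "", "error": "timeout"}
```

The timeout is the key design choice: agents routinely generate code that loops forever, so every execution path needs a bound.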
About Us
At Sierra, we are revolutionizing the way businesses engage with their customers by building a cutting-edge platform that harnesses the power of AI. Our headquarters is in San Francisco, with additional offices expanding in Atlanta, New York, London, France, Singapore, and Japan.

Our company culture is deeply rooted in our core values: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and foster an environment where innovation thrives.

Sierra was co-founded by Bret Taylor, who currently serves as the Board Chair of OpenAI and has a rich history with Salesforce and Facebook, and Clay Bavor, who previously led Google Labs and spearheaded initiatives like Google Lens and Project Starline.

Your Role
As a Software Engineer focusing on Infrastructure at Sierra, you will play a pivotal role in designing, constructing, and maintaining the foundational systems that power our AI platform. Your expertise will ensure that our infrastructure is not only secure and reliable but also scalable, allowing product teams to execute their work with agility and confidence.

• Guarantee the reliability, scalability, and performance of our platform and LLM inference serving in response to increasing traffic demands.
• Develop and oversee cloud infrastructure using Terraform to create secure, scalable, and reproducible environments.
• Establish and manage a self-service infrastructure platform that empowers engineering teams to deploy and operate services independently.
• Own and improve CI/CD pipelines and release management processes, facilitating rapid and reliable deployments across Sierra's platform.
• Design and manage distributed systems utilizing distributed databases, retrieval systems, and machine learning models.
• Develop and sustain core data serving abstractions along with essential authentication and security features (SSO, RBAC, authentication controls).
• Effectively navigate and integrate our technology stack with enterprise customer environments in a scalable and maintainable manner.
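As a rough illustration of the RBAC feature mentioned above, a role-to-permission lookup is the core primitive. The roles and permissions below are invented for this sketch; Sierra's actual model is certainly richer (resource scoping, SSO integration, audit trails).

```python
# Minimal role-based access control (RBAC) check.
# Role and permission names are illustrative only.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "deploy"},
}

def is_allowed(role: str, permission: str) -> bool:
    """Return True if the role grants the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```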
At Exa, we are on a mission to create a cutting-edge search engine from the ground up, designed to cater to the diverse needs of AI applications. Our team is building a robust infrastructure that enables us to crawl the internet, train advanced embedding models for indexing, and develop high-performance vector databases using Rust. Additionally, we manage a $5M H200 GPU cluster that powers tens of thousands of machines.

The Infrastructure Team at Exa is responsible for developing the essential tools and infrastructure that support our entire system. We are looking for talented infrastructure engineers to help us scale our capabilities rapidly. Your work could involve orchestrating GPU clusters with Kubernetes, implementing map-reduce batch jobs on Ray, or creating top-tier observability tools that set industry standards.
Who We Are
Serval is an innovative AI-driven automation platform redefining operational efficiency for enterprises. Our intelligent agents seamlessly comprehend and execute real-world workflows, replacing outdated manual processes with adaptive, self-learning software. Since our inception in early 2024, we have garnered the trust of industry leaders such as General Motors, Notion, Perplexity, Vercel, Mercor, LangChain, and Verkada, streamlining high-volume operational tasks across their organizations.

At the heart of Serval is a cutting-edge agentic AI platform that transforms natural language into actionable workflows. Our agents not only respond to queries but also reason, act across various systems, and continuously enhance their performance. What started as a solution for operational tasks has rapidly expanded into a versatile AI automation layer utilized across IT, HR, Finance, Security, Legal, and Engineering.

Our mission is to eradicate repetitive, manual tasks within enterprises, empowering teams through intelligent automation. In the long run, we aim to establish a universal AI operations layer: a system of agents that integrates across business functions, maintaining the momentum of modern companies.

We are proud to be backed by renowned investors including Sequoia Capital, Redpoint Ventures, Meritech, First Round, General Catalyst, and Elad Gil, and founded by seasoned product and engineering leaders from Verkada.

Role Overview
As a Senior Software Engineer in Infrastructure at Serval, you will be pivotal in developing and scaling the core systems that power our AI agents and workflow automation platform. A crucial aspect of this role involves enabling and supporting self-hosted deployments for enterprise clients that need on-premises or private cloud environments. We are looking for engineers with deep expertise in distributed systems, infrastructure-as-code, production operations, and customer-facing support, who aspire to influence the technical architecture of a rapidly evolving platform.

What You'll Do
• Design, implement, and operate large-scale distributed systems that power Serval's AI agents, workflow orchestration, and data pipelines.
• Create and maintain Terraform modules to provision and manage cloud infrastructure across AWS, GCP, or Azure environments.
• Develop and sustain deployment packages, installation scripts, and infrastructure templates, enabling customers to self-host Serval in their own environments.
• Provide technical support and guidance to enterprise customers during installation and deployment phases.
About Us
At Imprint, we are revolutionizing the world of co-branded credit cards and innovative financial solutions, focusing on smarter, more rewarding, and brand-first experiences. We collaborate with renowned brands such as Crate & Barrel, Rakuten, Booking.com, H-E-B, Fetch, and Brooks Brothers to establish modern credit programs that enhance customer loyalty, unlock savings, and stimulate growth. Our robust platform integrates advanced payment technologies, intelligent underwriting, and a seamless user experience, enabling brands to offer impactful financial products without the complexities of becoming a bank.

Co-branded credit cards represent over $300 billion in U.S. annual spending, yet many are still managed by outdated banking systems. Imprint stands as the modern alternative: flexible, technology-driven, and tailored for today's consumers. Supported by notable investors like Kleiner Perkins, Thrive Capital, and Khosla Ventures, we are assembling a world-class team dedicated to reshaping payment methods and driving brand growth. If you thrive in fast-paced environments, enjoy tackling complex challenges, and aspire to make a significant impact, we would be delighted to meet you. Discover more about us on Imprint's Technology Blog.

The Team
The Tech Platform Engineering Team at Imprint is pioneering the democratization of access to advanced technologies, empowering teams across our organization to innovate and excel. Our commitment to redefining the fintech landscape drives us to build secure, highly available infrastructure while equipping our engineers with comprehensive development tools, allowing them to rapidly create world-class products.

Your Role
• Design, build, and manage cloud and web infrastructure with a strong emphasis on security, reliability, and scalability.
• Implement and maintain infrastructure components across computing, networking, and data platforms.
• Adhere to security best practices in cloud infrastructure, ensuring proper access control, network isolation, and secure communication between services.
• Monitor system health and engage in incident response, root cause analysis, and reliability enhancements.
• Collaborate with platform, security, and product engineers to deliver safe and efficient infrastructure solutions.
ABOUT THE ROLE:
As an AI Engineer at Varick, you will take charge of designing and optimizing the intelligence layer within our enterprise operations. This involves creating agent systems that efficiently handle thousands of transactions, make classification decisions, and learn from human interactions.

This position is suited for engineers with extensive experience in LLMs, agent architectures, and evaluation systems. You have successfully developed agent workflows that operate in production, not merely in demo environments. Your expertise includes prompt engineering, retrieval, tool calling, multi-agent orchestration, and the evaluation frameworks necessary for deploying trustworthy AI systems in enterprise settings.

WHAT YOU'LL DO:
• Design and implement agent architectures that tackle complex enterprise workflows, focusing on multi-step reasoning, tool calling, and exception handling.
• Develop and sustain evaluation systems that ensure agent quality, accuracy, safety, and groundedness.
• Create robust prompt systems, retrieval pipelines, and context engineering strategies to ensure reliable agent performance.
• Establish feedback loops that empower agents to learn from human corrections and enhance their functionality over time.
• Optimize inference costs and latency for production workloads to ensure efficiency.
• Define and uphold best practices for agent reliability, observability, and governance.
• Stay updated on the latest models, frameworks, and research to ensure impactful deployments into production.

WHAT WE'RE LOOKING FOR:
• At least 3 years of software engineering experience, with a minimum of 1–2 years dedicated to LLM applications or AI systems in a production environment.
• Practical experience constructing agent workflows featuring tool calling, retrieval, and multi-step reasoning.
• A deep understanding of prompt engineering, context engineering, and methods to elicit reliable behavior from LLMs.
• Experience developing evaluation systems to assess AI output quality.
• Proficiency in Python with a solid foundation in backend engineering principles.
• You have delivered AI features to actual users and have navigated challenges such as hallucinations, edge cases, accuracy drops, and cost management.
• Must be based in San Francisco.
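The tool-calling, multi-step agent workflows described above reduce to a loop: at each step the model either requests a tool call or returns a final answer. The sketch below stubs the model with a deterministic function so it is runnable; in production the `fake_model` call would be a real LLM API request, and the tool registry would be far larger. All names here are invented for illustration.

```python
# Minimal tool-calling agent loop with a stubbed model.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def fake_model(history):
    """Stand-in for an LLM: request one tool call, then finish."""
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "add", "args": (2, 3)}
    result = [m for m in history if m["role"] == "tool"][-1]["content"]
    return {"final": f"The sum is {result}"}

def run_agent(task: str, max_steps: int = 5) -> str:
    """Alternate model calls and tool executions until a final answer."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = fake_model(history)
        if "final" in action:
            return action["final"]
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[action["tool"]](*action["args"])
        history.append({"role": "tool", "content": result})
    return "step limit reached"
```

The `max_steps` bound is the usual guard against an agent that never converges; evaluation systems then score the final answers this loop produces.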
About the Role
Join our pioneering team at vooma as a Backend & Infrastructure Software Engineer, where you will play a critical role in shaping the technical infrastructure of a transformative company. If you are passionate about creating not only resilient systems but also the foundational architecture of a groundbreaking enterprise from the outset, this position is ideal for you.

We are looking for someone who excels at crafting infrastructure that is elegant, dependable, and secure, even under high-demand scenarios. You thrive on the challenge of scaling systems that enable intelligent agents and take pride in establishing reliable foundations that others can depend on.

Your Key Responsibilities Include:
• Design and maintain secure, scalable infrastructure tailored for AI-powered agents in production environments.
• Deploy and optimize AI-driven services to meet high availability and performance standards.
• Manage infrastructure as code, alongside cloud environments and CI/CD pipelines.
• Implement monitoring, observability, and alerting systems to ensure the reliability of our infrastructure.
• Contribute to infrastructure security and adhere to best practices.

You Should Have:
• Experience deploying and productionizing machine learning or AI-centric workloads.
• Proficiency in developing secure, scalable infrastructure on platforms such as AWS, Azure, or GCP.
• In-depth knowledge of backend systems, networking, and container orchestration technologies (e.g., Kubernetes).
• An understanding of infrastructure security principles and compliance standards (e.g., SOC 2).
• A proactive, hands-on mindset, with a strong drive to solve challenges from start to finish.
Full-time|$300K/yr - $300K/yr|On-site|San Francisco
ABOUT BASETEN
Join Baseten, where we drive mission-critical AI inference for leading companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our unique blend of applied AI research, robust infrastructure, and intuitive developer tools empowers organizations at the forefront of AI innovation to deploy state-of-the-art models into production. Recently, we secured a $300M Series E funding round, backed by esteemed investors such as BOND, IVP, Spark Capital, Greylock, and Conviction. Be a part of our rapid growth and help shape the platform that engineers trust for launching AI products.

THE ROLE
As an Infrastructure Software Engineer at Baseten, you will play a pivotal role in developing and maintaining the ML inference platform that powers AI applications in production. Your contributions will enhance the core infrastructure, enabling developers to deploy, scale, and monitor machine learning models with exceptional performance.

EXAMPLE INITIATIVES
You will engage in innovative projects within our Infrastructure team, including:
• Multi-cloud capacity management
• Inference on B200 GPUs
• Multi-node inference
• Fractional H100 GPUs for efficient model serving

RESPONSIBILITIES
• Design and develop infrastructure components for our ML inference platform, primarily using Python and Go.
• Implement and maintain Kubernetes deployments for optimal model serving.
• Contribute to the orchestration layer for model deployments.
• Build and enhance monitoring systems to track model performance metrics effectively.
• Develop efficient resource management solutions to optimize performance.
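Fractional GPU serving, one of the example initiatives above, is at heart a bin-packing problem: fit fractional GPU requests onto whole devices. The first-fit sketch below is an invented illustration of the idea, not Baseten's actual scheduler.

```python
# First-fit packing of fractional GPU requests onto whole GPUs.
# Requests and capacities are in GPU fractions (1.0 == one full device).
def pack(requests, capacity=1.0):
    """Assign each request to the first GPU with enough free capacity.

    Returns (placement, gpu_count): placement[i] is the GPU index
    assigned to requests[i].
    """
    gpus = []        # remaining free capacity per GPU
    placement = []
    for req in requests:
        for i, free in enumerate(gpus):
            if free >= req:
                gpus[i] -= req
                placement.append(i)
                break
        else:
            # No existing GPU has room: provision a new one.
            gpus.append(capacity - req)
            placement.append(len(gpus) - 1)
    return placement, len(gpus)
```

First-fit is deliberately simple; real schedulers also weigh memory isolation, interference between co-located models, and rebalancing as workloads shift.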
Full-time|$150K/yr - $200K/yr|On-site|San Francisco, CA
At Sift, we are revolutionizing the way cutting-edge machines are constructed, tested, and managed. Our innovative platform provides engineers with real-time visibility into high-frequency telemetry, effectively removing bottlenecks and facilitating quicker, more dependable development.

Sift originated from our experience at SpaceX, contributing to projects like Dragon, Falcon, Starlink, and Starship, where the demands of scaling telemetry, debugging flight systems, and ensuring mission reliability necessitated a new kind of infrastructure. Founded by a talented team from SpaceX, Google, and Palantir, Sift is tailored for mission-critical systems where precision and scalability are imperative.

As one of the pioneering engineers at Sift, your role will extend beyond just coding: you will play a crucial part in defining the architecture, shaping the product, and influencing the culture of a company dedicated to addressing real engineering challenges. If you're eager to take on intricate technical obstacles and build foundational systems that support complex machines from the ground up, we would love to connect with you.
Join Ivo's Engineering Team!
At Ivo, we are pioneers in the tech industry. Our engineers are innovators who have created groundbreaking solutions such as:
• An AI agent that seamlessly integrates with MS Word to enhance document editing [2023]
• Revolutionizing embedding models with agentic RAG technology [2023]
• Advanced LLM-based legal fact extraction capabilities [2024]
• A legal assistant designed to search extensive contract databases without compromising accuracy [2024]
• Clustering legal documents from the same lineage [2025]
• Automatic deviation analysis to uncover hidden risks in vast contract databases [2025]
• Merging contracts with their amendments to create a “composite” contract timeline that has moved our clients to tears [2025]

Role Overview
As an Infrastructure Engineer at Ivo, you will lay the groundwork for our platform's future. Your responsibilities will include:
• Designing and owning the future of our infrastructure, allowing you the freedom to innovate.
• Managing multiple customer deployments, ensuring each receives tailored containers, databases, and VPCs.
• Instrumenting our systems to identify performance bottlenecks and errors.
• Aggregating metrics and logs into visually appealing dashboards and setting up pager alerts.
• Leading infrastructure-related incidents and being on-call as necessary.
• Enhancing our CI/CD system to reduce deployment time from ~12 minutes.

If you're passionate about LLMs, you'll thrive on our engineering team, where you'll have the opportunity to:
• Develop real-time LLM evaluations to monitor the accuracy of our responses.
• Collaborate with talented engineers to push the boundaries of DevOps.
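A real-time LLM evaluation like the one mentioned above can start as simply as scoring each response against required and forbidden phrases. The checker below is an invented sketch (the phrases and rubric are not Ivo's); production evaluations typically layer on reference answers, graded rubrics, or judge models.

```python
# Minimal phrase-based LLM response evaluation.
def evaluate(answer: str, must_contain, must_not_contain=()):
    """Score an answer by required-phrase coverage and forbidden-phrase hits."""
    answer_l = answer.lower()
    hits = sum(p.lower() in answer_l for p in must_contain)
    violations = [p for p in must_not_contain if p.lower() in answer_l]
    score = hits / len(must_contain) if must_contain else 1.0
    return {
        "score": score,
        "violations": violations,
        "passed": score == 1.0 and not violations,
    }
```

Running a check like this on every response gives a live accuracy signal that can feed dashboards and pager alerts.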
Astranis is seeking a talented and motivated Software Engineer to join our Infrastructure team. In this role, you will be at the forefront of developing and maintaining critical software systems that support our innovative satellite technology. You'll collaborate with cross-functional teams to design, implement, and optimize our infrastructure solutions, ensuring high reliability and performance.
Full-time|$245K/yr - $290K/yr|On-site|San Francisco, CA
Redpanda Data is building the Agentic Data Plane (ADP), a platform that connects AI agents with enterprise data and systems. The ADP supports real-time, autonomous reasoning and action by agentic applications, powered by Redpanda's multi-modal data streaming engine. Major organizations across industries, including Activision Blizzard, Cisco, Moody's, Texas Instruments, Vodafone, and two of the top five U.S. banks, rely on Redpanda to process hundreds of terabytes of data every day. Backed by investors such as Lightspeed, GV, and Haystack VC, Redpanda operates as a globally distributed, people-first company.

Role overview
The Principal Software Engineer will architect and develop the Agentic Data Plane, which serves as the control and execution layer for AI agents interacting with enterprise data. This system enables agents to access, analyze, and act on data in real time, while providing human operators with oversight and control for secure operations. The ADP brings together Redpanda's low-latency streaming technology, a distributed query engine for real-time context, a library of over 300 data connectors, and a global policy and observability framework. This framework enforces access controls, records agent actions, and supports replayable audits.

What you will do
• Design and build the core architecture of the Agentic Data Plane, focusing on secure and efficient data interaction for AI agents.
• Integrate streaming, query, and policy enforcement components to support real-time, autonomous agent operations.
• Monitor developments in the agentic AI field and translate research into engineering proposals and product strategies.
• Work closely with Engineering, Product, and Go-To-Market teams, as well as key customers, to shape the direction of the ADP.
About Engineering at Ivo Inc.
Ivo Inc. builds advanced legal technology from its San Francisco base. The engineering team has delivered several notable products, including:
• An AI agent for Microsoft Word that edits documents automatically (2023)
• Migration from traditional embedding models to agentic RAG methods (2023)
• Large-scale legal fact extraction powered by LLMs (2024)
• A legal assistant designed to search large contract databases with precision (2024)
• Clustering related legal documents to improve organization (2025)
• Automated deviation analysis to surface hidden risks in contract data (2025)
• Combining contracts and amendments to create comprehensive contract time series (2025)

Role Overview: Infrastructure Software Engineer
The Infrastructure Software Engineer will help shape the core systems that power Ivo's platform. This role offers the chance to architect, optimize, and maintain the infrastructure supporting sensitive client data and high-performance legal applications.

What You Will Do
• Own and influence the evolution of Ivo's infrastructure, with significant freedom to design systems due to a lean operational footprint.
• Orchestrate customer deployments, managing containers, databases, and VPCs for each client to ensure data isolation and security.
• Implement instrumentation to surface performance bottlenecks and errors across the stack.
• Aggregate metrics, logs, and health checks into dashboards and alerting systems for clear visibility.
• Participate in on-call rotations to lead and resolve infrastructure incidents.
• Optimize CI/CD pipelines to reduce deployment times (current average: 12 minutes).

Opportunities to Advance DevOps and LLM Integration
• Develop real-time LLM evaluations to track output accuracy.
• Create autonomous agents that identify and troubleshoot production issues proactively.
• Bring forward new ideas to improve infrastructure and operations.
Mission
Ivo's mission is to empower clients with advanced legal technology that boosts efficiency and accuracy.
Join Our Innovative Team
The Applied Engineering team at OpenAI is dedicated to bridging the gap between research, engineering, product, and design, delivering cutting-edge AI technology to consumers and businesses alike. As a pivotal member of our team, you will manage the core infrastructure that underpins products such as ChatGPT and our API. This includes overseeing our Kubernetes clusters, infrastructure deployment, networking stack, cloud abstractions, and more.

Our mission is to learn from our deployments and ensure the responsible and safe use of AI technology. We place a higher priority on safety than on unchecked growth.

About Your Role
As a vital contributor to the cloud infrastructure team, you'll be responsible for constructing and maintaining infrastructure abstractions that facilitate swift and scalable product delivery. This position is based in our San Francisco, CA office.

Your Responsibilities:
• Architect and develop robust development and production platforms that ensure reliability and security at scale.
• Optimize our infrastructure for scalability to meet future demands.
• Foster a diverse, equitable, and inclusive work culture that encourages open communication and challenges conventional thinking.
• Participate in an on-call rotation to maintain the reliability of the systems we build and respond to critical incidents as necessary.

You Will Excel in This Position If You:
• Possess over 5 years of experience in building core infrastructure.
• Have extensive experience with orchestration systems such as Kubernetes at scale.
• Are skilled in creating abstractions over cloud platforms.
• Take pride in developing and managing scalable, reliable, and secure systems.
• Thrive in environments characterized by ambiguity and rapid change.

This role is exclusively located at our San Francisco headquarters. We offer relocation assistance to qualified candidates.
Sphere develops AI-driven systems that help businesses navigate and comply with global trade laws. The core platform, TRAM, interprets regulations across more than 190 jurisdictions, enabling clients to stay compliant with speed and accuracy. Backed by a16z and Y Combinator, Sphere operates from its San Francisco headquarters.

Role overview
The AI Agent Infrastructure Lead guides the technical direction of Sphere's AI agent infrastructure. This position sits within a small, collaborative team where each person's contributions directly influence company growth. Sphere's mission centers on automating compliance for global transactions, from calculation to remittance, to improve efficiency and market access. High reliability is essential: systems are engineered for zero downtime.

What you will do
• Lead technical strategy and architecture for AI agent infrastructure
• Collaborate closely with a tight-knit team at the San Francisco HQ
• Ensure systems remain highly reliable and available

Why Sphere?
• Tackle complex infrastructure challenges that affect real-world trade
• Make a direct impact on a product trusted in over 190 jurisdictions
• Work alongside a small, dedicated team shaping the company's future
• Contribute to the evolution of compliance technology in a rapidly growing company
Netic is revolutionizing the essential services sector with our AI-driven revenue engine, empowering the backbone of the American economy.

With $43M in funding from leading investors such as Founders Fund, Greylock, Hanabi, and Dylan Field, who spearheaded our Series B, we have enabled our clients to secure hundreds of thousands of jobs across various service industries in North America. Today, numerous companies thrive entirely on an AI-first model powered by Netic.

As a member of our team of innovative builders from top organizations such as Scale, Databricks, HRT, Meta, MIT, Stanford, and Harvard, you will be at the forefront of integrating frontier AI into the physical economy, where challenges are complex, data is intricate, and impacts are immediate and substantial.

In the role of a founding Product Infrastructure Engineer, you will design and scale the crucial infrastructure that supports our autonomous AI agents, addressing real-world challenges with significant, tangible outcomes. You will work alongside a passionate team of builders to develop infrastructure and processes from scratch, utilizing state-of-the-art cloud and orchestration technologies. If you excel in dynamic, ambiguous settings and are eager to set new benchmarks in the agentic domain, this is your chance to make a lasting impact.
About Chai Discovery
Chai Discovery specializes in developing cutting-edge AI models that revolutionize molecular design and redefine drug discovery. Our passionate team is dedicated to transforming the search for new cures and improving lives.
Our founding team comprises top researchers and Silicon Valley experts who have achieved significant milestones in AI for biology. With a history of co-inventing protein language modeling and creating advanced folding algorithms, our technology has been embraced by leading pharmaceutical companies. We are proud to be supported by investors including OpenAI, Thrive Capital, Dimension, Conviction, Lachy Groom, Amplify, and others.
About the Role
We are seeking a dedicated Infrastructure Software Engineer focused on crafting robust, streamlined infrastructure. You will develop the foundational compute and infrastructure systems that support our product offerings, model inference, and evaluation frameworks. Collaboration with product engineers, researchers, and our commercial team will be key to your success.
You will have experience creating services that developers appreciate, deploying and scaling AI/ML systems in production, and anticipating the challenges that could hinder adoption of our platform by leading biopharmaceutical organizations.
As Chai's models advance from protein structure prediction into practical therapeutic engineering, this role presents a unique opportunity to bring state-of-the-art AI drug design models to market, working alongside a team that is both detail-oriented and optimistic about the future.
About You
You are motivated by a mission to set the benchmark for impactful AI technology.
We are looking for candidates who possess:
Software Experience:
A Bachelor's degree or equivalent experience in Computer Science or a related field.
5+ years of experience building production systems with contemporary tools, collaborating with platform, security, and product teams.
A keen ability to foresee infrastructure challenges.
Comprehensive ownership of 24/7 infrastructure observability, alerting, and incident response.
Experience in both 0-to-1 buildouts and 1-to-n scale-ups, along with a rich repository of best practices and strategies.
Communication & Collaboration:
A passion for pair code reviews, documentation, and knowledge sharing with peers.
About Blockit
At Blockit, we recognize that time is our most precious resource, yet scheduling often feels antiquated. Our mission is to revolutionize this process through advanced AI that acts as an autonomous time agent, adeptly managing the complexities of scheduling, including time zones, group coordination, and logistics, as though it were an ever-vigilant executive assistant.
As pioneers in the AI space, Blockit is developing one of the first multiplayer, stateful AI agents capable of facilitating interactions among multiple users, maintaining contextual continuity across conversations, and executing real-world actions. The more users integrate their calendars, the more robust our network becomes.
Join our dynamic team, supported by Sequoia, where we maintain a fast-paced environment, consistently ship innovative solutions, and uphold high standards of excellence. If you're excited about building groundbreaking technology, we would love to connect with you. To explore our team culture further, please visit our team page.
The Role
In this role, you will ensure that Blockit remains fast, reliable, and primed for scale. You will take ownership of our core infrastructure, which includes databases, asynchronous job processing, observability, and the systems that drive our AI agents, including the LLM gateway. You will architect solutions as we expand, whether that means integrating new systems or inventing entirely new approaches.
Furthermore, you will be the go-to person for reliability and performance, ensuring our systems remain robust as usage grows. This position is ideal for someone who is passionate about operational excellence and eager to lay the groundwork for a platform that orchestrates millions of calendars.
What You'll Do
Manage and evolve our core infrastructure, including PostgreSQL, ClickHouse, and asynchronous processing pipelines.
Design and optimize our LLM infrastructure, including the LLM gateway, evaluation pipelines, and observability stack, to guarantee reliability, performance, and cost-effectiveness.
Develop comprehensive monitoring, alerting, and dashboards to identify issues promptly.
Architect and implement new infrastructure as we scale, such as Redis, Kafka, or similar systems, making informed trade-offs along the way.
Enhance deployment pipelines and the developer experience to keep shipping rapid and safe.
Jan 21, 2026