Lead Technical Staff Member Inference Infrastructure jobs in San Francisco – Browse 2,771 openings on RoboApply Jobs

Open roles matching “Lead Technical Staff Member Inference Infrastructure” in San Francisco. 2,771 active listings on RoboApply Jobs.

1 - 20 of 2,771 Jobs
Inferact
Full-time|$200K/yr - $400K/yr|Remote|San Francisco

At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, revolutionizing AI progress by making inference both more accessible and efficient. Our founding team consists of the original creators and key maintainers of vLLM, positioning us uniquely at the nexus of cutting-edge models and advanced hardware.

Role Overview
We are seeking a passionate inference runtime engineer eager to explore and expand the frontiers of LLM and diffusion model serving. As models evolve and grow in complexity with new architectures like mixture-of-experts and multimodal designs, the demand for innovative solutions in our inference engine intensifies. This role places you at the heart of vLLM, where you will enhance model execution across a variety of hardware platforms and architectures. Your contributions will have a direct influence on the future of AI inference.

Jan 22, 2026
Gimlet Labs
Full-time|On-site|San Francisco

At Gimlet Labs, we are pioneering the development of the first heterogeneous neocloud designed specifically for AI workloads. As the demand for AI systems surges, traditional homogeneous infrastructures face critical limits in power, capacity, and cost. Our innovative platform effectively decouples AI workloads from their hardware foundations, intelligently partitioning tasks and orchestrating them to the most suitable hardware for optimal performance and efficiency. This strategy fosters heterogeneous systems that span multiple vendors and generations, including cutting-edge accelerators, enabling significant enhancements in performance and cost-effectiveness at scale.

In addition to this foundational work, Gimlet is establishing a robust neocloud for agentic workloads. Our clients benefit from deploying and managing their workloads via stable, production-ready APIs, without the need to navigate hardware selection or performance optimization intricacies. We collaborate with foundation labs, hyperscalers, and AI-native companies to drive real production workloads capable of scaling to gigawatt-class AI datacenters.

We are currently seeking a Member of Technical Staff specializing in ML systems and inference. In this pivotal role, you will be responsible for designing and constructing inference systems that facilitate the execution of complete models in real production environments. You will operate at the intersection of model architecture and system performance to ensure that inference processes are swift, predictable, and scalable. This position is perfect for engineers with a deep understanding of modern model execution and a passion for optimizing latency, throughput, and memory utilization across the entire inference lifecycle.

Mar 10, 2026
Cohere
Full-time|On-site|San Francisco

Cohere builds and deploys advanced AI models used by developers and enterprises. These models support applications like content generation, semantic search, retrieval-augmented generation (RAG), and intelligent agents. The team’s work aims to make AI more accessible and practical for real-world use. Each person at Cohere plays a direct role in strengthening the models and increasing their value for clients. The company values practical outcomes and continuous improvement, focusing on delivering reliable technology to users. The team includes researchers, engineers, designers, and professionals from a wide range of backgrounds. Cohere believes that diverse perspectives help create better products. The company welcomes those interested in shaping the future of AI to join its mission.

Apr 28, 2026
OpenAI
Full-time|Hybrid|San Francisco

Join the Sora Team at OpenAI
The Sora team is at the forefront of developing multimodal capabilities within OpenAI’s foundational models. We are a dynamic blend of research and product development, committed to integrating sophisticated multimodal functionalities into our AI offerings. Our focus is on delivering solutions that are not only reliable and intuitive but also resonate with our mission to foster broad societal benefits.

Your Role as Inference Technical Lead
We are seeking a talented GPU Inference Engineer to enhance the model serving efficiency for Sora. This pivotal position will empower you to spearhead initiatives aimed at optimizing inference performance and scalability. You will collaborate closely with our researchers to design and develop models that are optimized for inference, directly contributing to the success of our projects. Your contributions will be vital in advancing the team’s overarching objectives, allowing leadership to concentrate on high-impact initiatives by establishing a robust technical foundation.

Key Responsibilities:
- Enhance model serving, inference performance, and overall system efficiency through focused engineering efforts.
- Implement optimizations targeting kernel and data movement to boost system throughput and reliability.
- Collaborate with research and product teams to ensure our models operate effectively at scale.
- Design, construct, and refine essential serving infrastructure to meet Sora’s growth and reliability demands.

You Will Excel in This Role If You:
- Possess deep knowledge in model performance optimization, particularly at the inference level.
- Have a strong foundation in kernel-level systems, data movement, and low-level performance tuning.
- Are passionate about scaling high-performing AI systems that address real-world, multimodal challenges.
- Thrive in ambiguous situations, setting technical direction, and driving complex projects to fruition.

This role is based in San Francisco, CA. We follow a hybrid work model requiring 3 in-office days per week and offer relocation assistance to new hires.

Apr 21, 2025
fal
Full-time|On-site|San Francisco

Join fal as we revolutionize the generative-media infrastructure landscape. Our mission is to enhance model inference performance, enabling creative experiences on an unprecedented scale. We are seeking a Staff Technical Lead for Inference & ML Performance, an individual who possesses a unique blend of deep technical knowledge and strategic foresight. In this pivotal role, you will lead a talented team dedicated to building and optimizing cutting-edge inference systems. If you're ready to influence the future of inference performance in a fast-paced and rapidly growing environment, we want to hear from you.

Why This Role Matters
In this role, you will play a crucial part in shaping the future of fal’s inference engine, ensuring that our generative models consistently deliver outstanding performance. Your contributions will directly affect our capacity to swiftly provide innovative creative solutions to a diverse clientele, from individual creators to global brands.

Your Responsibilities
Define and steer the technical direction, guiding your team across various domains including kernels, applied performance, ML compilers, and distributed inference to develop high-performance solutions.

Oct 29, 2025
Magic
Full-time|On-site|San Francisco

At Magic, we are driven by our mission to develop safe Artificial General Intelligence (AGI) that propels humanity forward in addressing the most critical challenges. We firmly believe that the future of safe AGI lies in automating research and code generation, allowing us to enhance models and tackle alignment issues more effectively than humans alone can manage. Our innovative approach combines cutting-edge pre-training, domain-specific reinforcement learning (RL), ultra-long context, and efficient inference-time computation to realize this vision.

Position Overview
As a Software Engineer within the Inference & RL Systems team, you will play a pivotal role in designing and managing the distributed systems that enable our models to function seamlessly in production, supporting extensive post-training workflows. This position operates at the intersection of model execution and distributed infrastructure, focusing on systems that influence inference latency, throughput, stability, and the reliability of RL and post-training loops. Our long-context models impose significant execution demands, including KV-cache scaling, managing memory constraints for lengthy sequences, batching strategies, long-horizon trajectory rollouts, and ensuring consistent throughput under real-world workloads. You will be responsible for the infrastructure that ensures both production inference and large-scale RL iterations are efficient and dependable.

Key Responsibilities
- Craft and scale high-performance inference serving systems.
- Optimize KV-cache management, batching methods, and scheduling processes.
- Enhance throughput and latency for long-context tasks.
- Develop and sustain distributed RL and post-training infrastructure.
- Boost reliability across rollout, evaluation, and reward pipelines.
- Automate fault detection and recovery mechanisms for serving and RL systems.
- Analyze and eliminate performance bottlenecks across GPU, networking, and storage components.
- Collaborate with Kernel and Research teams to ensure alignment between execution systems and model architecture.

Qualifications
- Solid foundation in software engineering and distributed systems.
- Proven experience in building or managing large-scale inference or training systems.
- In-depth understanding of GPU execution constraints and memory trade-offs.
- Experience troubleshooting performance issues in production machine learning systems.
- Capability to analyze system-level trade-offs between latency, throughput, and cost.

Feb 28, 2026
Vapi
Full-time|On-site|San Francisco

About Vapi:
At Vapi, we are revolutionizing communication by making voice the primary interface for human interaction. Our platform offers unparalleled configurability for deploying voice agents. In just two years, we have attracted over 600,000 developers, with more than 2,000 joining daily. Experience Vapi now!

Why We Need You:
We handle millions of calls daily, with thousands occurring concurrently. Every call generates a new audio packet every 20 milliseconds, requiring responses in under 1 second. We are scaling this operation to manage hundreds of millions of calls. This challenge is exciting and incredibly rewarding.

Your Responsibilities:
- 30 Days: Get acquainted with our multi-cluster, multi-cloud infrastructure.
- 60 Days: Launch a new service such as Anycast Global Router.
- 90 Days: Take ownership of a domain, such as GPU inference clusters.

Your Profile:
- You have experience from Series B to F funding stages.
- You have successfully scaled large, resilient, and high-performance systems.
- Bonus points if you've founded your own startup!

Why Choose Vapi:
- Generational Impact: Create the human interface for every business.
- Ownership Culture: 70% of our team are previous founders.
- Supportive Team: Our founders, Jordan and Nikhil, bring that friendly Canadian spirit.
- Top Investors: Backed by Y Combinator, KP Seed, and Bessemer Series A.

What We Provide:
- Equity Ownership: Competitive salary with excellent equity options.
- Health Coverage: Comprehensive medical, dental, and vision plans.
- Team Bonding: We enjoy spending time together, including quarterly off-site events.
- Flexible Time Off: Take the time you need to recharge.

Jul 29, 2025
Cohere
Full-time|On-site|San Francisco

Who are we?
At Cohere, our mission is to elevate intelligence to benefit humanity. We specialize in training and deploying cutting-edge models for developers and enterprises focused on creating AI systems that deliver extraordinary experiences such as content generation, semantic search, retrieval-augmented generation, and intelligent agents. We view our work as pivotal to the broad acceptance of AI technologies. We are passionate about our creations. Every team member plays a vital role in enhancing our models' capabilities and the value they provide to our customers. We thrive on hard work and speed, always prioritizing our clients' needs. Cohere is a diverse team of researchers, engineers, designers, and more, all dedicated to their craft. Each individual is a leading expert in their field, and we recognize that a variety of perspectives is essential to developing exceptional products. Join us in our mission and help shape the future of AI!

Why this role?
Are you excited about architecting high-performance, scalable, and reliable machine learning systems? Do you aspire to shape and construct the next generation of AI platforms that enhance advanced NLP applications? We are seeking talented Members of Technical Staff to join our Model Serving team at Cohere. This team is responsible for the development, deployment, and operation of our AI platform, which delivers Cohere's large language models via user-friendly API endpoints. In this role, you will collaborate with multiple teams to deploy optimized NLP models in production settings characterized by low latency, high throughput, and robust availability. Additionally, you will have the opportunity to work directly with customers to create tailored deployments that fulfill their unique requirements.

Jan 12, 2026
Magic
Full-time|On-site|San Francisco

At Magic, our mission is to create safe AGI that propels humanity forward in addressing the world’s most critical challenges. We believe that the key to achieving safe AGI lies in automating research and code generation to enhance models and resolve alignment issues more effectively than humans alone. Our unique approach integrates frontier-scale pre-training, domain-specific reinforcement learning, ultra-long context, and inference-time computation to realize this vision.

Role Overview
As a vital member of our Supercomputing Platform & Infrastructure team, you will be instrumental in designing, constructing, and managing the extensive GPU infrastructure that underpins Magic’s model training and inference processes. A key aspect of your role will involve leveraging Terraform-driven infrastructure-as-code methodologies to build and maintain our infrastructure, ensuring reproducibility, reliability, and operational clarity across clusters comprising thousands of GPUs. Magic’s long-context models exert continuous demands on compute, networking, and storage systems. The infrastructure must support long-running distributed jobs, high-throughput data movement, and stringent availability requirements, necessitating designs that are automated, observable, and resilient. You will take ownership of the systems and IaC foundations that facilitate these capabilities. This position has the potential to expand into broader responsibilities encompassing supercomputing platform architecture, influencing how Magic scales GPU clusters and enhances infrastructure reliability as model workloads expand.

Key Responsibilities
- Design and manage large-scale GPU clusters for model training and inference.
- Construct and sustain infrastructure utilizing Terraform across both cloud and hybrid environments.
- Develop modular, scalable IaC frameworks for provisioning compute, networking, and storage resources.
- Enhance deployment reproducibility, maintain environment consistency, and ensure operational safety.
- Optimize networking and storage architectures for high-throughput AI workloads.
- Automate fault detection and recovery mechanisms across distributed clusters.
- Diagnose complex cross-layer issues involving hardware, drivers, networking, storage, operating systems, and cloud environments.
- Enhance observability, monitoring, and reliability of essential platform systems.

Qualifications
- Strong foundation in systems engineering principles.
- Extensive hands-on experience with Terraform, including module design, state management, environment isolation, and large-scale implementations.

Jan 25, 2024
Parallel
Full-time|On-site|San Francisco or Palo Alto

About Us
At Parallel, we are a pioneering web infrastructure company dedicated to empowering businesses across various sectors, including sales, marketing, insurance, and software development. Our innovative products enable organizations to create cutting-edge AI agents with robust and flexible programmatic access to the web. Having successfully raised $130 million from esteemed investors such as Kleiner Perkins, Index Ventures, and Spark Capital, our mission is to reshape the web for AI applications. We are assembling a talented team of engineers, designers, marketers, and operational experts to help us achieve this vision.

Job Overview:
As a member of our technical staff, you will play a crucial role in building, operating, and scaling our infrastructure, particularly around large language models. Your responsibilities will include ensuring system reliability and cost-efficiency as we expand, anticipating potential bottlenecks, evolving our architecture to meet growing demands, and developing the tools that enhance engineering productivity.

About You:
You possess a deep understanding of distributed systems, cloud platforms, performance optimization, and scalable architecture. You are adept at balancing trade-offs between cost, reliability, and speed, and you are passionate about enabling teams to innovate rapidly and confidently while supporting products that serve millions of users seamlessly.

Aug 14, 2025
Reflection AI
Full-time|On-site|San Francisco

Our Mission
At Reflection AI, our goal is to develop open superintelligence and make it universally accessible. We are pioneering open weight models tailored for individuals, agents, enterprises, and even entire nations. Our diverse team comprises talented AI researchers and industry veterans from prestigious organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and many more.

Role Overview
- Construct and enhance distributed training systems that drive the pre-training of cutting-edge models.
- Collaborate with research teams to design and execute extensive training runs for foundational models.
- Create infrastructure that facilitates efficient training across thousands of GPUs leveraging contemporary distributed training frameworks.
- Enhance training throughput, stability, and efficiency for extensive model training tasks.
- Work closely with pre-training researchers to convert experimental concepts into scalable, production-ready training systems.
- Boost performance of distributed training tasks through optimization of communication, memory management, and GPU utilization.
- Develop and maintain training pipelines that accommodate large-scale datasets, checkpointing, and iterative experiments.
- Identify and resolve performance bottlenecks within distributed training systems, including model parallelism, GPU communication, and training runtime environments.
- Contribute to the creation of systems that promote swift experimentation and iteration on novel training methods.

Mar 24, 2026
Reflection AI
Full-time|On-site|San Francisco

About the Role
Reflection AI is hiring a Member of Technical Staff focused on Infrastructure Security in San Francisco. This position plays a key part in protecting the company’s infrastructure from security threats.

What You Will Do
- Work with teams across the company to design, implement, and monitor security protocols and systems.
- Help safeguard digital assets by maintaining the integrity and security of infrastructure.

Apr 16, 2026
tierzero
Full-time|Hybrid|SF HQ

About tierzero
tierzero builds tools that help engineering teams manage production code with stronger incident response, better operational visibility, and collaborative knowledge sharing. Companies like Discord, Drata, and Framer use tierzero to support their infrastructure in an AI-driven landscape. Backed by $7 million from investors including Accel and SV Angel, tierzero is growing quickly from its San Francisco headquarters.

Role Overview: Founding Member of Technical Staff
This is a hands-on role shaping tierzero’s core product and systems from the ground up. The founding technical team works closely with the CEO, CTO, and early customers to solve real engineering challenges. The position is based in San Francisco, with a hybrid schedule: three days each week in the office.

What You’ll Do
- Design and build intelligent AI systems that process large volumes of unstructured data
- Deliver full-stack features informed by real-time user feedback
- Improve usability so AI agents are both effective and trustworthy for engineers
- Develop systems for automated evaluation of LLM outputs, including feedback loops and self-play
- Construct machine learning pipelines for data ingestion, feature generation, embedding storage, retrieval-augmented generation (RAG), vector search, and graph databases
- Prototype with open-source LLMs to understand their strengths and weaknesses
- Create scalable infrastructure for complex, multi-step agents, focusing on memory, state management, and asynchronous workflows

Who We’re Looking For
- 5+ years of professional experience or significant open-source contributions
- Interest in LLMs, MCPs, cloud infrastructure, and observability tools
- Comfort working in changing, ambiguous situations
- Product-focused and customer-first mindset
- Experience learning from and collaborating with engineers from diverse backgrounds
- Bonus: Previous experience in a startup setting

Work Location
Hybrid schedule: three days per week in-person at the San Francisco HQ.

Apr 16, 2026
OpenAI
Full-time|Hybrid|San Francisco

About Our Team
Join the Future of Computing Research team at OpenAI, an innovative applied research group within the Consumer Devices division. Our mission is to pioneer new methods and models that contribute to our overarching goal of developing Artificial General Intelligence (AGI) for the betterment of humanity.

Role Overview
As the Inference Technical Lead, you will collaborate with world-class machine learning researchers and top-notch design talents to push the boundaries of model capabilities. This position is stationed in San Francisco, CA, offering a hybrid work model that includes 4 days in the office, along with relocation assistance for new hires.

Key Responsibilities
- Assess and select silicon platforms, including GPUs, NPUs, and specialized accelerators, for the deployment of OpenAI models on-device and at the edge.
- Collaborate closely with research teams to co-design model architectures that satisfy real-world constraints such as latency, memory, power, and bandwidth.
- Conduct system performance analyses to identify trade-offs in model design, memory hierarchy, compute throughput, and hardware capabilities.
- Work hand-in-hand with hardware vendors and internal infrastructure teams to launch new accelerators, ensuring efficient execution of transformer workloads.
- Lead a team of engineers in implementing the low-level inference stack, encompassing kernel development and runtime systems.
- Navigate challenges to transform emerging research capabilities into scalable solutions.

Ideal Candidate Profile
- Proven experience in evaluating or deploying workloads on GPUs, NPUs, or other specialized accelerators.
- Strong understanding of transformer model performance characteristics, including attention mechanisms, KV-cache behaviors, and memory bandwidth requirements.
- Experience designing or optimizing high-performance computing systems, such as inference engines, distributed runtimes, or hardware-aware ML pipelines.
- Background in building or leading teams focused on low-level performance-critical software, including CUDA kernels, compilers, or ML runtimes.
- Demonstrated ability to thrive in a fast-paced, innovative environment.

Mar 13, 2026
Catalog
Full-time|On-site|San Francisco

At Catalog, we are pioneering the commerce infrastructure for AI: creating the essential framework that enables digital agents to not only explore the web but also comprehend, analyze, and engage with products. Our innovations drive the future of AI-driven shopping experiences, fundamentally transforming how consumers discover and purchase items online.

Role Overview
As a Technical Staff Member, you will be instrumental in developing core systems, shaping our engineering culture, and transitioning our vision from prototype to a robust platform. This role requires full-stack expertise and a commitment to owning and resolving challenges from start to finish.

Who You Are
- You have experience creating beloved and trusted products from the ground up.
- You combine technical proficiency with a keen product sense and data-driven intuition.
- You are well-versed in AI technologies.
- You prioritize speed, write clean code, and ensure thorough instrumentation.
- You seek a high level of ownership within a small, talent-rich team based in San Francisco.

Challenges You Will Tackle
- Develop and deploy agentic-search APIs that deliver structured and real-time product data in milliseconds.
- Build checkout systems enabling agents to conduct transactions with any merchant.
- Create an embeddings and retrieval layer that optimizes recall, precision, and cost efficiency.
- Establish a product graph and ranking pipeline that adapts based on actual user outcomes.

Preferred Qualifications
- Proven experience shipping data-centric products in a live environment.
- Experience with recommendation systems or information retrieval methodologies.
- Familiarity with API development, search indexing, and data pipeline construction.

Our Work Culture
We operate with a small, high-trust, and highly motivated team, fostering an environment of in-person collaboration in North Beach, San Francisco. Our process involves debate, decision-making, and execution. If your profile aligns with our needs, we will contact you to arrange 2-3 brief technical interviews, followed by an onsite meeting in our office where you will collaborate on a small project, exchange ideas, and meet the team.

Oct 15, 2025
Chroma
Full-time|On-site|San Francisco, CA

At Chroma, we are at the forefront of AI data infrastructure, providing top-tier retrieval solutions that empower developers worldwide. Join us as we navigate the nascent stages of AI technology, and become part of a team that values curiosity and dedication to mastering your craft. There is significant work ahead, and we invite you to contribute to our mission.

Sep 9, 2024
Adyen
Full-time|On-site|San Francisco

Join our dynamic team at Adyen as a Technical Staff Member in San Francisco! We are seeking innovative minds passionate about technology and problem-solving. In this role, you will collaborate with cross-functional teams to craft solutions that enhance our services and improve customer experiences.

Mar 6, 2026
tierzero
Full-time|Hybrid|SF HQ

TierZero seeks a Founding Member of Technical Staff to join the team in San Francisco. This in-person position requires working from the SF headquarters at least three days per week.

Role Overview
This role centers on close collaboration with a group of engineers who have collectively delivered over $10 billion in value during their careers. Expect to work side by side with teammates, sharing ideas and building strong connections in the office. The environment often shifts, so adaptability and comfort with changing priorities are important.

Key Responsibilities
- Work directly with experienced engineers to design and build new products
- Prioritize customer needs and satisfaction in product decisions
- Develop solutions using large language models (LLMs), MCPs, cloud infrastructure, and observability tools

Requirements
- Minimum 5 years of professional engineering experience or a strong record of open-source contributions
- Experience in startups and familiarity with their unique challenges is a plus

Location
This position is based in San Francisco. In-office presence is required three days each week for collaboration.

Apr 23, 2026
tierzero
Full-time|Hybrid|SF HQ

About TierZero
TierZero helps engineering teams use AI to build and ship code more efficiently. The platform targets the bottleneck of human speed in production, giving teams tools for faster incident response, better operational visibility, and shared knowledge. TierZero is backed by $7M in funding from investors including Accel and SV Angel. Companies like Discord, Drata, and Framer trust TierZero to strengthen their infrastructure for AI-driven engineering.

Role Overview: Founding Member of Technical Staff
This is an on-site role based at TierZero’s San Francisco headquarters, with three days a week in the office. As a founding member, you will collaborate directly with the CEO, CTO, and early customers to shape the direction of both product and systems. The work spans hands-on development and close engagement with users and leadership.

What You Will Do
- Design and build intelligent AI systems to analyze large volumes of unstructured data.
- Deliver full-stack features based on real user feedback.
- Improve the product experience so AI agents are both reliable and easy for engineers to use.
- Develop systems that automatically evaluate LLM outputs and advance agentic reasoning using self-play and feedback loops.
- Create machine learning pipelines, including data ingestion, feature generation, embedding stores, retrieval-augmented generation (RAG), vector search, and graph databases.
- Prototype with open-source and new LLMs, comparing their strengths and weaknesses.
- Build scalable infrastructure for long-running, multi-step agents, with attention to memory, state, and asynchronous workflows.

What We Look For
- Over five years of relevant professional or open-source experience.
- Comfort working in environments with uncertainty and evolving challenges.
- Strong product focus and a drive for customer satisfaction.
- Interest in large language models (LLMs), MCPs, cloud infrastructure, and observability tools.
- Previous startup experience is a plus.

Location
This position is based in San Francisco. Expect to work on-site three days per week at TierZero’s HQ.

Apr 15, 2026
TierZero
Full-time|Hybrid|SF HQ

TierZero builds tools that help engineering teams deliver and manage code efficiently. The platform enables quicker incident response, clearer operational visibility, and shared knowledge among engineers. Backed by $7 million from investors like Accel and SV Angel, TierZero supports clients such as Discord, Drata, and Framer as they strengthen infrastructure for AI-driven work.

This in-person role is based at TierZero's San Francisco headquarters, with a hybrid schedule requiring three days onsite each week. As a founding member of the technical staff, you will work directly with the CEO, CTO, and customers to influence the direction of TierZero’s core products and systems. The position calls for flexibility as priorities shift and close collaboration across the company.

What You Will Do
- Design and develop AI systems that handle large volumes of unstructured data.
- Build full-stack product features, informed by direct feedback from users.
- Enhance the product so agents are intelligent, reliable, and easy for engineers to use.
- Create systems to automatically evaluate outputs from large language models and improve agentic reasoning through self-play and feedback.
- Construct machine learning pipelines, including data ingestion, feature creation, embedding stores, retrieval-augmented generation (RAG) pipelines, vector search, and graph databases.
- Experiment with open-source and emerging large language models to compare different approaches.
- Develop scalable infrastructure for long-running, multi-step agents, including memory, state management, and asynchronous workflows.

Requirements
- Interest in working with large language models, managed cloud platforms, cloud infrastructure, and observability tools.
- At least 5 years of professional experience or significant open-source contributions.
- Comfort with shifting priorities and tackling new technical problems.
- Strong product focus and commitment to customer outcomes.
- Openness to learning from a team with a track record of delivering over $10 billion in value.
- Ability to work onsite in San Francisco three days per week.
- Bonus: Experience in a startup setting and familiarity with startup dynamics.

Apr 24, 2026
