Model Performance AI Engineer jobs in San Francisco – Browse 7,458 openings on RoboApply Jobs


Open roles matching “Model Performance AI Engineer” with location signals for San Francisco. 7,458 active listings on RoboApply Jobs.


1 - 20 of 7,458 Jobs
Full-time|Hybrid|SF Hybrid

ABOUT FATHOM
At Fathom, we strive to remove the unnecessary burdens of meetings. Our innovative AI assistant captures, summarizes, and organizes the essential moments of your calls, empowering you and your team to engage fully while maintaining context and clarity. With features like instant, searchable call summaries and seamless CRM updates, Fathom transfor…

Apr 30, 2026
OpenAI
Full-time|Remote|San Francisco

OpenAI is seeking a Performance Modeling Engineer based in San Francisco. This role centers on building and improving models that enhance the performance and efficiency of AI systems. The work directly supports the technical backbone of OpenAI’s products.
Key responsibilities
Develop and refine models aimed at optimizing the performance of AI systems.
Collaborate with engineers and data scientists to tackle technical challenges as they arise.
Contribute to projects that improve the efficiency of large-scale AI infrastructure.
Role overview
This position offers the chance to work on foundational technology that underpins OpenAI’s products. The focus is on practical improvements and close teamwork with technical colleagues to advance the capabilities and efficiency of AI at scale.

Apr 20, 2026
OpenAI
Full-time|On-site|San Francisco

Role overview
The Performance Modeling Engineer II position at OpenAI centers on building and applying performance models to enhance the efficiency of advanced AI systems. Based in San Francisco, this role contributes to the reliability and speed of OpenAI’s technologies.
What you will do
Develop and implement performance models for AI systems
Collaborate with data scientists and engineers to refine performance metrics
Support the efficiency and rigorous standards of OpenAI’s technologies

Apr 20, 2026
OpenAI
Full-time|On-site|San Francisco

OpenAI is seeking a Software Engineer in San Francisco to focus on improving productivity by optimizing model performance. This position centers on developing solutions that make machine learning models more efficient and effective.
Role overview
This role involves working closely with teams across different functions to identify and address areas where model performance can be improved. The aim is to deliver changes that have a measurable impact on both systems and workflows.
What you will do
Collaborate with engineers and other specialists to enhance model efficiency
Develop and implement solutions that improve the effectiveness of machine learning systems
Contribute to projects that streamline processes and drive productivity gains
Impact
Your work will help shape improvements in how models operate and how teams at OpenAI achieve their goals. The changes you help deliver will support more effective use of resources and better outcomes for the organization.

Apr 29, 2026
Baseten
Full-time|On-site|San Francisco

ABOUT BASETEN
At Baseten, we empower the most innovative AI companies, such as Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer, by providing a robust platform for mission-critical inference. Our unique combination of applied AI research, adaptable infrastructure, and cutting-edge developer tools allows companies at the forefront of AI to deploy state-of-the-art models seamlessly. Having recently secured a $300M Series E funding round from notable investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we are poised for rapid growth. Join us in creating the essential platform for engineers to launch AI products.
THE ROLE
Are you driven to push the boundaries of artificial intelligence while leading a team of talented engineers? We are seeking a Technical Lead Manager with a focus on machine learning performance and inference. This position is perfect for an individual with a strong engineering foundation who is eager to guide and mentor a team while remaining actively engaged in hands-on technology work. If you excel in a dynamic startup atmosphere and are excited to tackle both leadership and technical challenges, we invite you to apply.
EXAMPLE INITIATIVES
As a member of our Model Performance team, you will work on projects such as:
Baseten Embeddings Inference: The fastest embeddings solution available
The Baseten Inference Stack
Driving model performance optimization
RESPONSIBILITIES
Lead, mentor, and manage a team of engineers dedicated to developing and optimizing ML model inference and performance.
Oversee technical strategy and architectural decisions, fostering improvements across our engineering organization.
Collaborate with cross-functional teams to ensure the seamless integration and scalability of ML models in production settings.
Drive innovation in model performance and advocate for best practices within the team.

Sep 12, 2024
Internship|$50/hr|Hybrid|San Francisco

AI Financial Modeling Extern, F2 AI
Location: San Francisco, CA / In-Person or Remote
Commitment: 5+ hours per week | 4 - 12+ weeks
Compensation: $50/hr
About F2 AI
At F2 AI, we are revolutionizing private market investments. Our cutting-edge AI technology streamlines the process of analyzing complex, unstructured deal materials, transforming them into actionable, investment-grade insights in mere minutes. By empowering private credit funds, commercial banks, and private equity firms, we enable faster and more confident capital deployment. Supported by top-tier investors such as NFX and Y Combinator, we are committed to expanding our exceptional product and engineering teams, shaping the future of vertical AI for finance.
Role Overview
We are on the lookout for 1–2 exceptional externs with a strong foundation in Investment Banking or Private Equity to contribute to the development of AI-driven financial modeling on the F2 platform.
In this role, you will collaborate closely with our Engineering, Product, and Design (EPD) teams in the San Francisco office to translate institutional-level financial modeling standards into automated, intelligent workflows. This hands-on experience will allow you to shape the future of AI in financial modeling.
Key Responsibilities
Educate F2 agents on best practices for financial modeling.
Create and standardize financial modeling templates optimized for AI execution using a first principles approach.
Establish formatting, structure, and best practices that align with institutional modeling standards.
Conduct rigorous quality assurance on AI-generated outputs to guarantee precision that meets investor expectations.
Test edge cases and assist in identifying potential failures in automated modeling workflows.
Ideal Candidate Profile
Possess prior experience in Investment Banking, Private Credit, or Private Equity with extensive exposure to financial modeling.
Demonstrated ability to build and audit complex 3-statement, LBO, or credit models from the ground up.
Strong understanding of model hygiene, structure, and institutional formatting standards.
Critical thinker who enjoys analyzing model logic and stress-testing systems.
Passionate about leveraging AI to enhance financial workflows.

Feb 20, 2026
Baseten
Full-time|On-site|San Francisco

ABOUT BASETEN
Baseten is at the forefront of AI technology, empowering leading-edge companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer to seamlessly integrate advanced AI models into their operations. Our unique blend of applied AI research, adaptable infrastructure, and intuitive developer tools enables innovators to bring their most ambitious AI products to life. With our recent $300M Series E funding from top-tier investors such as BOND, IVP, Spark Capital, Greylock, and Conviction, we are poised for rapid growth. Join us in shaping the platform that engineers rely on to deploy transformative AI solutions.
THE ROLE
Are you driven by a passion for enhancing artificial intelligence applications? We are seeking a proactive Software Engineer specializing in ML performance to join our energetic team. This position is perfect for backend engineers who thrive in a fast-paced startup environment and are eager to make substantial contributions to the realm of Large Language Model (LLM) Inference. If you're enthusiastic about optimizing open-source ML models, we can't wait to hear from you!
EXAMPLE INITIATIVES
As a member of our Model Performance team, you will have the opportunity to work on exciting projects, including:
Baseten Embeddings Inference: The quickest embeddings solution available
The Baseten Inference Stack
Driving model performance optimization
RESPONSIBILITIES
Develop, refine, and implement advanced techniques (quantization, speculative decoding, KV cache reuse, chunked prefill, and LoRA) for ML model inference and infrastructure.
Conduct thorough investigations into the codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to troubleshoot and resolve ML performance issues.
Scale and apply optimization techniques across a diverse array of ML models, with a focus on large language models.

Mar 28, 2024
OpenAI
Full-time|On-site|San Francisco

Role overview
The Performance Modeling Lead at OpenAI works from San Francisco and takes on both technical and leadership responsibilities. This position centers on developing new modeling methods that enhance performance across a variety of applications. Alongside direct technical contributions, the role involves guiding a team and shaping project direction.
What you will do
Develop and improve modeling strategies to raise performance metrics for multiple projects.
Use expertise in data analysis, machine learning, and optimization to address complex problems.
Lead and mentor a team, supporting their technical development and ensuring strong project outcomes.

Apr 20, 2026
Full-time|On-site|San Francisco

Join the Innovative Team at Liquid AI
Founded as a spin-off from MIT’s CSAIL, Liquid AI is at the forefront of developing cutting-edge AI systems that operate seamlessly across various platforms, including data center accelerators and on-device hardware. Our technology is designed to ensure low latency, efficient memory usage, privacy, and reliability. We collaborate with leading enterprises in sectors such as consumer electronics, automotive, life sciences, and financial services as we rapidly scale our operations. We are seeking talented individuals who are passionate about technology and innovation.
Your Role in Our Team
As a GPU Performance Engineer, your expertise will be critical in enhancing our models and workflows beyond the capabilities of standard frameworks. You will be responsible for designing and deploying custom CUDA kernels, conducting hardware-level profiling, and transforming research concepts into production code that yields tangible improvements in our pipelines (training, post-training, and inference). Our dynamic team values initiative and ownership, and we are looking for a candidate who thrives on tackling complex challenges related to memory hierarchies, tensor cores, and profiling outputs.
While San Francisco and Boston are preferred, we welcome applications from other locations.

Jul 29, 2025
Crusoe
Full-time|$172.4K/yr - $209K/yr|On-site|San Francisco, CA - US

At Crusoe, we are on a mission to accelerate the convergence of energy and intelligence. We are building a powerful engine that enables individuals to innovate boldly with AI, all while upholding principles of scalability, speed, and sustainability.
Join us in spearheading the AI revolution through sustainable technology. At Crusoe, you will be at the forefront of meaningful innovation, making a significant impact while collaborating with a team dedicated to shaping the future of responsible, transformative cloud infrastructure.
About the Role:
As a Senior Software Engineer on the Model Lifecycle team, you will play a pivotal role in developing a managed platform that supports the entire application development lifecycle, with an emphasis on harnessing the power of Machine Learning models, particularly Large Language Models (LLMs).
Your Responsibilities:
Design and maintain systems for fine-tuning large foundational models (SFT, PEFT, LoRA, adapters), ensuring multi-node orchestration, checkpointing, failure recovery, and cost-effective scaling.
Create and manage end-to-end training pipelines for Large Language Models.
Implement components for distillation and reinforcement learning pipelines, focusing on preference optimization, policy optimization, and reward modeling.
Develop and sustain the core agent execution infrastructure.
Implement features for dataset, model, and experiment management, emphasizing versioning, lineage, evaluation, and reproducible fine-tuning.
Collaboration and Impact:
Collaborate closely with Senior Engineers, Principal Engineers, and various product and platform teams to implement systems abstractions and APIs.
Engage in technical discussions surrounding training runtimes, scheduling, storage, and overall model lifecycle management.
Bring 4-5+ years of industry experience, demonstrating a strong track record of successfully leading a diverse portfolio of initiatives.
Participate in and contribute to the open-source LLM ecosystem.
This position involves taking significant ownership of core system components.
Your Qualifications:
Engineering Fundamentals:
Bachelor's degree in Computer Science, Engineering, or a related discipline.
Proven experience in software engineering with a focus on AI models and machine learning.

Feb 9, 2026
Zyphra
Full-time|On-site|San Francisco

Zyphra is an innovative artificial intelligence company located in the heart of San Francisco, California.
The Opportunity:
Join our dynamic team as a Research Engineer - Audio & Speech Models, where you will play a pivotal role in advancing Zyphra’s Audio Team. You will be instrumental in developing cutting-edge open-source text-to-speech and audio models. Your contributions will span the full spectrum of the model training process, from data collection and processing to the design of innovative architectures and training approaches.
Your Responsibilities:
Conduct large-scale audio training operations
Optimize the performance of our training infrastructure
Collect, process, and evaluate audio datasets
Implement architectural and methodological improvements through rigorous testing
What We Seek:
A strong research mindset with the ability to navigate projects from ideation to implementation and documentation.
Proficiency in rapid prototyping and implementation, allowing for swift experimentation.
Effective collaboration skills in a fast-paced research environment.
A quick learner who is eager to embrace and implement new concepts.
Excellent communication abilities, enabling you to contribute to both research and engineering tasks at scale.
Preferred Qualifications:
Expertise in training audio models, such as text-to-speech, ASR, speech-to-speech, or emotion recognition.
Experience with training audio autoencoders.
Solid understanding of signal processing, particularly in audio.
Familiarity with diffusion models, consistency models, or GANs.
Experience with large-scale (multi-node) GPU training environments.
Strong understanding of experimental methodologies for conducting rigorous tests and ablations.
Interest in large-scale, parallel data processing pipelines.
Competence in PyTorch and Python programming.
Experience contributing to large, established codebases with rapid adaptation.

Aug 28, 2025
Zyphra
Full-time|On-site|San Francisco

Join Zyphra as a Research Engineer specializing in AI Performance and Kernel Optimization. In this role, you will work at the forefront of AI technologies, developing and optimizing kernel solutions that enhance the performance of our systems. You will collaborate with cross-functional teams, leveraging your expertise to drive innovation and efficiency.

Mar 16, 2026
Meter
Full-time|$160K/yr - $230K/yr|On-site|San Francisco

About Meter
At Meter, we believe that networking is at the heart of technological advancement. We have innovatively unified the entire networking stack and are now on a mission to make it autonomous.
Our team is developing a cutting-edge neural network-driven system designed to analyze raw computer networks, enabling us to address all networking challenges. As outlined on Meter.ai, we are creating models within a closed-loop system that utilizes real-time telemetry, logs, and network events to autonomously troubleshoot issues, enhance performance, and resolve challenges.
To achieve this, we require not only exceptional models but also robust infrastructure that ensures our models have clean, versioned, and low-latency access to the necessary data throughout training, evaluation, and deployment phases.
Why this Role is Essential
Each Meter network deployed in the field serves as a valuable data source for our Models team. However, without meticulous infrastructure design, this data risks becoming fragmented, outdated, or inconsistent. In this role, you will ensure that such pitfalls are avoided. You will be responsible for the core data interface that drives our model development, experimentation, evaluation, and real-time inference.
This position is fundamental and offers a significant impact. Your contributions will shape the speed at which we can train new models, the reliability of their evaluations, and their seamless operation across hundreds of real-world networks. You will collaborate closely with modelers to deliver systems that are elegant, scalable, and robust.
Your Responsibilities
Design and implement the Models API: a unified interface for accessing training, evaluation, and deployment data across raw, transformed, and feature-engineered layers.
Ensure backward compatibility and feature versioning across continually evolving schemas.
Develop scalable pipelines to ingest, transform, and serve petabytes of data across Kafka, Postgres, and Clickhouse.
Create CI/CD workflows that evolve the API in tandem with changes to the underlying data schema.
Facilitate fine-grained querying of historical and real-time data for any network, at any point in time.
Help establish and promote the principle of 'smart data, dumb functions': maximizing operations in the data layer to minimize downstream code complexity.
Collaborate with modelers to co-design training frameworks that optimize performance.

Jul 26, 2025
Tavus
Full-time|On-site|San Francisco

About Tavus
Tavus is at the forefront of innovation in human computing. Our mission is to develop AI Humans: an advanced interface that bridges the gap between individuals and machines, eliminating the friction found in current technologies. Our state-of-the-art human simulation models empower machines to see, hear, respond, and even exhibit realistic appearances, facilitating genuine, face-to-face interactions. AI Humans integrate the emotional insight of humans with the scalability and dependability of machines, making them reliable agents accessible 24/7, in any language, on our terms.
Imagine having access to an affordable therapist, a personal trainer that fits your schedule, or a team of medical assistants dedicated to providing personalized care for every patient. With Tavus, individuals, enterprises, and developers have the tools to create AI Humans that connect, comprehend, and act with empathy on a large scale.
We are a Series A company supported by esteemed investors such as Sequoia Capital, Y Combinator, and Scale Venture Partners.
Join us in shaping a future where machines and humans genuinely understand one another.
The Position
We are seeking an AI Researcher to join our core AI team and advance the frontiers of multimodal conversational intelligence. If you excel in dynamic environments, enjoy transforming abstract concepts into functional code, and derive motivation from pushing the boundaries of possibility, this role is designed for you.
Your Responsibilities
Engage in research focusing on Foundational Multimodal Models specifically in the realm of Conversational Avatars (such as Neural Avatars and Talking-Heads).
Develop models for video, audio, and language sequences utilizing Autoregressive and Predictive Architectures (e.g., V-JEPA) and/or Diffusion methodologies, with a focus on temporal and sequential data rather than static images.
Collaborate closely with the Applied ML team to implement your research into production systems.
Remain at the forefront of multimodal learning and assist us in defining what “cutting edge” will mean in the future.
Ideal Candidate Profile
PhD (or nearing completion) in a relevant field, or equivalent practical research experience.
Experience in multimodal machine learning, particularly focused on conversational interfaces.

Oct 8, 2025
OpenAI
Full-time|On-site|San Francisco

Role Overview
OpenAI is hiring a ChatGPT Performance Engineer in San Francisco. This role focuses on improving the performance and efficiency of ChatGPT’s advanced AI models. The position works closely with cross-functional teams to identify and implement solutions that make ChatGPT faster and more reliable for users around the world.
What You Will Do
Optimize the speed, reliability, and scalability of ChatGPT’s platforms.
Collaborate with engineers and other teams to solve technical challenges.
Develop and refine systems to support a seamless user experience globally.
Impact
This work directly shapes the future of AI at OpenAI, helping deliver a dependable and efficient ChatGPT experience to millions of users.

Apr 15, 2026
Full-time|On-site|SF Bay Area

About Us
At Lemurian Labs, we are dedicated to democratizing AI technology while prioritizing sustainability. Our mission is to create solutions that minimize environmental impact, ensuring that artificial intelligence serves humanity positively. We are committed to responsible innovation and the sustainable growth of AI.
We are in the process of developing a state-of-the-art, portable compiler that empowers developers to 'build once, deploy anywhere.' This technology ensures seamless cross-platform integration, allowing for model training in the cloud and deployment at the edge, all while maximizing resource efficiency and scalability.
If you are passionate about scaling AI sustainably and are eager to make AI development more powerful and accessible, we invite you to join our team at Lemurian Labs. Together, we can build a future that is innovative and responsible.
The Role
We are seeking a Senior ML Performance Engineer to take charge of designing and leading our Performance Testing Platform from inception. In this pivotal role, you will be recognized as the technical expert in measuring, validating, and enhancing the performance of large language models (including Llama 3.2 70B, DeepSeek, and others) prior to and following compiler optimization on cutting-edge GPU architectures.
This is a critical position that will significantly impact our product quality and customer success. You will work at the intersection of Machine Learning systems, GPU architecture, and performance engineering, constructing the infrastructure that substantiates the value of our compiler.

Oct 31, 2025
Baseten
Full-time|On-site|San Francisco

ABOUT BASETEN
At Baseten, we are at the forefront of AI innovation, providing critical inference solutions for leading AI companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our platform combines advanced AI research, adaptable infrastructure, and intuitive developer tools, empowering organizations to deploy state-of-the-art models effectively. With rapid growth and a recent $300M Series E funding round backed by top-tier investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we invite you to join our mission in building the platform of choice for engineers delivering AI products.
THE ROLE:
As a member of Baseten’s Model Performance (MP) team, you will play a pivotal role in ensuring our platform’s model APIs are not only fast and reliable but also cost-effective. Your primary focus will be on developing and optimizing the infrastructure that supports our hosted API endpoints for cutting-edge open-source models. This role involves working with distributed systems, model serving, and enhancing the developer experience. You will collaborate with a small, dynamic team at the intersection of product development, model performance, and infrastructure, defining how developers interact with AI models on a large scale.
RESPONSIBILITIES:
Design, develop, and maintain the Model APIs surface, focusing on advanced inference features such as structured outputs (JSON mode, grammar-constrained generation), tool/function calling, and multi-modal serving.
Profile and optimize TensorRT-LLM kernels, analyze CUDA kernel performance, create custom CUDA operators, and enhance memory allocation patterns for maximum efficiency across multi-GPU setups.
Implement performance improvements across various runtimes based on a deep understanding of their internals, including speculative decoding, guided generation for structured outputs, and custom scheduling algorithms for high-performance serving.
Develop robust benchmarking frameworks to evaluate real-world performance across diverse model architectures, batch sizes, sequence lengths, and hardware configurations.
Enhance performance across runtimes (e.g., TensorRT, TensorRT-LLM) through techniques such as speculative decoding, quantization, batching, and KV-cache reuse.
Integrate deep observability mechanisms (metrics, traces, logs) and establish repeatable benchmarks to assess speed, reliability, and quality.

Oct 11, 2025
OpenAI
Full-time|On-site|San Francisco

About Our Team
Join the Inference team at OpenAI, where we leverage cutting-edge research and technology to deliver exceptional AI products to consumers, enterprises, and developers. Our mission is to empower users to harness the full potential of our advanced AI models, enabling unprecedented capabilities. We prioritize efficient and high-performance model inference while accelerating research advancements.
About the Role
We are seeking a passionate Software Engineer to optimize some of the world's largest and most sophisticated AI models for deployment in high-volume, low-latency, and highly available production and research environments.
Key Responsibilities
Collaborate with machine learning researchers, engineers, and product managers to transition our latest technologies into production.
Work closely with researchers to enable advanced research initiatives through innovative engineering solutions.
Implement new techniques, tools, and architectures that enhance the performance, latency, throughput, and effectiveness of our model inference stack.
Develop tools to identify bottlenecks and instability sources, designing and implementing solutions for priority issues.
Optimize our code and Azure VM fleet to maximize every FLOP and GB of GPU RAM available.
You Will Excel in This Role If You:
Possess a solid understanding of modern machine learning architectures and an intuitive grasp of performance optimization strategies, especially for inference.
Take ownership of problems end-to-end, demonstrating a willingness to acquire any necessary knowledge to achieve results.
Bring at least 5 years of professional software engineering experience.
Have or can quickly develop expertise in PyTorch, NVIDIA GPUs, and relevant optimization software stacks (such as NCCL, CUDA), along with HPC technologies like InfiniBand, MPI, and NVLink.
Have experience in architecting, building, monitoring, and debugging production distributed systems, with bonus points for working on performance-critical systems.
Have successfully rebuilt or significantly refactored production systems multiple times to accommodate rapid scaling.
Are self-driven, enjoying the challenge of identifying and addressing the most critical problems.

Feb 6, 2025
Descript
Full-time|$171K/yr|On-site|San Francisco, CA

At Descript, we envision a world where video editing is an essential tool for every communicator. Gone are the days of needing multiple monitors and advanced degrees to craft engaging video content. Our platform allows you to edit videos as easily as working with documents and slides, increasingly through the power of AI. We are at the forefront of redefining how videos are recorded and generated, making it more user-friendly and accessible.
We are in search of a dedicated Product Manager to shape the future of AI-driven video editing. You will collaborate closely with a dynamic and collaborative team of skilled PMs, AI researchers, engineers, designers, and marketing professionals. This role offers a unique opportunity to work hands-on with state-of-the-art AI technology and contribute to a product that resonates with users and accelerates in growth.
As a Product Manager, you will lead the AI Research and Enablement roadmap at Descript. This pivotal role lies at the intersection of advanced AI research, robust ML infrastructure, and strategic product development. Your mission will be to ensure our AI capabilities are unparalleled while empowering our product teams to deliver AI-enhanced features that captivate our users.

Feb 4, 2026
Anthropic
Full-time|Remote|Remote-Friendly (Travel-Required) | San Francisco, CA | New York City, NY

Anthropic is looking for a Research Engineer focused on model evaluations. This position involves research and development to assess and strengthen the performance of AI models. Teams are based in San Francisco and New York City, and the role supports remote work with required travel.
Key responsibilities
Design and implement evaluations for Anthropic's AI models
Collaborate with team members to enhance model performance
Contribute to research that pushes the boundaries of AI systems
Location
Remote-friendly (travel required)
San Francisco, CA
New York City, NY

Apr 28, 2026
