Engineering Manager for Cloud Inference on AWS jobs in San Francisco – Browse 8,182 openings on RoboApply Jobs

Engineering Manager for Cloud Inference on AWS jobs in San Francisco

Open roles matching “Engineering Manager for Cloud Inference on AWS” in San Francisco. 8,182 active listings on RoboApply Jobs.

1–20 of 8,182 jobs

Anthropic
Full-time | Remote | San Francisco, CA | Seattle, WA

Join Anthropic as an Engineering Manager to lead our innovative Cloud Inference team utilizing AWS technologies. In this pivotal role, you will drive efforts to enhance the efficiency and scalability of our cloud systems while ensuring robust performance and reliability. Your leadership will inspire a talented team of engineers to solve complex challenges, implement best practices, and foster a culture of continuous improvement.

Mar 12, 2026

Anthropic
Full-time | Remote | San Francisco, CA | Seattle, WA

Join our innovative team at Anthropic as a Software Engineer specializing in Cloud Inference Safeguards. In this role, you will play a crucial part in developing and enhancing the systems that ensure the robustness and security of our cloud-based inference services. You will collaborate with cross-functional teams to design, implement, and maintain scalable solutions that meet our high standards for reliability and performance.

Mar 27, 2026

Anthropic
Full-time | On-site | San Francisco, CA | New York City, NY

Role overview
Anthropic seeks a Technical Program Manager to support the Cloud Inference team. This position centers on steering technical projects that influence the development of cloud inference solutions. The role is located in either San Francisco, CA or New York City, NY.

What you will do
Oversee complex initiatives that move Anthropic’s cloud inference technologies forward
Collaborate with engineers and partner teams to ensure delivery of dependable solutions
Organize and synchronize work across different functions to achieve project objectives and deadlines

Apr 28, 2026

Anthropic
On-site | San Francisco, CA | New York City, NY | Seattle, WA

About Anthropic
At Anthropic, our mission is to develop AI systems that are safe, interpretable, and controllable. We believe in harnessing AI for the greater good of our users and society at large. Our dynamic team comprises dedicated researchers, engineers, policy experts, and business leaders who collaborate to create beneficial AI systems.

About the Role
The Cloud Inference team is responsible for scaling and optimizing Claude to cater to a vast array of developers and enterprise clients across platforms such as AWS, GCP, Azure, and future cloud service providers (CSPs). We manage the complete lifecycle of Claude on each cloud platform—from API integration and intelligent request routing to inference execution, capacity management, and daily operations.

Our engineers wield significant influence, driving multiple key revenue streams while optimizing one of Anthropic's most valuable resources—compute power. As we expand to additional cloud providers, the intricacies of efficiently managing inference across diverse platforms with varying hardware, networking frameworks, and operational models grow substantially. We seek engineers adept at navigating these variances, developing strong abstractions that are effective across providers, and making informed infrastructure choices that keep us cost-effective at scale.

Your contributions will enhance the operational scale of our services, expedite our capacity to launch cutting-edge models and innovative features to clients across all platforms, and ensure our large language models (LLMs) adhere to stringent safety, performance, and security standards.
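
The cross-provider abstraction work described above can be illustrated with a small sketch. Everything in the snippet below, class names, methods, and routing policy alike, is our own hypothetical Python construction rather than Anthropic's code; it simply shows provider-specific backends sitting behind one contract:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class InferenceResult:
    text: str
    input_tokens: int
    output_tokens: int

class InferenceBackend(ABC):
    """Hypothetical contract implemented once per cloud provider."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int) -> InferenceResult: ...

    @abstractmethod
    def healthy(self) -> bool: ...

class AwsBackend(InferenceBackend):
    def generate(self, prompt: str, max_tokens: int) -> InferenceResult:
        # A real backend would call the provider's serving stack here.
        return InferenceResult(text="...", input_tokens=len(prompt.split()), output_tokens=0)

    def healthy(self) -> bool:
        return True

def route(backends: list[InferenceBackend], prompt: str) -> InferenceResult:
    # Naive policy: first healthy backend wins. Production routing would
    # also weigh capacity, cost, and latency per provider.
    for backend in backends:
        if backend.healthy():
            return backend.generate(prompt, max_tokens=256)
    raise RuntimeError("no healthy backend")
```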

Feb 5, 2026

DigitalOcean, Inc.
Full-time | Remote | San Francisco

Join DigitalOcean as a Senior Engineer focused on Inference Optimizations, where you will play a pivotal role in enhancing our AI and machine learning capabilities. Collaborate with a talented team to develop cutting-edge solutions that optimize inference processes across various applications.

Mar 17, 2026

Inferact
Full-time | $200K/yr - $400K/yr | Remote | San Francisco

At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, revolutionizing AI progress by making inference both more accessible and efficient. Our founding team consists of the original creators and key maintainers of vLLM, positioning us uniquely at the nexus of cutting-edge models and advanced hardware.

Role Overview
We are seeking a passionate inference runtime engineer eager to explore and expand the frontiers of LLM and diffusion model serving. As models evolve and grow in complexity with new architectures like mixture-of-experts and multimodal designs, the demand for innovative solutions in our inference engine intensifies. This role places you at the heart of vLLM, where you will enhance model execution across a variety of hardware platforms and architectures. Your contributions will have a direct influence on the future of AI inference.
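
For readers unfamiliar with vLLM, the role reads more concretely with its public Python API in view. A minimal offline-generation example looks roughly like this (the model choice is illustrative; consult the vLLM docs for current options):

```python
from vllm import LLM, SamplingParams

# vLLM batches these prompts automatically via continuous batching,
# which is exactly the serving-efficiency work the role centers on.
prompts = ["The capital of France is", "Explain KV caching in one sentence:"]
params = SamplingParams(temperature=0.8, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # small model, illustrative only
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```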

Jan 22, 2026

Pulse
Full-time | On-site | San Francisco

Overview
At Pulse, we are revolutionizing the way data infrastructure operates by addressing the critical challenge of accurately extracting structured information from intricate documents on a large scale. Our innovative document understanding technique merges intelligent schema mapping with advanced extraction models, outperforming traditional OCR and parsing methods.

Located in the heart of San Francisco, we are a dynamic team of engineers dedicated to empowering Fortune 100 enterprises, YC startups, public investment firms, and growth-stage companies. Backed by top-tier investors, we are rapidly expanding our footprint in the industry.

What sets our technology apart is our sophisticated multi-stage architecture, which includes:
Specialized models for layout understanding and component detection
Low-latency OCR models designed for precise extraction
Advanced algorithms for reading order in complex document structures
Proprietary methods for table structure recognition and parsing
Fine-tuned vision-language models for interpreting charts, tables, and figures

If you possess a strong passion for the convergence of computer vision, natural language processing, and data infrastructure, your contributions at Pulse will significantly impact our clients and help shape the future of document intelligence.
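
As a rough illustration of how a staged pipeline like the one above can be wired together, here is a hypothetical Python skeleton. The stage names mirror the listing, but every function is a stand-in, not Pulse's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    image_bytes: bytes
    blocks: list = field(default_factory=list)  # layout regions
    text: str = ""                              # OCR output in reading order
    tables: list = field(default_factory=list)  # parsed table structures

def detect_layout(page: Page) -> Page:
    page.blocks = []  # a layout model would populate regions here
    return page

def run_ocr(page: Page) -> Page:
    page.text = ""    # region-level OCR output would land here
    return page

def order_reading(page: Page) -> Page:
    return page       # reorder page.blocks into reading order

def parse_tables(page: Page) -> Page:
    return page       # table-structure recognition over detected regions

def interpret_figures(page: Page) -> Page:
    return page       # vision-language pass over charts and figures

PIPELINE = [detect_layout, run_ocr, order_reading, parse_tables, interpret_figures]

def extract(page: Page) -> Page:
    # Each stage enriches the same Page record, so later stages can
    # condition on earlier structure (e.g., tables need layout blocks).
    for stage in PIPELINE:
        page = stage(page)
    return page
```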

Jul 30, 2025

Crusoe
Full-time | $204K/yr - $247K/yr | On-site | San Francisco, CA - US

At Crusoe, we are on a mission to enhance the availability of energy and intelligence. We are developing the driving force behind a future where individuals can harness the power of AI without compromising on scale, speed, or sustainability.

Join the AI revolution with sustainable technology at Crusoe. This is your chance to lead impactful innovations, contribute to meaningful projects, and collaborate with a team dedicated to pioneering responsible and transformative cloud infrastructure.

Role Overview:
As an integral member of the Crusoe Managed AI Services team, you will oversee the entire product lifecycle for our Managed Inference services. From conceptualization and strategic planning to execution and market introduction, you will be the driving force behind our inference service offerings. Your ability to translate market demands and technical details into succinct product specifications and narratives will be crucial in fostering business growth for Crusoe Cloud.

This position is a Staff-level individual contributor role that offers considerable autonomy and influence. You will act as a senior product owner for a pivotal segment of our platform, collaborating closely with engineering, infrastructure, and go-to-market teams to expand and enhance Crusoe’s inference capabilities as the organization evolves.

This is a unique opportunity to shape and develop a foundational product area within a rapidly growing and innovative company.

Key Responsibilities:
Lead the complete product lifecycle for Crusoe’s Managed Inference services, encompassing roadmap creation, execution, and iterative improvements.
Convert customer feedback, market insights, and technical limitations into clear product requirements and prioritization strategies.
Collaborate effectively with Engineering, Infrastructure, and Platform teams to provide scalable and dependable inference services.
Influence product decisions regarding performance, reliability, cost-effectiveness, and user experience for developers.
Establish and monitor success metrics for inference services operating in production environments.
Work alongside go-to-market teams to facilitate product launches, brand positioning, and customer engagement.
Articulate product strategy and decisions clearly to cross-functional partners and leadership.

Dec 24, 2025

OpenAI
Full-time | On-site | San Francisco

About Our Team
At OpenAI, we are dedicated to the development of safe artificial general intelligence (AGI) that serves the interests of humanity. This ambitious goal unites the world's leading scientists, engineers, and business professionals in a collaborative effort to achieve it.

Our Go-To-Market (GTM) organization plays a crucial role in helping customers understand and implement our cutting-edge AI products. The diverse team comprises experts in Sales, Solutions, Support, Marketing, and Partnerships, all working in unison to deliver impactful AI solutions globally.

Within GTM, the Partnerships team is responsible for cultivating a strategic global partner ecosystem aimed at enhancing customer success, fostering responsible AI adoption, and driving business growth. Our partnerships encompass technology partners, systems integrators, and strategic collaborators, all extending the reach and influence of OpenAI’s platform.

Role Overview
We are looking for a dynamic Director of Partner Enablement to spearhead initiatives that empower our partner ecosystem to effectively leverage OpenAI’s technology. This role will be integral to the GTM Partnerships Enablement team, focusing specifically on our pivotal AWS Cloud partnership as the primary enablement lead for this essential collaboration.

Your responsibilities will include formulating partner enablement strategies, developing impactful programs, and executing operations that ensure our partners—especially within the AWS ecosystem—are equipped with the necessary knowledge, tools, and resources to drive adoption and facilitate successful implementations. As a leader, you will manage and nurture a team dedicated to creating scalable enablement programs that support our rapidly growing partner network.

You will work closely with Partnerships leadership, Partner Directors, Partner Technical Success, and cross-functional GTM teams, along with key stakeholders at AWS, to design and implement enablement initiatives that enhance partner readiness, bolster field alignment, and accelerate joint customer successes.

This position will help define OpenAI’s global partner enablement approach and will be instrumental in shaping the systems, processes, and content that drive our partner ecosystem, with a specific emphasis on maximizing impact through our AWS Cloud partnership.

This opportunity is based in San Francisco, and we provide relocation assistance to new employees.

Apr 9, 2026

DigitalOcean, Inc.
Full-time | Remote | San Francisco

We are seeking a highly skilled Senior Engineer to join our Inference Data Plane team at DigitalOcean. In this pivotal role, you will be responsible for designing and implementing advanced data processing solutions that facilitate machine learning inference at scale. You will work collaboratively with cross-functional teams to optimize our data infrastructure and ensure reliable performance.

Mar 24, 2026

Cartesia
Full-time | On-site | HQ - San Francisco, CA

Join Cartesia as an Inference Engineer

At Cartesia, our vision is to create the next evolution of AI: an interactive, omnipresent intelligence that operates seamlessly across all environments. Currently, even the most advanced models struggle to continuously analyze a year's worth of audio, video, and text data—comprising 1 billion text tokens, 10 billion audio tokens, and 1 trillion video tokens—much less perform these tasks on-device.

We are at the forefront of developing the model architectures that will make this a reality. Our founding team, who met as PhD candidates at the Stanford AI Lab, pioneered State Space Models (SSMs), a groundbreaking framework for training efficient, large-scale foundation models. Our talented team merges deep expertise in model innovation and systems engineering with a design-focused product engineering approach, enabling us to build and launch state-of-the-art models and user experiences.

Supported by leading investors such as Index Ventures and Lightspeed Venture Partners, along with contributions from Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks, and others, we are fortunate to be guided by numerous exceptional advisors and over 90 angel investors from diverse industries, including some of the world’s foremost experts in AI.

About the Role
We are actively seeking an Inference Engineer to propel our mission of creating real-time multimodal intelligence.

Your Impact
Develop and implement a low-latency, scalable, and dependable model inference and serving stack for our innovative foundation models utilizing Transformers, SSMs, and hybrid models.
Collaborate closely with our research team and product engineers to efficiently deliver our product suite in a fast, cost-effective, and reliable manner.
Construct robust inference infrastructure and monitoring systems for our product offerings.
Enjoy substantial autonomy in shaping our products and directly influencing how cutting-edge AI is integrated across diverse devices and applications.

What You Bring
At Cartesia, we prioritize strong engineering skills due to the complexity and scale of the challenges we tackle.
Proficient engineering skills with a comfort level in navigating intricate codebases, and a commitment to producing clean, maintainable code.
Experience in developing large-scale distributed systems with strict performance, reliability, and observability requirements.
Proven technical leadership, capable of executing and delivering results from zero to one amidst uncertainty.
A background in or experience with inference pipelines, machine learning, and generative models.

Dec 12, 2024

Perplexity
Full-time | On-site | San Francisco

About the Role
We are seeking a talented Inference Engineering Manager to spearhead our AI Inference team at Perplexity. This is a remarkable opportunity to design and expand the infrastructure that drives Perplexity's innovative products and APIs, catering to millions of users with cutting-edge AI capabilities.

You will take charge of the technical direction and implementation of our inference systems while cultivating and leading a high-caliber team of inference engineers. Our technology stack encompasses Python, PyTorch, Rust, C++, and Kubernetes. You will play a crucial role in architecting and scaling the large-scale deployment of machine learning models for Perplexity's Comet, Sonar, Search, and Deep Research products.

Why Perplexity?
Develop state-of-the-art systems that are among the fastest in the industry using leading-edge technology.
Engage in high-impact work within a smaller team, enjoying considerable ownership and autonomy.
Seize the chance to create infrastructure from the ground up instead of maintaining outdated systems.
Work across the entire spectrum: minimizing costs, scaling traffic, and advancing the capabilities of inference.
Make a significant impact on the technical roadmap and team culture at a rapidly expanding company.

Responsibilities
Lead and nurture a high-performing team of AI inference engineers.
Develop APIs for AI inference utilized by both internal and external clients.
Design and scale our inference infrastructure for enhanced reliability and efficiency.
Benchmark and resolve bottlenecks across our inference stack.
Drive large sparse/MoE model inference at rack scale, including sharding strategies for extensive models.
Innovate by developing inference systems that support sparse attention and disaggregated pre-fill/decoding serving.
Enhance the reliability and observability of our systems and lead incident response efforts.
Make technical decisions regarding batching, throughput, latency, and GPU utilization.
Collaborate with ML research teams on model optimization and deployment.
Recruit, mentor, and develop engineering talent.
Establish team processes, engineering standards, and operational excellence.

Qualifications
5+ years of engineering experience, with at least 2 years in a technical leadership or management capacity.
Proficiency in programming languages and tools such as Python, PyTorch, Rust, and C++.
Experience with Kubernetes and cloud infrastructure.
Strong understanding of machine learning model deployment and optimization.
Exceptional problem-solving and communication skills.
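
One of the decisions this listing names, trading batching against throughput and latency, can be made concrete in a few lines. The sketch below is a generic asyncio micro-batcher of our own construction (the request dicts with "prompt" and "future" keys and the run_model callable are illustrative conventions, not Perplexity's stack): requests wait up to a small deadline so the GPU sees fuller batches, at a bounded latency cost.

```python
import asyncio

MAX_BATCH = 32     # cap batch size to fit GPU memory
MAX_WAIT_MS = 10   # bounded latency cost of waiting for peers

async def batch_worker(queue: asyncio.Queue, run_model):
    loop = asyncio.get_running_loop()
    while True:
        first = await queue.get()  # block until one request arrives
        batch = [first]
        deadline = loop.time() + MAX_WAIT_MS / 1000
        while len(batch) < MAX_BATCH:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        # One fused model call serves the whole batch.
        outputs = run_model([req["prompt"] for req in batch])
        for req, out in zip(batch, outputs):
            req["future"].set_result(out)
```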

Jan 18, 2026

Cloud Security Engineer

Hex Technologies

Full-time | $180K/yr - $220K/yr | Remote | SF, NYC, or Remote (USA)

About the Role
Hex Technologies is seeking a skilled Cloud Security Engineer to enhance our security team. In this pivotal role, you will be responsible for safeguarding our cloud infrastructure and ensuring its resilience. You will lead cloud security initiatives and work closely with infrastructure and engineering teams to protect our cloud-native applications.

Key Responsibilities:
Design, implement, and oversee security solutions and controls for AWS environments and Kubernetes clusters, ensuring effective isolation and sandboxing for Hex’s RCE-as-a-Service platform.
Develop, deploy, and maintain infrastructure-as-code utilizing Terraform, while upholding stringent security standards.
Perform comprehensive security assessments, threat modeling, and audits on AWS cloud infrastructure and Kubernetes deployments.
Collaborate with development and operations teams to integrate security best practices into CI/CD pipelines.
Monitor and respond to cloud security incidents, identify root causes, and propose remediation measures.
Provide in-depth expertise on compliance requirements related to cloud security, including SOC 2, ISO 27001, GDPR, HIPAA, and PCI DSS.
Mentor fellow engineers and advocate for cloud security initiatives throughout the organization.

Qualifications:
5+ years of experience in cloud security engineering, particularly with AWS.
Proven expertise in Kubernetes security, including cluster hardening, role-based access control (RBAC), network policies, and container vulnerability management.
Hands-on expertise in Terraform.
Familiarity with AWS security services such as IAM, GuardDuty, Security Hub, CloudTrail, and WAF.
Knowledge of CNAPP solutions like Wiz and SIEM solutions such as Panther.
Strong understanding of secure software development lifecycle practices, CI/CD security, and DevSecOps methodologies.
Relevant certifications such as AWS Certified Security – Specialty, Certified Kubernetes Security Specialist (CKS), and Terraform Associate certification are highly desirable. Additional security certifications from SANS or OffSec are a plus.
Exceptional problem-solving, communication, and leadership skills.
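
To make the AWS auditing responsibilities concrete, here is a small hedged sketch using boto3. The AWS calls are real, but the specific check (flagging S3 buckets without a full public-access block) is our illustration rather than Hex's tooling:

```python
import boto3
from botocore.exceptions import ClientError

def buckets_missing_public_access_block() -> list[str]:
    """Flag S3 buckets whose public-access block is absent or incomplete."""
    s3 = boto3.client("s3")
    flagged = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            cfg = s3.get_public_access_block(Bucket=name)
            rules = cfg["PublicAccessBlockConfiguration"]
            if not all(rules.values()):  # any of the four settings disabled
                flagged.append(name)
        except ClientError:
            flagged.append(name)  # no configuration set at all
    return flagged

if __name__ == "__main__":
    for name in buckets_missing_public_access_block():
        print(f"bucket without full public-access block: {name}")
```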

Feb 20, 2026

OpenAI
Full-time | Hybrid | San Francisco

Join the Sora Team at OpenAI
The Sora team is at the forefront of developing multimodal capabilities within OpenAI’s foundational models. We are a dynamic blend of research and product development, committed to integrating sophisticated multimodal functionalities into our AI offerings. Our focus is on delivering solutions that are not only reliable and intuitive but also resonate with our mission to foster broad societal benefits.

Your Role as Inference Technical Lead
We are seeking a talented GPU Inference Engineer to enhance the model serving efficiency for Sora. This pivotal position will empower you to spearhead initiatives aimed at optimizing inference performance and scalability. You will collaborate closely with our researchers to design and develop models that are optimized for inference, directly contributing to the success of our projects.

Your contributions will be vital in advancing the team’s overarching objectives, allowing leadership to concentrate on high-impact initiatives by establishing a robust technical foundation.

Key Responsibilities:
Enhance model serving, inference performance, and overall system efficiency through focused engineering efforts.
Implement optimizations targeting kernel and data movement to boost system throughput and reliability.
Collaborate with research and product teams to ensure our models operate effectively at scale.
Design, construct, and refine essential serving infrastructure to meet Sora’s growth and reliability demands.

You Will Excel in This Role If You:
Possess deep knowledge in model performance optimization, particularly at the inference level.
Have a strong foundation in kernel-level systems, data movement, and low-level performance tuning.
Are passionate about scaling high-performing AI systems that address real-world, multimodal challenges.
Thrive in ambiguous situations, setting technical direction, and driving complex projects to fruition.

This role is based in San Francisco, CA. We follow a hybrid work model requiring 3 in-office days per week and offer relocation assistance to new hires.
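
The kernel-launch and data-movement overheads this role targets have a well-known PyTorch-level mitigation worth sketching: CUDA Graphs capture a sequence of kernels once and replay them with a single launch. The snippet below is a generic illustration using a toy Linear layer, not Sora's serving code:

```python
import torch

model = torch.nn.Linear(4096, 4096).cuda().eval()
static_input = torch.randn(8, 4096, device="cuda")

# CUDA Graphs require warmup on a side stream before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph), torch.no_grad():
    static_output = model(static_input)

# At serving time: copy a new batch into the captured buffer and replay.
static_input.copy_(torch.randn(8, 4096, device="cuda"))
graph.replay()  # relaunches the whole captured kernel sequence at once
print(static_output.shape)
```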

Apr 21, 2025

Baseten
Full-time | $300K/yr | On-site | San Francisco

ABOUT BASETEN
At Baseten, we empower the leading AI companies of today, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer, by providing essential inference capabilities. Our unique blend of applied AI research, adaptable infrastructure, and intuitive developer tools enables innovators at the cutting edge of AI to seamlessly transition advanced models into production. With our recent success in securing a $300M Series E funding round, backed by notable investors such as BOND, IVP, Spark Capital, Greylock, and Conviction, we're on an exciting growth trajectory. Join our team and contribute to the platform that engineers rely on to launch AI-driven products.

THE ROLE
As an Applied AI Inference Engineer at Baseten, you'll collaborate closely with clients to design, develop, and implement high-performance AI applications using our platform. You will guide customers through the entire process, from initial concept to deployment, transforming vague business objectives into dependable, observable solutions that meet defined quality, latency, and cost metrics.

This position is ideal for innovative engineers eager to gain insight into how modern organizations scale AI adoption. You will thrive if you enjoy a multifaceted role that intersects product development, software engineering, performance optimization, and direct customer engagement.

It’s essential to note that this position requires hands-on coding and software development, while also encompassing elements of product management, technical customer success, and pre-sales engineering.

EXAMPLE INITIATIVES
Explore insights from our Forward Deployed Engineering team through these blog posts:
Forward Deployed Engineering on the frontier of AI
The fastest, most accurate Whisper transcription
Deploy production-ready model servers from Docker images
Deploy custom ComfyUI workflows as APIs
...

Nov 4, 2025

Inferact
Full-time | $200K/yr - $400K/yr | Remote | San Francisco

At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, aiming to propel AI advancements by making inference processes more efficient and cost-effective. Our company is founded by the original creators and core maintainers of vLLM, placing us at a unique intersection of models and hardware, a position we have cultivated over many years.

About the Role
We are seeking a talented Cloud Orchestration Engineer to develop and maintain the operational framework that ensures vLLM operates seamlessly at an extensive scale. In this role, you will be responsible for designing systems for cluster management, deployment automation, and production monitoring, enabling teams across the globe to deploy AI models effortlessly. Your work will guarantee that vLLM deployments are not only observable and debuggable but also recoverable, transforming operational complexities into reliable infrastructure that operates smoothly.
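
To make "observable and recoverable" concrete: vLLM's OpenAI-compatible server exposes a /health endpoint, and a monitoring loop can poll it per replica. The poller below is our own minimal illustration (the replica URLs are hypothetical), not Inferact's production system:

```python
import time
import urllib.request

ENDPOINTS = [
    "http://10.0.0.1:8000/health",  # hypothetical vLLM replicas
    "http://10.0.0.2:8000/health",
]

def check(url: str, timeout_s: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        return False  # connection refused, timeout, or non-2xx

while True:
    for url in ENDPOINTS:
        if not check(url):
            # Real orchestration would page, drain, and reschedule here.
            print(f"unhealthy replica: {url}")
    time.sleep(10)
```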

Jan 22, 2026

Future
Full-time | $200K/yr - $250K/yr | Remote

Future builds digital personal training experiences that connect people with expert coaches through a seamless app. Since 2017, the company has grown from an idea in a San Francisco café to the largest provider of personal training sessions in the US. In January 2025, Future merged with Autograph, founded by Tom Brady, and is expanding its reach through new partnerships and AI-driven coaching tools. Future continues to invest in technology, grow its coaching roster, and form partnerships with leading athletes. The team is focused on shaping the future of fitness by making expert coaching accessible to more people.

Role overview
This remote Cloud Infrastructure Engineer position centers on designing, building, and maintaining the cloud platform that underpins Future’s products. The role is hands-on and impacts daily operations for engineering teams, focusing on reliability, security, and efficiency.

What you will do
Develop and maintain infrastructure-as-code best practices using AWS CDK, keeping cloud resources version-controlled, repeatable, and peer-reviewed.
Design and manage AWS infrastructure components, such as ECS, RDS Aurora, API Gateway, S3, and networking, with attention to reliability, performance, and cost efficiency.
Build and support an observability stack, including structured logging, distributed tracing, and monitoring, to provide insights into system performance.

Requirements
Strong experience with AWS and a focus on building resilient, automated systems.
Commitment to operational excellence, security, and cost efficiency.
Emphasis on enabling engineering teams to deliver work quickly and confidently.
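
Because the role centers on AWS CDK, a minimal CDK v2 sketch in Python shows what version-controlled, repeatable infrastructure looks like. The stack and bucket names here are invented; Future's actual stacks would define ECS, Aurora, and API Gateway resources instead:

```python
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from constructs import Construct

class StorageStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Declarative resource definition: reviewed in a PR, then synthesized
        # and deployed the same way every time.
        s3.Bucket(
            self, "AppDataBucket",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
        )

app = cdk.App()
StorageStack(app, "StorageStack")
app.synth()  # emits a CloudFormation template for `cdk deploy`
```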

Apr 29, 2026

OpenAI
Full-time | On-site | San Francisco

About Our Team
Join the Inference team at OpenAI, where we leverage cutting-edge research and technology to deliver exceptional AI products to consumers, enterprises, and developers. Our mission is to empower users to harness the full potential of our advanced AI models, enabling unprecedented capabilities. We prioritize efficient and high-performance model inference while accelerating research advancements.

About the Role
We are seeking a passionate Software Engineer to optimize some of the world's largest and most sophisticated AI models for deployment in high-volume, low-latency, and highly available production and research environments.

Key Responsibilities
Collaborate with machine learning researchers, engineers, and product managers to transition our latest technologies into production.
Work closely with researchers to enable advanced research initiatives through innovative engineering solutions.
Implement new techniques, tools, and architectures that enhance the performance, latency, throughput, and effectiveness of our model inference stack.
Develop tools to identify bottlenecks and instability sources, designing and implementing solutions for priority issues.
Optimize our code and Azure VM fleet to maximize every FLOP and GB of GPU RAM available.

You Will Excel in This Role If You:
Possess a solid understanding of modern machine learning architectures and an intuitive grasp of performance optimization strategies, especially for inference.
Take ownership of problems end-to-end, demonstrating a willingness to acquire any necessary knowledge to achieve results.
Bring at least 5 years of professional software engineering experience.
Have or can quickly develop expertise in PyTorch, NVIDIA GPUs, and relevant optimization software stacks (such as NCCL, CUDA), along with HPC technologies like InfiniBand, MPI, and NVLink.
Have experience in architecting, building, monitoring, and debugging production distributed systems, with bonus points for working on performance-critical systems.
Have successfully rebuilt or significantly refactored production systems multiple times to accommodate rapid scaling.
Are self-driven, enjoying the challenge of identifying and addressing the most critical problems.
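
Bottleneck-finding of the kind this listing describes usually starts with careful measurement. As a generic illustration rather than OpenAI tooling, CUDA events time GPU work without forcing a synchronize around every call:

```python
import torch

def time_gpu_ms(fn, iters: int = 50, warmup: int = 10) -> float:
    """Average GPU time of fn() in milliseconds, measured with CUDA events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):
        fn()                  # warm up kernels, allocator, autotuners
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()  # wait once, at the end, not per call
    return start.elapsed_time(end) / iters

# Example: measure a matmul at inference-like sizes.
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
print(f"{time_gpu_ms(lambda: a @ b):.3f} ms per matmul")
```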

Feb 6, 2025

OpenAI
Full-time | On-site | San Francisco

About Our Team
Join OpenAI’s dynamic Inference team, where we empower the deployment of cutting-edge AI models, including our renowned GPT models, advanced Image Generation capabilities, and Whisper, across diverse platforms. Our mission is to ensure these models are not only high-performing and scalable but also available for real-world applications. Collaborating closely with our Research team, we’re committed to bringing the next generation of AI innovations to fruition. As a compact, agile team, we prioritize delivering an exceptional developer experience while continuously pushing the frontiers of artificial intelligence.

As we expand our focus into multimodal inference, we are building the necessary infrastructure to support models that process images, audio, and other non-text modalities. This work involves tackling diverse model sizes and interactions, managing complex input/output formats, and ensuring seamless collaboration between product and research teams.

About the Role
We are seeking a passionate Software Engineer to aid in the large-scale deployment of OpenAI’s multimodal models. You will join a small yet impactful team dedicated to creating robust, high-performance infrastructure for real-time audio, image, and various multimodal workloads in production environments.

This position is inherently collaborative; you will work directly with researchers who develop these models and with product teams to define novel interaction modalities. Your contributions will enable users to generate speech, interpret images, and engage with models in innovative ways that extend beyond traditional text-based interactions.

Key Responsibilities:
Design and implement advanced inference infrastructure for large-scale multimodal models.
Optimize systems for high-throughput and low-latency processing of image and audio inputs and outputs.
Facilitate the transition of experimental research workflows into dependable production services.
Engage closely with researchers, infrastructure teams, and product engineers to deploy state-of-the-art capabilities.
Contribute to systemic enhancements, including GPU utilization, tensor parallelism, and hardware abstraction layers.

You May Excel in This Role If You:
Have a proven track record of building and scaling inference systems for large language models or multimodal architectures.
Possess experience with GPU-based machine learning workloads and a solid understanding of the performance dynamics associated with large models, particularly with intricate data types like images or audio.
Thrive in a fast-paced, experimental environment and enjoy collaborating with cross-functional teams to drive impactful results.

May 21, 2025

OpenAI
Full-time | On-site | San Francisco

About Our Team
The Inference team at OpenAI is dedicated to translating our cutting-edge research into accessible, transformative technology for consumers, enterprises, and developers. By leveraging our advanced AI models, we enable users to achieve unprecedented levels of innovation and productivity. Our primary focus lies in enhancing model inference efficiency and accelerating progress in research through optimized inference capabilities.

About the Role
We are seeking talented engineers to expand and optimize OpenAI's inference infrastructure, specifically targeting emerging GPU platforms. This role encompasses a wide range of responsibilities from low-level kernel optimization to high-level distributed execution. You will collaborate closely with our research, infrastructure, and performance teams to ensure seamless operation of our largest models on cutting-edge hardware.

This position offers a unique opportunity to influence and advance OpenAI’s multi-platform inference capabilities, with a strong emphasis on optimizing performance for AMD accelerators.

Your Responsibilities Include:
Overseeing the deployment, accuracy, and performance of the OpenAI inference stack on AMD hardware.
Integrating our internal model-serving infrastructure (e.g., vLLM, Triton) into diverse GPU-backed systems.
Debugging and optimizing distributed inference workloads across memory, network, and compute layers.
Validating the correctness, performance, and scalability of model execution on extensive GPU clusters.
Collaborating with partner teams to design and optimize high-performance GPU kernels for accelerators utilizing HIP, Triton, or other performance-centric frameworks.
Working with partner teams to develop, integrate, and fine-tune collective communication libraries (e.g., RCCL) to parallelize model execution across multiple GPUs.

Ideal Candidates Will:
Possess experience in writing or porting GPU kernels using HIP, CUDA, or Triton, with a strong focus on low-level performance.
Be familiar with communication libraries like NCCL/RCCL, understanding their importance in high-throughput model serving.
Have experience with distributed inference systems and be adept at scaling models across multiple accelerators.
Enjoy tackling end-to-end performance challenges across hardware, system libraries, and orchestration layers.
Be eager to join a dynamic, agile team focused on building innovative infrastructure from the ground up.
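
Triton, named above, is a Python-embedded kernel language that targets both NVIDIA and AMD GPUs, which is what makes it relevant to multi-platform inference. The canonical vector-add kernel below follows Triton's tutorial style and is purely illustrative, not an OpenAI kernel:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.rand(100_000, device="cuda")
y = torch.rand(100_000, device="cuda")
assert torch.allclose(add(x, y), x + y)
```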

Oct 8, 2025
