Software Engineer, Inference AI/ML
About CoreWeave
CoreWeave is The Essential Cloud for AI™, designed by pioneers for pioneers. Our platform empowers innovators to build and scale AI confidently, offering robust technology, tools, and expert teams. Trusted by leading AI labs, startups, and global enterprises, we combine superior infrastructure performance with deep technical expertise to drive breakthroughs and transform compute into capability. Founded in 2017, we proudly became a publicly traded company (Nasdaq: CRWV) in March 2025. Discover more at www.coreweave.com.
Similar jobs
Cerebras Systems
Join Cerebras Systems as an Engineering Manager specializing in Inference ML Runtime, where you will lead a dedicated team in developing groundbreaking machine learning solutions. Your expertise will guide the design and implementation of our inference runtime, ensuring efficiency and performance at scale. As a pivotal leader in our innovative environment, you will collaborate with cross-functional teams, driving the development of state-of-the-art algorithms and systems that push the boundaries of artificial intelligence.
Cerebras Systems
Cerebras Systems is at the forefront of AI innovation, creating the world's largest AI chip, which is 56 times larger than traditional GPUs. Our groundbreaking wafer-scale architecture delivers the computational power equivalent to dozens of GPUs on a single chip, combined with the programming simplicity of a unified device. This innovative approach allows us to offer unparalleled training and inference speeds, enabling machine learning practitioners to execute extensive ML applications seamlessly, without the complexities of managing multiple GPUs or TPUs.

Cerebras boasts an impressive clientele, including premier model labs, global corporations, and pioneering AI startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aimed at deploying 750 megawatts of scale, revolutionizing critical workloads with ultra-fast inference capabilities.

Our unique wafer-scale architecture enables Cerebras Inference to provide the fastest Generative AI inference solution globally, surpassing GPU-based hyperscale cloud inference services by more than tenfold. This remarkable enhancement in speed is reshaping the AI application user experience, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.

About The Role
The Inference ML Engineering team at Cerebras Systems is committed to empowering our rapid generative inference solution through intuitive APIs, supported by a distributed runtime that operates on extensive clusters of our proprietary hardware. Our goal is to enable enterprises, developers, and researchers to fully harness the capabilities of our platform, leveraging its exceptional performance, scalability, and flexibility. The team collaborates closely with cross-functional groups, including compiler developers, cluster orchestrators, ML scientists, cloud architects, and product teams, to deliver impactful solutions that redefine the limits of ML performance and usability.

As a Senior Software Engineer on the Inference ML Engineering team, you will be instrumental in designing and implementing APIs, ML features, and tools that facilitate the execution of state-of-the-art generative AI models on our custom hardware. Your role will involve architecting solutions that allow for seamless model translation and execution, ensuring high throughput and minimal latency while maintaining user-friendliness. You will lead technical initiatives and collaborate with other engineering teams to enhance our solutions.
Cerebras Systems
Cerebras Systems is at the forefront of AI technology, developing the largest AI chip in the world, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture delivers the computational power of dozens of GPUs on a single chip, while ensuring programming is as simple as working with a single device. This revolutionary approach enables Cerebras to provide unmatched training and inference speeds, facilitating seamless execution of large-scale machine learning applications without the complexities of managing multiple GPUs or TPUs.

Cerebras proudly serves a diverse clientele, including leading model labs, global corporations, and pioneering AI startups. Recently, OpenAI announced a multi-year collaboration with Cerebras, aiming to harness 750 megawatts of power for transformative workloads through ultra high-speed inference.

Our groundbreaking wafer-scale architecture allows Cerebras Inference to offer the most rapid Generative AI inference solution globally, surpassing GPU-based hyperscale cloud services by over ten times. This significant enhancement in speed is reshaping the user experience for AI applications, enabling real-time iteration and amplifying intelligence through advanced agentic computation.

About The Role
Join us in constructing the next generation of large-scale AI systems designed to handle training and inference workloads with unparalleled efficiency and scale. As a Senior Runtime Engineer, you will be responsible for architecting and developing high-performance distributed software that orchestrates extensive compute and data pipelines across diverse clusters. Your contributions will push the boundaries of concurrency, throughput, and scalability, facilitating the effective execution of models on a massive scale. This position sits at the confluence of systems engineering and machine learning performance, requiring both deep architectural insight and practical low-level implementation capabilities. You will play a crucial role in optimizing how models are executed and fine-tuned from data ingestion through to distributed execution across cutting-edge hardware platforms. We are actively recruiting for runtime roles in both Training and Inference.
Cerebras Systems
At Cerebras Systems, we are revolutionizing AI computing by developing the world's largest AI chip, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This unique approach enables us to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications without the complexity of managing multiple GPUs or TPUs.

Our esteemed clientele includes leading model laboratories, prominent global enterprises, and forward-thinking AI-native startups. Notably, OpenAI has entered a multi-year partnership with Cerebras to leverage 750 megawatts of scale, enhancing critical workloads with ultra-high-speed inference.

With our groundbreaking wafer-scale architecture, Cerebras Inference delivers the fastest Generative AI inference solution globally, outperforming GPU-based hyperscale cloud inference services by over tenfold. This dramatic increase in speed is transforming how users experience AI applications, facilitating real-time iterations and enhancing intelligence through additional agentic computation.

Location: Toronto / Sunnyvale
We are seeking a highly technical, hands-on engineering leader for our Inference Service Platform. In this role, you will guide a high-performing team to address a critical challenge: scaling large language model (LLM) inference on Cerebras' advanced compute clusters and delivering a world-class, on-premise solution for enterprise customers. You will establish the technical vision while maintaining close engagement with the code, focusing on architecting highly reliable and low-latency distributed systems. If you possess proven expertise in distributed systems and scaling modern model-serving frameworks, we encourage you to apply.
Cerebras Systems
Cerebras Systems is at the forefront of AI technology, developing the world's largest AI chip, which is 56 times larger than conventional GPUs. Our innovative wafer-scale architecture delivers the computational capabilities of numerous GPUs on a single chip, simplifying programming to the level of a single device. This groundbreaking approach enables Cerebras to achieve unmatched training and inference speeds, allowing machine learning practitioners to seamlessly execute large-scale ML applications without the complexities of managing extensive GPU or TPU resources.

Our clientele includes leading model laboratories, global corporations, and pioneering AI-centric startups. Notably, OpenAI has recently entered into a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of capacity, revolutionizing key workloads with exceptionally rapid inference speeds.

Thanks to our extraordinary wafer-scale architecture, Cerebras Inference provides the swiftest Generative AI inference solution available today, operating over ten times faster than GPU-based hyperscale cloud inference services. This significant boost in speed is reshaping the user experience in AI applications, facilitating real-time iterations and enhancing intelligence through advanced agentic computation.

About The Role
We are looking for an exceptionally talented Deployment Engineer to design and manage our state-of-the-art inference clusters. In this role, you will have the opportunity to work with the unparalleled Wafer-Scale Engine (WSE) and the systems that exploit its extraordinary capabilities.
Cerebras Systems
Cerebras Systems is revolutionizing the AI landscape with the world's largest AI chip, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture enables us to deliver the computational power of dozens of GPUs on a single chip, while offering the ease of programming a single device. This groundbreaking approach empowers Cerebras to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications effortlessly without the complexities of managing numerous GPUs or TPUs.

Cerebras serves a diverse clientele that includes leading model laboratories, global corporations, and pioneering AI-focused startups. Recently, OpenAI announced a multi-year collaboration with Cerebras to harness 750 megawatts of scale, significantly enhancing key workloads through ultra-fast inference capabilities.

With our cutting-edge wafer-scale architecture, Cerebras Inference provides the fastest Generative AI inference solution globally, exceeding the speed of GPU-based hyperscale cloud inference services by over ten times. This extraordinary speed transformation is reshaping the user experience of AI applications, facilitating real-time iterations and boosting intelligence through enhanced agentic computation.
Applied Intuition, Inc.
About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI technology. Established in 2017 and currently valued at $15 billion, this Silicon Valley-based company is building the essential digital infrastructure to infuse intelligence into every moving machine worldwide. We cater to industries such as automotive, defense, trucking, construction, mining, and agriculture through three primary sectors: tools and infrastructure, operating systems, and autonomy. Our solutions are trusted by 18 of the top 20 global automakers, along with the United States military and its allies, to deliver exceptional physical intelligence. Our headquarters is located in Sunnyvale, California, with additional offices across Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.

We are an in-office company, expecting our employees to primarily work from their Applied Intuition office five days a week. We understand the importance of flexibility and trust our employees to manage their schedules responsibly. This may include occasional remote work, starting the day with morning meetings from home before heading to the office, or leaving earlier when needed to accommodate family commitments.

About the Role
We are in search of a skilled software engineer with extensive experience in optimizing machine learning models and deploying them in production-grade embedded runtime environments. Your expertise will span the entire ML framework stack, including PyTorch, JAX, ONNX, TensorRT, CUDA, XLA, and Triton.

At Applied Intuition, you will:
- Lead ML performance optimization across various technologies for both on-road and off-road ADAS/AD stacks aimed at deployment on a range of embedded computing platforms.
- Devise compute usage strategies to enhance efficiency and minimize latency of model inference for compute boards chosen by our customers.
- Engage in model pruning and quantization, ensuring successful deployment on memory-constrained platforms.
- Collaborate closely with ML engineers and software developers to identify and optimize efficient model architecture solutions.
- Establish methodologies to...
Cerebras Systems
Cerebras Systems is at the forefront of AI innovation, manufacturing the largest AI chip in the world, which is 56 times bigger than conventional GPUs. Our cutting-edge wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This pioneering approach enables us to offer unmatched training and inference speeds, allowing machine learning practitioners to smoothly execute large-scale ML applications without the complexity of managing numerous GPUs or TPUs.

Our clientele includes leading model laboratories, major global corporations, and innovative AI-native startups. Notably, OpenAI has recently partnered with Cerebras to leverage 750 megawatts of scale, revolutionizing critical workloads with ultra-high-speed inference.

Our advanced wafer-scale architecture makes Cerebras Inference the fastest Generative AI inference solution available, outperforming GPU-based hyperscale cloud inference services by over tenfold. This remarkable speed enhancement is reshaping the user experience of AI applications, enabling real-time iterations and enhanced intelligence through additional agentic computation.

In late 2024, we launched Cerebras Inference, setting a new standard for Generative AI inference speed. Since its launch, we have rapidly scaled our services to meet the rising demand from AI labs, enterprises, and a vibrant developer community.

In October 2025, we celebrated our Series G funding round, successfully raising $1.1 billion USD to accelerate the growth of our product offerings and services to satisfy global AI demand.

About the Team
The Cerebras Inference team is dedicated to delivering the most efficient, secure, and reliable enterprise-grade AI service. We design and manage expansive distributed systems that facilitate AI inference with unparalleled speed and efficiency. Join us in scaling our inference capabilities to new heights!
Cerebras Systems
Role Overview
Cerebras Systems is looking for a Staff Software Engineer focused on Inference Cloud. This position is based in Sunnyvale, CA.

What You Will Do
- Design, develop, and optimize software for inference products.
- Work closely with team members to improve performance and reliability.
- Apply advanced AI and machine learning methods to real-world challenges.

Collaboration
Work alongside experienced engineers on projects that shape the future of inference technology at Cerebras Systems.
Join CoreWeave as a Senior Software Engineer I specializing in inference, where you will spearhead architectural designs, elevate engineering standards, and drive significant improvements in latency, throughput, and reliability across our services. Collaborate closely with product, orchestration, and hardware teams to advance our Kubernetes-native inference platform, ensuring we achieve stringent P99 SLAs at scale.
Wayve Technologies
Join Wayve Technologies as a Staff Machine Learning Performance Engineer, specializing in Training Efficiency. In this pivotal role, you will be responsible for enhancing the performance of our machine learning models and algorithms, ensuring they operate at peak efficiency. You will collaborate with cross-functional teams to develop innovative solutions that improve training processes, optimize model performance, and drive impactful results in autonomous vehicle technology.
CoreWeave
Join CoreWeave as a Software Engineer on our Inference team, where you'll play a vital role in enhancing the performance of our AI model serving platform. As an entry-level engineer, you will implement impactful features that improve latency, reliability, and cost-efficiency on our cutting-edge GPU-based infrastructure. This role offers a unique opportunity for hands-on learning and professional growth through mentorship from seasoned engineers.
Applied Intuition
Applied Intuition is hiring a Software Engineer in Sunnyvale, California, with a focus on the Axion Data Engine and machine learning operations. This role centers on building and supporting the systems that power advanced data processing and ML workflows.

Key Responsibilities
- Collaborate with cross-functional teams to design, build, and deploy data solutions for the Axion Data Engine.
- Maintain and enhance machine learning operations, aiming to improve system reliability and performance.
- Develop data processing capabilities that meet high standards for efficiency and accuracy.

Team and Impact
This position works closely with engineers and specialists from multiple areas. The work directly supports the quality and precision needed in industries that rely on advanced data and machine learning tools.
Cerebras Systems is at the forefront of AI innovation, creating the world's largest AI chip that is 56 times larger than traditional GPUs. Our unique wafer-scale architecture delivers the computational power of numerous GPUs on a single chip, simplifying programming while providing unparalleled training and inference speeds. This revolutionary approach enables users to run extensive machine learning applications effortlessly, eliminating the complexity of managing multiple GPUs or TPUs.

Cerebras serves a diverse clientele, including leading model labs, major global enterprises, and pioneering AI-native startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of scale that will redefine key workloads with ultra-high-speed inference.

Our groundbreaking wafer-scale architecture ensures that Cerebras Inference provides the fastest Generative AI inference solution globally, achieving speeds that are over ten times faster than GPU-based hyperscale cloud services. This significant enhancement in performance is transforming the user experience of AI applications, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.

About The Role
We are seeking a Senior Performance Analyst to join our dynamic Product team. As a specialist in state-of-the-art inference performance, you will be the go-to expert on how Cerebras measures up against alternative inference providers in terms of pricing and performance. This role combines performance benchmarking from foundational principles with competitive intelligence. The position revolves around two key pillars:

Performance Benchmarking
You will develop, execute, and sustain reproducible benchmarks that assess Cerebras inference performance for actual customer workloads. This includes metrics such as tokens per second, time to first token, latency under concurrency, and total cost of ownership (TCO).

Competitive Analysis
You will analyze market trends and competitor offerings to position Cerebras effectively within the inference landscape.
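For readers unfamiliar with the benchmarking metrics this listing mentions, here is a minimal, hypothetical sketch (not taken from the listing, and not Cerebras tooling) of how time to first token and tokens per second can be measured for any streaming inference endpoint; the stream_tokens callable is an assumed placeholder for whatever streaming client a provider exposes.

import time
from typing import Callable, Iterable

def benchmark_stream(stream_tokens: Callable[[str], Iterable[str]], prompt: str) -> dict:
    # stream_tokens is a hypothetical placeholder: it should call the provider's
    # streaming API and yield decoded tokens as they arrive.
    start = time.perf_counter()
    first_token_time = None
    token_count = 0
    for _ in stream_tokens(prompt):
        if first_token_time is None:
            first_token_time = time.perf_counter()  # marks time to first token
        token_count += 1
    end = time.perf_counter()
    ttft = (first_token_time - start) if first_token_time else float("nan")
    decode_window = end - (first_token_time if first_token_time else start)
    tokens_per_second = token_count / decode_window if decode_window > 0 else float("nan")
    return {
        "ttft_seconds": ttft,
        "tokens_per_second": tokens_per_second,
        "total_seconds": end - start,
    }

Latency under concurrency would typically be estimated by issuing many such requests in parallel and reporting percentile latencies rather than a single run.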
Applied Intuition, Inc.
About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI technologies. Established in 2017 and currently valued at $15 billion, our Silicon Valley-based company is dedicated to creating the essential digital infrastructure that empowers intelligence across all moving machines globally. Our solutions serve critical sectors including automotive, defense, trucking, construction, mining, and agriculture, focusing on three main domains: tools and infrastructure, operating systems, and autonomy. Trusted by 18 of the top 20 global automakers, as well as the U.S. military and its allies, Applied Intuition is committed to delivering unparalleled physical intelligence solutions. Our headquarters is located in Sunnyvale, California, complemented by offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. For more information, visit applied.co.

We prioritize in-office collaboration and expect our employees to work primarily from our Applied Intuition office five days a week. However, we value flexibility and trust our team members to manage their schedules responsibly. This may include occasional remote work, starting the day with morning meetings from home before heading to the office, or leaving early when necessary to accommodate personal commitments.

About the Role
As an Engineering Manager on our Machine Learning Platform team, you will lead an exceptional group of engineers dedicated to building the infrastructure that enables Physical AI at scale. Your team will oversee three pivotal areas: Training & Inference Orchestration, where we develop frameworks to efficiently schedule and execute extensive tasks across thousands of GPUs; GPU Cluster Architecture, where we design and expand what will become the industry's largest GPU cluster for Physical AI; and Performance Optimization, where we maximize hardware utilization, throughput, and cost efficiency for large-scale training and inference workloads. You will collaborate at the intersection of systems engineering and machine learning, working directly with stack development and research teams to eliminate bottlenecks and expedite the transition from experimentation to production.
Cerebras Systems
Join Cerebras Systems as a Staff Frontend Engineer specializing in Inference. In this pivotal role, you will be instrumental in developing innovative solutions that push the boundaries of AI and machine learning. Your expertise will drive the design and implementation of user-friendly interfaces that enhance our cutting-edge technology.
Cerebras Systems
Join Cerebras Systems as an AI/ML Research Scientist and be part of a pioneering team at the forefront of advanced technology. In this role, you will leverage your expertise in artificial intelligence and machine learning to develop innovative solutions that will revolutionize the field. Collaborate with top-tier researchers and engineers to push the boundaries of what's possible.
Join CoreWeave as a Senior Software Engineer II, where you'll play a pivotal role in shaping the future of AI infrastructure. As an area owner, you'll lead design initiatives and set engineering standards that improve latency, throughput, and reliability across our advanced services. Collaborate closely with product, orchestration, and hardware teams to elevate our Kubernetes-native inference platform while ensuring we meet stringent P99 SLAs at scale. Your expertise will be integral in implementing cutting-edge optimizations such as micro-batch schedulers and KV-cache reuse, ultimately driving improvements across multiple services.
42dot is seeking a Senior Machine Learning Platform Engineer to support its work in autonomous driving technology. This position is based in Sunnyvale, United States.

Role overview
This role focuses on developing machine learning platforms that support autonomous vehicle systems. The work involves designing and building scalable infrastructure to handle complex ML workloads, with a strong emphasis on performance and reliability.

What you will do
- Lead the creation and enhancement of machine learning solutions for autonomous driving applications.
- Design, implement, and maintain ML platforms to ensure they meet high standards for scalability and reliability.

Requirements
- Extensive experience in building and maintaining machine learning platforms.
- Background in supporting ML solutions for autonomous vehicle technology or similar fields.
- Strong skills in designing scalable and high-performance systems.
Intuitive Surgical, Inc.
Join our dynamic team as a Senior Site Reliability Engineer focused on AI/ML solutions. In this role, you will leverage your expertise to enhance the reliability, scalability, and performance of our cutting-edge AI-driven products. You will work collaboratively with cross-functional teams to design, implement, and maintain robust systems that support our mission to revolutionize surgical technology.