Software Engineer Inference Ai Ml jobs in Sunnyvale – Browse 698 openings on RoboApply Jobs

AI Inference Deployment Engineer

Cerebras SystemsSunnyvale CA or Toronto Canada

On-site Full-time

Clicking Apply Now takes you to AutoApply where you can tailor your resume and apply.

Unlock Your Potential

Generate Job-Optimized Resume

One Click And Our AI Optimizes Your Resume to Match The Job Description.

Is Your Resume Optimized For This Role?

Find Out If You're Highlighting The Right Skills And Fix What's Missing

Experience Level

Experience

Qualifications

The ideal candidate will possess a strong background in deploying and managing AI infrastructures, with expertise in systems architecture and performance optimization. Proficiency in programming languages such as Python or C++, alongside experience with cloud services and GPU/TPU architectures, is highly desirable. A Bachelor's degree in Computer Science, Engineering, or a related field is preferred.

About the job

Our clientele includes leading model laboratories, global corporations, and pioneering AI-centric startups. Notably, OpenAI has recently entered into a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of capacity, revolutionizing key workloads with exceptionally rapid inference speeds.

Thanks to our extraordinary wafer-scale architecture, Cerebras Inference provides the swiftest Generative AI inference solution available today, operating over ten times faster than GPU-based hyperscale cloud inference services. This significant boost in speed is reshaping the user experience in AI applications, facilitating real-time iterations and enhancing intelligence through advanced agentic computation.

About The Role

We are looking for an exceptionally talented Deployment Engineer to design and manage our state-of-the-art inference clusters. In this role, you will have the opportunity to work with the unparalleled Wafer-Scale Engine (WSE) and the systems that exploit its extraordinary capabilities.

About Cerebras Systems

Cerebras Systems is a pioneering company that specializes in the development of advanced AI hardware, specifically the world's largest AI chip, designed to enhance the efficiency and speed of machine learning applications. With a focus on innovative technology and strategic partnerships, Cerebras is transforming the landscape of AI processing.

Similar jobs

1 - 20 of 698 Jobs

Select all on this page (20)

Apply

AI Inference Deployment Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI technology, developing the world's largest AI chip that is 56 times greater than conventional GPUs. Our innovative wafer-scale architecture delivers the computational capabilities of numerous GPUs on a single chip, simplifying programming to the level of a single device. This groundbreaking approach enables Cerebras to achieve unmatched training and inference speeds, allowing machine learning practitioners to seamlessly execute large-scale ML applications without the complexities of managing extensive GPU or TPU resources. Our clientele includes leading model laboratories, global corporations, and pioneering AI-centric startups. Notably, OpenAI has recently entered into a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of capacity, revolutionizing key workloads with exceptionally rapid inference speeds. Thanks to our extraordinary wafer-scale architecture, Cerebras Inference provides the swiftest Generative AI inference solution available today, operating over ten times faster than GPU-based hyperscale cloud inference services. This significant boost in speed is reshaping the user experience in AI applications, facilitating real-time iterations and enhancing intelligence through advanced agentic computation. About The Role We are looking for an exceptionally talented Deployment Engineer to design and manage our state-of-the-art inference clusters. In this role, you will have the opportunity to work with the unparalleled Wafer-Scale Engine (WSE) and the systems that exploit its extraordinary capabilities.

Feb 17, 2026

Apply

Principal Engineer, AI Inference Reliability

Cerebras Systems

Full-time|Remote|Remote Office; Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI innovation, manufacturing the largest AI chip in the world, which is 56 times bigger than conventional GPUs. Our cutting-edge wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This pioneering approach enables us to offer unmatched training and inference speeds, allowing machine learning practitioners to smoothly execute large-scale ML applications without the complexity of managing numerous GPUs or TPUs. Our clientele includes leading model laboratories, major global corporations, and innovative AI-native startups. Notably, OpenAI has recently partnered with Cerebras to leverage 750 megawatts of scale, revolutionizing critical workloads with ultra-high-speed inference. Our advanced wafer-scale architecture makes Cerebras Inference the fastest Generative AI inference solution available, outperforming GPU-based hyperscale cloud inference services by over tenfold. This remarkable speed enhancement is reshaping the user experience of AI applications, enabling real-time iterations and enhanced intelligence through additional agentic computation.In late 2024, we launched Cerebras Inference, setting a new standard for Generative AI inference speed. Since its launch, we have rapidly scaled our services to meet the rising demand from AI labs, enterprises, and a vibrant developer community.In October 2025, we celebrated our Series G funding round, successfully raising $1.1 billion USD to accelerate the growth of our product offerings and services to satisfy global AI demand.About the TeamThe Cerebras Inference team is dedicated to delivering the most efficient, secure, and reliable enterprise-grade AI service. We design and manage expansive distributed systems that facilitate AI inference with unparalleled speed and efficiency. Join us in scaling our inference capabilities to new heights!

Feb 17, 2026

Apply

Staff Software Engineer, Inference Cloud

Cerebras Systems

Full-time|On-site|Sunnyvale, CA

Role Overview Cerebras Systems is looking for a Staff Software Engineer focused on Inference Cloud. This position is based in Sunnyvale, CA. What You Will Do Design, develop, and optimize software for inference products Work closely with team members to improve performance and reliability Apply advanced AI and machine learning methods to real-world challenges Collaboration Work alongside experienced engineers on projects that shape the future of inference technology at Cerebras Systems.

Apr 14, 2026

Apply

Engineering Manager - Inference ML Runtime

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Join Cerebras Systems as an Engineering Manager specializing in Inference ML Runtime, where you will lead a dedicated team in developing groundbreaking machine learning solutions. Your expertise will guide the design and implementation of our inference runtime, ensuring efficiency and performance at scale.As a pivotal leader in our innovative environment, you will collaborate with cross-functional teams, driving the development of state-of-the-art algorithms and systems that push the boundaries of artificial intelligence.

Mar 24, 2026

Apply

Inference Frontend Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale, CA

Cerebras Systems is revolutionizing the AI landscape with the world's largest AI chip, which is 56 times more extensive than traditional GPUs. Our innovative wafer-scale architecture enables us to deliver the computational power of dozens of GPUs on a single chip, while offering the ease of programming like a single device. This groundbreaking approach empowers Cerebras to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications effortlessly without the complexities of managing numerous GPUs or TPUs.Cerebras serves a diverse clientele that includes leading model laboratories, global corporations, and pioneering AI-focused startups. Recently, OpenAI announced a multi-year collaboration with Cerebras to harness 750 megawatts of scale, significantly enhancing key workloads through ultra-fast inference capabilities.With our cutting-edge wafer-scale architecture, Cerebras Inference provides the fastest Generative AI inference solution globally, exceeding the speed of GPU-based hyperscale cloud inference services by over ten times. This extraordinary speed transformation is reshaping the user experience of AI applications, facilitating real-time iterations and boosting intelligence through enhanced agentic computation.

Feb 17, 2026

Apply

Senior Software Engineer I, Inference

CoreWeave

On-site|On-site|Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Senior Software Engineer I specializing in inference, where you will spearhead architectural designs, elevate engineering standards, and significantly enhance latency, throughput, and reliability across various services. Collaborate closely with product, orchestration, and hardware teams to advance our Kubernetes-native inference platform, ensuring we achieve stringent P99 SLAs at scale.

Feb 10, 2026

Apply

Software Engineer, Inference AI/ML

CoreWeave

On-site|On-site| Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Software Engineer on our Inference team, where you'll play a vital role in enhancing the performance of our AI model serving platform. As an entry-level engineer, you will implement impactful features that improve latency, reliability, and cost-efficiency on our cutting-edge GPU-based infrastructure. This role offers a unique opportunity for hands-on learning and professional growth through mentorship from seasoned engineers.

Feb 10, 2026

Apply

Engineering Manager, Inference Platform

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

At Cerebras Systems, we are revolutionizing AI computing by developing the world’s largest AI chip, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This unique approach enables us to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications without the complexity of managing multiple GPUs or TPUs.Our esteemed clientele includes leading model laboratories, prominent global enterprises, and forward-thinking AI-native startups. Notably, OpenAI has entered a multi-year partnership with Cerebras to leverage 750 megawatts of scale, enhancing critical workloads with ultra-high-speed inference.With our groundbreaking wafer-scale architecture, Cerebras Inference delivers the fastest Generative AI inference solution globally, outperforming GPU-based hyperscale cloud inference services by over tenfold. This dramatic increase in speed is transforming how users experience AI applications, facilitating real-time iterations and enhancing intelligence through additional agentic computation.Location: Toronto / SunnyvaleWe are seeking a highly technical, hands-on engineering leader for our Inference Service Platform. In this role, you will guide a high-performing team to address a critical challenge: scaling large language model (LLM) inference on Cerebras’ advanced compute clusters and delivering a world-class, on-premise solution for enterprise customers. You will establish the technical vision while maintaining close engagement with the code, focusing on architecting highly reliable and low-latency distributed systems. If you possess proven expertise in distributed systems and scaling modern model-serving frameworks, we encourage you to apply.

Feb 17, 2026

Apply

Senior Inference Machine Learning Runtime Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI innovation, creating the world’s largest AI chip, which is 56 times larger than traditional GPUs. Our groundbreaking wafer-scale architecture delivers the computational power equivalent to dozens of GPUs on a single chip, combined with the programming simplicity of a unified device. This innovative approach allows us to offer unparalleled training and inference speeds, enabling machine learning practitioners to execute extensive ML applications seamlessly, without the complexities of managing multiple GPUs or TPUs.Cerebras boasts an impressive clientele, including premier model labs, global corporations, and pioneering AI startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aimed at deploying 750 megawatts of scale, revolutionizing critical workloads with ultra-fast inference capabilities.Our unique wafer-scale architecture enables Cerebras Inference to provide the fastest Generative AI inference solution globally, surpassing GPU-based hyperscale cloud inference services by more than tenfold. This remarkable enhancement in speed is reshaping the AI application user experience, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.About The RoleThe Inference ML Engineering team at Cerebras Systems is committed to empowering our rapid generative inference solution through intuitive APIs, supported by a distributed runtime that operates on extensive clusters of our proprietary hardware. Our goal is to enable enterprises, developers, and researchers to fully harness the capabilities of our platform, leveraging its exceptional performance, scalability, and flexibility. The team collaborates closely with cross-functional groups, including compiler developers, cluster orchestrators, ML scientists, cloud architects, and product teams, to deliver impactful solutions that redefine the limits of ML performance and usability.As a Senior Software Engineer on the Inference ML Engineering team, you will be instrumental in designing and implementing APIs, ML features, and tools that facilitate the execution of state-of-the-art generative AI models on our custom hardware. Your role will involve architecting solutions that allow for seamless model translation and execution, ensuring high throughput and minimal latency while maintaining user-friendliness. You will lead technical initiatives and collaborate with other engineering teams to enhance our solutions.

Feb 17, 2026

Apply

Software Engineer - Specializing in Axion Data Engine and ML Ops

Applied Intuition

Full-time|On-site|Sunnyvale, California, United States

Applied Intuition is hiring a Software Engineer in Sunnyvale, California, with a focus on the Axion Data Engine and machine learning operations. This role centers on building and supporting the systems that power advanced data processing and ML workflows. Key Responsibilities Collaborate with cross-functional teams to design, build, and deploy data solutions for the Axion Data Engine. Maintain and enhance machine learning operations, aiming to improve system reliability and performance. Develop data processing capabilities that meet high standards for efficiency and accuracy. Team and Impact This position works closely with engineers and specialists from multiple areas. The work directly supports the quality and precision needed in industries that rely on advanced data and machine learning tools.

Apr 28, 2026

Apply

Software Engineer - Robotics at Coram AI | Sunnyvale

Coram AI

Full-time|On-site|Sunnyvale

At Coram AI, we are revolutionizing video security for the contemporary landscape. Our innovative cloud-native platform leverages computer vision and artificial intelligence to empower businesses to enhance safety, make informed decisions, and accelerate operations, featuring real-time alerts, effortless clip sharing, and multi-site visibility.Joining our dynamic and agile team means becoming part of a culture that prioritizes clarity, quality, and impactful contributions. Every team member has a voice, delivers significant work, and plays a crucial role in shaping how AI can foster a safer and more interconnected world.We seek an exceptionally skilled software engineer to develop high-performance, real-time software that operates on edge devices while adhering to stringent latency and memory limitations. This position emphasizes deterministic execution, distributed system architecture, and low-level performance enhancements. You will focus on constructing the infrastructure and runtime systems that enable real-time robotics applications.

Mar 11, 2026

Apply

AI/ML Research Scientist in Advanced Technology

Cerebras Systems

Full-time|On-site|Sunnyvale, CA; Toronto, Ontario, Canada; Vancouver, British Columbia, Canada

Join Cerebras Systems as an AI/ML Research Scientist and be part of a pioneering team at the forefront of advanced technology. In this role, you will leverage your expertise in artificial intelligence and machine learning to develop innovative solutions that will revolutionize the field. Collaborate with top-tier researchers and engineers to push the boundaries of what's possible.

Apr 7, 2026

Apply

Software Engineer - AI Engineering

Applied Intuition

Full-time|On-site|Sunnyvale, California, United States

Join Applied Intuition as a Software Engineer specializing in AI Engineering, where you'll have the opportunity to work on cutting-edge technology and contribute to innovative projects that shape the future of artificial intelligence. As part of our dynamic team, you will collaborate with talented professionals to design, develop, and implement AI solutions that address real-world challenges.

Mar 25, 2026

Apply

Infrastructure Software Engineer

Coram AI

Full-time|On-site|Sunnyvale

Join Coram AI, where we are redefining video security for a modern landscape. Our innovative, cloud-native platform harnesses computer vision and artificial intelligence to empower businesses with enhanced safety, informed decision-making, and rapid operational responses, ranging from real-time alerts to effortless clip sharing and comprehensive visibility across multiple sites.As a member of our dynamic and agile team, you will embrace clarity, craftsmanship, and impactful contributions. Every team member's voice matters, they deliver significant results, and collectively shape the future of AI in making the world safer and more interconnected.About the Role:At Coram AI, our infrastructure transcends the conventional cloud-based stack. Alongside our AWS and Kubernetes framework, we manage an extensive array of IoT devices remotely. We are seeking a skilled engineer to take charge of a substantial segment of our edge and cloud architecture that supports our IoT product line—responsible not only for infrastructure but also for developing and maintaining our proprietary in-house software.Joining our team means tackling intriguing challenges at the crossroads of user experience, machine learning, and infrastructure. It embodies a commitment to excellence, continuous learning, and delivering exceptional products to our clients in a high-energy startup environment.Key Responsibilities:Develop and maintain production-grade software for our custom edge infrastructure stack.Provision and manage resources within AWS.Oversee provisioning and management for hundreds of thousands of deployed connected IoT devices.Create CI/CD and automation pipelines for various components of the stack.Implement observability and telemetry across our cloud applications and edge devices.Assist in maintaining compliance with various security standards (e.g., SOC2, HIPAA).Enhance developer productivity by optimizing development workflows.This is an onsite role located in Sunnyvale.Qualifications:Minimum of 3 years of experience in developing production infrastructure on AWS using infrastructure as code tools like Pulumi or Terraform.Proficient in Docker and Kubernetes, especially EKS.At least 3 years of experience with programming languages such as Python, Go, or similar.

Feb 18, 2026

Apply

Software Engineer - AI Tooling for Systems Engineering

Applied Intuition, Inc.

Full-time|$125K/yr - $185K/yr|On-site|Sunnyvale, California, United States

About Applied IntuitionApplied Intuition, Inc., established in 2017 and currently valued at $15 billion, is at the forefront of revolutionizing physical AI. Based in Silicon Valley, we are building the digital infrastructure necessary to infuse intelligence into every moving machine globally. We cater to industries such as automotive, defense, trucking, construction, mining, and agriculture through three core offerings: tools and infrastructure, operating systems, and autonomy. Trusted by 18 of the world's top 20 automakers and the U.S. military alongside its allies, our solutions are pivotal in delivering physical intelligence. Our headquarters are located in Sunnyvale, California, with additional offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.We are an in-office organization, with the expectation that employees primarily work from their Applied Intuition office five days a week. However, we value flexibility and trust our employees to manage their schedules responsibly, which may include occasional remote work, beginning the day with morning meetings from home, or leaving earlier to accommodate family obligations.About the RoleYour Responsibilities at Applied Intuition:Design, develop, and productionize scalable internal tools and AI workflows that facilitate system engineering and validation for autonomous vehicle initiatives.Integrate data across requirements management, modeling, and validation tools to ensure comprehensive traceability from system requirements to test outcomes.Build backend services and APIs to consolidate distributed engineering artifacts into a reliable, cohesive platform.Create dashboards and KPIs to evaluate requirement coverage, trace completeness, and validation progress.Take ownership of and enhance the core traceability data model, enabling bidirectional traceability, versioning, baselining, and change impact analysis.Refactor internal prototypes into production-grade, certifiable systems with robust reliability, access control, and auditability.

Mar 7, 2026

Apply

Senior Performance Analyst - Inference at Cerebras Systems | Sunnyvale, CA

Cerebras Systems

Full-time|On-site|Sunnyvale, CA

Cerebras Systems is at the forefront of AI innovation, creating the world's largest AI chip that is 56 times larger than traditional GPUs. Our unique wafer-scale architecture delivers the computational power of numerous GPUs on a single chip, simplifying programming while providing unparalleled training and inference speeds. This revolutionary approach enables users to run extensive machine learning applications effortlessly, eliminating the complexity of managing multiple GPUs or TPUs.Cerebras serves a diverse clientele, including leading model labs, major global enterprises, and pioneering AI-native startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of scale that will redefine key workloads with ultra-high-speed inference.Our groundbreaking wafer-scale architecture ensures that Cerebras Inference provides the fastest Generative AI inference solution globally, achieving speeds that are over ten times faster than GPU-based hyperscale cloud services. This significant enhancement in performance is transforming the user experience of AI applications, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.About The RoleWe are seeking a Senior Performance Analyst to join our dynamic Product team. As a specialist in state-of-the-art inference performance, you will be the go-to expert on how Cerebras measures up against alternative inference providers in terms of pricing and performance. This role combines performance benchmarking from foundational principles with competitive intelligence. The position revolves around two key pillars:Performance BenchmarkingYou will develop, execute, and sustain reproducible benchmarks that assess Cerebras inference performance for actual customer workloads. This includes metrics such as tokens per second, time to first token, latency under concurrency, and total cost of ownership (TCO).Competitive AnalysisYou will analyze market trends and competitor offerings to position Cerebras effectively within the inference landscape.

Apr 13, 2026

Apply

Staff ML Performance Engineer - Training Efficiency

Wayve Technologies

Full-time|On-site|Sunnyvale

Join Wayve Technologies as a Staff Machine Learning Performance Engineer, specializing in Training Efficiency. In this pivotal role, you will be responsible for enhancing the performance of our machine learning models and algorithms, ensuring they operate at peak efficiency. You will collaborate with cross-functional teams to develop innovative solutions that improve training processes, optimize model performance, and drive impactful results in autonomous vehicle technology.

Feb 27, 2026

Apply

Principal Software Engineer - AI and Simulation

Apptronik

Full-time|$280K/yr - $350K/yr|On-site|Sunnyvale, CA

Join Apptronik, a leading human-centered robotics company revolutionizing the world with AI-powered robots designed to enhance every aspect of life. Our flagship humanoid robot, Apollo, is engineered for seamless collaboration with people, initially focusing on critical sectors like manufacturing and logistics, and poised for future applications in healthcare, domestic environments, and more.We are at the forefront of embodied AI, utilizing our extensive expertise across the entire robotics stack to address some of society's most pressing challenges. As part of our team, you will play a pivotal role in scaling Apollo for market readiness, navigating complex issues around safety, commercialization, and mass production to make a positive impact on the world.JOB SUMMARYWe are in search of a Principal Engineer to spearhead the development of high-performance embedded AI systems and advanced simulation infrastructure for our humanoid robots. This position emphasizes GPU-centric workload orchestration, graphics-driven simulation performance, and robust on-device AI execution.The ideal candidate will possess extensive experience in graphics, display systems, Linux platforms, and low-level embedded software, enabling them to enhance simulation fidelity and optimize real-time AI workloads across robotic platforms.ESSENTIAL DUTIES AND RESPONSIBILITIESGPU Workload OrchestrationArchitect and implement a pipeline for the effective utilization of GPUs across various concurrent AI workloads.Design and develop schedulers and runtime systems to coordinate perception, planning, and control models on-device.Optimize latency, throughput, and power efficiency for real-time robotic operations.On-Device AI SystemsEnhance the robustness and reliability of deployed AI models in constrained embedded environments.Facilitate efficient execution of multi-model pipelines (vision, tracking, control).Collaborate with ML teams to co-design models and runtime systems effectively.Simulation PerformanceLead initiatives to significantly boost simulation throughput and realism.Optimize rendering, physics integration, and data pipelines through graphics expertise.Align simulation outputs with the requirements of real-world deployments.Embedded Systems & Platform IntegrationOversee low-level system integration across Linux-based platforms.Collaborate across kernel, drivers, HAL, and user-space layers to ensure seamless operation.

Apr 8, 2026

Apply

Engineering Manager, AI at Coram AI | Sunnyvale

Coram AI

Full-time|On-site|Sunnyvale

At Coram AI, we are transforming the landscape of video security in the digital age. Our innovative cloud-native platform leverages advanced computer vision and artificial intelligence to empower businesses with enhanced safety, smarter decision-making capabilities, and accelerated operational efficiency through features like real-time alerts, effortless clip sharing, and comprehensive multi-site visibility.Join our dynamic and agile team that prioritizes clarity, craftsmanship, and impactful contributions. Every team member plays a crucial role, delivering significant results and shaping the future of AI-driven security solutions.We are seeking an experienced Engineering Manager to lead our talented AI team at Coram. This team, although small, is exceptionally skilled and operates at the forefront of real-time systems, computer vision, and generative AI.In this hands-on leadership role, you will blend technical guidance, architectural oversight, recruitment, and team management. The ideal candidate will possess up-to-date knowledge of modern deep learning and generative AI, along with substantial experience in building and leading high-performance teams.

Mar 3, 2026

Apply

Staff Frontend Engineer - Inference

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Join Cerebras Systems as a Staff Frontend Engineer specializing in Inference. In this pivotal role, you will be instrumental in developing innovative solutions that push the boundaries of AI and machine learning. Your expertise will drive the design and implementation of user-friendly interfaces that enhance our cutting-edge technology.

Mar 30, 2026

Create account — see all 698 results

1 - 20 of 698 Jobs

Select all on this page (20)

Apply

AI Inference Deployment Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Feb 17, 2026

Apply

Principal Engineer, AI Inference Reliability

Cerebras Systems