Engineering Manager Inference Platform jobs in Sunnyvale – Browse 699 openings on RoboApply Jobs

Open roles matching “Engineering Manager Inference Platform” with location signals for Sunnyvale. 699 active listings on RoboApply Jobs.

1 - 20 of 699 Jobs

Cerebras Systems
Full-time|On-site|Sunnyvale CA or Toronto Canada

At Cerebras Systems, we are revolutionizing AI computing by developing the world’s largest AI chip, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This unique approach enables us to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications without the complexity of managing multiple GPUs or TPUs. Our esteemed clientele includes leading model laboratories, prominent global enterprises, and forward-thinking AI-native startups. Notably, OpenAI has entered a multi-year partnership with Cerebras to leverage 750 megawatts of scale, enhancing critical workloads with ultra-high-speed inference. With our groundbreaking wafer-scale architecture, Cerebras Inference delivers the fastest Generative AI inference solution globally, outperforming GPU-based hyperscale cloud inference services by over tenfold. This dramatic increase in speed is transforming how users experience AI applications, facilitating real-time iterations and enhancing intelligence through additional agentic computation. Location: Toronto / Sunnyvale. We are seeking a highly technical, hands-on engineering leader for our Inference Service Platform. In this role, you will guide a high-performing team to address a critical challenge: scaling large language model (LLM) inference on Cerebras’ advanced compute clusters and delivering a world-class, on-premise solution for enterprise customers. You will establish the technical vision while maintaining close engagement with the code, focusing on architecting highly reliable and low-latency distributed systems. If you possess proven expertise in distributed systems and scaling modern model-serving frameworks, we encourage you to apply.

Feb 17, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI technology, developing the world's largest AI chip that is 56 times greater than conventional GPUs. Our innovative wafer-scale architecture delivers the computational capabilities of numerous GPUs on a single chip, simplifying programming to the level of a single device. This groundbreaking approach enables Cerebras to achieve unmatched training and inference speeds, allowing machine learning practitioners to seamlessly execute large-scale ML applications without the complexities of managing extensive GPU or TPU resources. Our clientele includes leading model laboratories, global corporations, and pioneering AI-centric startups. Notably, OpenAI has recently entered into a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of capacity, revolutionizing key workloads with exceptionally rapid inference speeds. Thanks to our extraordinary wafer-scale architecture, Cerebras Inference provides the swiftest Generative AI inference solution available today, operating over ten times faster than GPU-based hyperscale cloud inference services. This significant boost in speed is reshaping the user experience in AI applications, facilitating real-time iterations and enhancing intelligence through advanced agentic computation. About The Role We are looking for an exceptionally talented Deployment Engineer to design and manage our state-of-the-art inference clusters. In this role, you will have the opportunity to work with the unparalleled Wafer-Scale Engine (WSE) and the systems that exploit its extraordinary capabilities.

Feb 17, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale, CA

Cerebras Systems is revolutionizing the AI landscape with the world's largest AI chip, which is 56 times more extensive than traditional GPUs. Our innovative wafer-scale architecture enables us to deliver the computational power of dozens of GPUs on a single chip, while offering the ease of programming like a single device. This groundbreaking approach empowers Cerebras to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications effortlessly without the complexities of managing numerous GPUs or TPUs. Cerebras serves a diverse clientele that includes leading model laboratories, global corporations, and pioneering AI-focused startups. Recently, OpenAI announced a multi-year collaboration with Cerebras to harness 750 megawatts of scale, significantly enhancing key workloads through ultra-fast inference capabilities. With our cutting-edge wafer-scale architecture, Cerebras Inference provides the fastest Generative AI inference solution globally, exceeding the speed of GPU-based hyperscale cloud inference services by over ten times. This extraordinary speed transformation is reshaping the user experience of AI applications, facilitating real-time iterations and boosting intelligence through enhanced agentic computation.

Feb 17, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI innovation, creating the world’s largest AI chip, which is 56 times larger than traditional GPUs. Our groundbreaking wafer-scale architecture delivers the computational power equivalent to dozens of GPUs on a single chip, combined with the programming simplicity of a unified device. This innovative approach allows us to offer unparalleled training and inference speeds, enabling machine learning practitioners to execute extensive ML applications seamlessly, without the complexities of managing multiple GPUs or TPUs. Cerebras boasts an impressive clientele, including premier model labs, global corporations, and pioneering AI startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aimed at deploying 750 megawatts of scale, revolutionizing critical workloads with ultra-fast inference capabilities. Our unique wafer-scale architecture enables Cerebras Inference to provide the fastest Generative AI inference solution globally, surpassing GPU-based hyperscale cloud inference services by more than tenfold. This remarkable enhancement in speed is reshaping the AI application user experience, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities. About The Role: The Inference ML Engineering team at Cerebras Systems is committed to empowering our rapid generative inference solution through intuitive APIs, supported by a distributed runtime that operates on extensive clusters of our proprietary hardware. Our goal is to enable enterprises, developers, and researchers to fully harness the capabilities of our platform, leveraging its exceptional performance, scalability, and flexibility. The team collaborates closely with cross-functional groups, including compiler developers, cluster orchestrators, ML scientists, cloud architects, and product teams, to deliver impactful solutions that redefine the limits of ML performance and usability. As a Senior Software Engineer on the Inference ML Engineering team, you will be instrumental in designing and implementing APIs, ML features, and tools that facilitate the execution of state-of-the-art generative AI models on our custom hardware. Your role will involve architecting solutions that allow for seamless model translation and execution, ensuring high throughput and minimal latency while maintaining user-friendliness. You will lead technical initiatives and collaborate with other engineering teams to enhance our solutions.

Feb 17, 2026

Cerebras Systems
Full-time|Remote|Remote Office; Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI innovation, manufacturing the largest AI chip in the world, which is 56 times bigger than conventional GPUs. Our cutting-edge wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This pioneering approach enables us to offer unmatched training and inference speeds, allowing machine learning practitioners to smoothly execute large-scale ML applications without the complexity of managing numerous GPUs or TPUs. Our clientele includes leading model laboratories, major global corporations, and innovative AI-native startups. Notably, OpenAI has recently partnered with Cerebras to leverage 750 megawatts of scale, revolutionizing critical workloads with ultra-high-speed inference. Our advanced wafer-scale architecture makes Cerebras Inference the fastest Generative AI inference solution available, outperforming GPU-based hyperscale cloud inference services by over tenfold. This remarkable speed enhancement is reshaping the user experience of AI applications, enabling real-time iterations and enhanced intelligence through additional agentic computation. In late 2024, we launched Cerebras Inference, setting a new standard for Generative AI inference speed. Since its launch, we have rapidly scaled our services to meet the rising demand from AI labs, enterprises, and a vibrant developer community. In October 2025, we celebrated our Series G funding round, successfully raising $1.1 billion USD to accelerate the growth of our product offerings and services to satisfy global AI demand. About the Team: The Cerebras Inference team is dedicated to delivering the most efficient, secure, and reliable enterprise-grade AI service. We design and manage expansive distributed systems that facilitate AI inference with unparalleled speed and efficiency. Join us in scaling our inference capabilities to new heights!

Feb 17, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale, CA

Role Overview: Cerebras Systems is looking for a Staff Software Engineer focused on Inference Cloud. This position is based in Sunnyvale, CA. What You Will Do: Design, develop, and optimize software for inference products; work closely with team members to improve performance and reliability; apply advanced AI and machine learning methods to real-world challenges. Collaboration: Work alongside experienced engineers on projects that shape the future of inference technology at Cerebras Systems.

Apr 14, 2026

Coram AI
Full-time|On-site|Sunnyvale

At Coram AI, we are revolutionizing video security for today's world. Our innovative cloud-native platform leverages advanced computer vision and artificial intelligence to empower businesses to enhance safety, make informed decisions, and improve operational efficiency. From real-time alerts to effortless clip sharing and comprehensive multi-site visibility, we are at the forefront of security technology. As part of our dynamic and agile team, you will find a culture that prioritizes clarity, craftsmanship, and meaningful impact. Every team member contributes their unique voice, delivers significant work, and plays a crucial role in shaping how AI can foster a safer, more connected world. We are seeking a technically adept Engineering Manager to oversee our Platform team. This team is responsible for developing the essential real-time infrastructure that supports distributed edge systems under stringent latency, memory, and reliability constraints. In this hands-on leadership position, you will manage a team of highly skilled engineers focused on low-level systems, high-performance networking, concurrency, and edge computing. While expertise in robotics is not a prerequisite, extensive experience in real-time or low-latency systems, along with a capability to dive deep into technical challenges when necessary, is essential.
Your Responsibilities
• Collaborate with product and leadership teams to outline the technical roadmap for the Platform team.
• Spearhead architectural design for distributed edge systems that operate under strict latency and memory constraints.
• Manage and prioritize multiple concurrent projects while addressing production challenges and adapting to evolving customer requirements.
• Perform thorough design and code reviews to guarantee high reliability and performance of the systems.
• Delve into C++ or low-level system code as needed to resolve critical issues.
• Supervise the development of IPC, messaging, and real-time data pipelines.
• Establish robust engineering processes focusing on observability, testing, and production stability.
• Recruit, mentor, and cultivate a high-performing team of systems engineers.
• Promote a culture of ownership, technical excellence, and disciplined execution.
Qualifications We Seek
• A minimum of several years managing engineers working on complex systems.
• Strong foundation in computer science principles, including algorithms, data structures, operating systems, and networking.
• A history of being a strong individual contributor earlier in your career.
• Extensive experience in building or leading real-time or low-latency distributed systems.

Mar 3, 2026

CoreWeave
On-site|On-site|Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Senior Software Engineer I specializing in inference, where you will spearhead architectural designs, elevate engineering standards, and significantly enhance latency, throughput, and reliability across various services. Collaborate closely with product, orchestration, and hardware teams to advance our Kubernetes-native inference platform, ensuring we achieve stringent P99 SLAs at scale.

Feb 10, 2026

Applied Intuition, Inc.
Full-time|$65K/yr - $400K/yr|On-site|Sunnyvale, California, United States

About Applied Intuition: Applied Intuition, Inc. is at the forefront of advancing physical AI. Established in 2017 and now boasting a valuation of $15 billion, this Silicon Valley powerhouse is constructing the digital infrastructure necessary to infuse intelligence into every machine in motion worldwide. Our company serves a diverse array of sectors including automotive, defense, trucking, construction, mining, and agriculture, focusing on three primary areas: tools and infrastructure, operating systems, and autonomy. Eighteen of the top twenty global automakers, along with the United States military and its allies, trust our solutions to deliver physical intelligence. Headquartered in Sunnyvale, California, we have a presence in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co. While we expect our employees to primarily work from their Applied Intuition office five days a week, we appreciate the value of flexibility and trust our team to manage their schedules responsibly. This can include occasional remote work, attending morning meetings from home, or adjusting hours to accommodate family commitments. About the Role: We are seeking an Engineering Manager to lead our Data Platform team within the Data Engineering group. In this pivotal role, you will oversee a top-tier data engineering team responsible for managing the complete data lifecycle (from collection and ingestion to storage, querying, and retrieval) at a scale of hundreds of petabytes. Collaboration with various business units across Applied Intuition is key to defining the technical roadmap for our data platform, ensuring robust support for both external products and internal tools across cloud, hybrid, and on-premises deployments. This is a highly collaborative leadership position that demands deep technical expertise, exceptional people management abilities, and a strong sense of ownership. At Applied Intuition, engineering managers are expected to be engaged leaders who contribute actively to technical direction alongside people management.
In this Role, You Will:
• Lead, develop, and mentor a team of data infrastructure engineers, nurturing a culture of technical excellence, ownership, and collaboration.
• Define and drive the technical strategy...

Mar 31, 2026

CoreWeave
Full-time|$165K/yr - $242K/yr|On-site|Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA / San Francisco, CA

CoreWeave is seeking a Security Engineering Manager to lead the Platform Security team. This position is based in Livingston, NJ, New York, NY, Sunnyvale, CA, Bellevue, WA, or San Francisco, CA. The team’s mission is to embed security into CoreWeave’s Kubernetes-based platform and public cloud environments, supporting high-performance infrastructure for AI and machine learning workloads. Role overview: This manager will oversee and expand the Platform Security engineering team, reporting to the Senior Director of Security Foundations. The focus is on hands-on leadership and technical execution, with an emphasis on building and implementing security controls rather than policy development. The role requires close collaboration with Infrastructure, Platform Engineering, Site Reliability Engineering, and other security teams to ensure security measures keep pace with business growth and evolving needs. What you will do: Lead and grow the Platform Security engineering team. Integrate security into Kubernetes infrastructure and public cloud platforms such as AWS, GCP, and Azure. Define and execute strategies for cloud security posture, workload isolation, platform guardrails, image integrity, and multi-cloud security. Develop and implement security controls across CoreWeave’s infrastructure. Work closely with other technical teams to align platform security with business needs. The Platform Security team: The Platform Security team at CoreWeave engineers systems that enforce security at the infrastructure layer. Their work spans both CoreWeave’s own Kubernetes-based platform and third-party public cloud environments. The team supports GPU-accelerated infrastructure for demanding AI and machine learning workloads, ensuring that both customer and internal services remain secure as CoreWeave’s global presence expands.

Apr 24, 2026

CoreWeave
Full-time|$109K/yr - $160K/yr|On-site|Livingston, NJ / New York, NY / Sunnyvale, CA / San Francisco, CA / Bellevue, WA

CoreWeave is The Essential Cloud for AI™, designed and built by pioneers for pioneers. We empower innovators to confidently build and scale AI through our advanced technology, tools, and expert teams. Trusted by top AI labs, startups, and global enterprises, CoreWeave combines exceptional infrastructure performance with profound technical expertise to drive innovation. Founded in 2017, we became a publicly traded company (Nasdaq: CRWV) in March 2025. Discover more at www.coreweave.com. What You’ll Do. About the Team: The Enterprise Systems team at CoreWeave is tasked with constructing, maintaining, and scaling the internal platforms that facilitate collaboration and productivity across the organization. This encompasses tools such as Atlassian (Jira, Confluence) and Asana, supporting our engineering, product, and business teams, along with external partners. Our focus is on ensuring reliability, scalability, and the ongoing enhancement of internal tools to empower teams to operate efficiently and effectively. About the Role: In the role of a Productivity Platforms Engineer, you will be instrumental in the daily administration and enhancement of CoreWeave’s collaboration and work management tools. Collaborating closely with seasoned engineers, you will maintain system reliability, troubleshoot issues, and implement improvements to optimize team workflows. This position involves hands-on configuration, user support, and gaining exposure to automation and integrations. Over time, you will assume responsibility for specific tools and workflows as you develop your technical expertise.

Apr 6, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale CA or Toronto Canada

Join Cerebras Systems as an Engineering Manager specializing in Inference ML Runtime, where you will lead a dedicated team in developing groundbreaking machine learning solutions. Your expertise will guide the design and implementation of our inference runtime, ensuring efficiency and performance at scale. As a pivotal leader in our innovative environment, you will collaborate with cross-functional teams, driving the development of state-of-the-art algorithms and systems that push the boundaries of artificial intelligence.

Mar 24, 2026

Intuitive Surgical, Inc.
Full-time|On-site|Sunnyvale

Intuitive Surgical, Inc. seeks a Senior Software Engineer to join the Platform Engineering team in Sunnyvale. This role centers on developing and maintaining the foundational software that powers advanced surgical technologies. Key responsibilities: Design and build core platform software for surgical systems; collaborate with other engineering teams to create reliable and scalable solutions; drive ongoing enhancements that support improvements in surgical procedures and patient care. Role focus: This position emphasizes both architecture and hands-on development for the software platform. Work will directly impact the reliability and capabilities of surgical technologies used in healthcare settings.

Apr 24, 2026

Applied Intuition, Inc.
Full-time|$222K/yr|On-site|Sunnyvale, California, United States

Discover Applied Intuition: Applied Intuition, Inc. is leading the charge in revolutionizing the future of physical AI. Established in 2017 and currently valued at $15 billion, our Silicon Valley headquarters is at the forefront of developing the digital infrastructure necessary to integrate intelligence into every moving machine worldwide. We serve diverse sectors including automotive, defense, trucking, construction, mining, and agriculture, focusing on three main areas: tools and infrastructure, operating systems, and autonomy. Our solutions are trusted by 18 of the top 20 global automakers, as well as the United States military and its allies, setting new standards for physical intelligence. Find out more at applied.co. As an in-office organization, we expect our employees to work from the Applied Intuition office five days a week. However, we understand the significance of flexibility and trust our team to manage their schedules effectively. This may involve occasional remote work, starting the day with morning meetings from home, or leaving early to accommodate family commitments. About the Insights Team: The Insights platform serves as the data and analytics backbone of Applied Intuition's physical AI development tooling suite. We oversee the entire stack, from data ingestion and querying infrastructure to dataset management and advanced analytics. Our platform empowers autonomy engineers with complete visibility and control over the data generated by their workflows, including sensor logs, simulation runs, model evaluation results, KPIs, and more. This gives users the agility to analyze and operationalize data swiftly, eliminating technical hurdles. Join us in building the next-generation Insights platform for physical AI, designed to scale with the demands of fleets, simulation environments, and end-to-end workflows.

Mar 28, 2026

dstaff
Full-time|On-site|Sunnyvale

We are seeking a dynamic and experienced Manager of Operating Systems and Platforms to lead our technical team at dstaff. In this pivotal role, you will oversee the management and implementation of high-performance operating systems and platform solutions. Your leadership will drive innovation and operational excellence, ensuring that our systems are reliable, scalable, and secure.

May 3, 2015

Cerebras Systems
Full-time|On-site|Sunnyvale, CA

Cerebras Systems is at the forefront of AI innovation, creating the world's largest AI chip that is 56 times larger than traditional GPUs. Our unique wafer-scale architecture delivers the computational power of numerous GPUs on a single chip, simplifying programming while providing unparalleled training and inference speeds. This revolutionary approach enables users to run extensive machine learning applications effortlessly, eliminating the complexity of managing multiple GPUs or TPUs. Cerebras serves a diverse clientele, including leading model labs, major global enterprises, and pioneering AI-native startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of scale that will redefine key workloads with ultra-high-speed inference. Our groundbreaking wafer-scale architecture ensures that Cerebras Inference provides the fastest Generative AI inference solution globally, achieving speeds that are over ten times faster than GPU-based hyperscale cloud services. This significant enhancement in performance is transforming the user experience of AI applications, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities. About The Role: We are seeking a Senior Performance Analyst to join our dynamic Product team. As a specialist in state-of-the-art inference performance, you will be the go-to expert on how Cerebras measures up against alternative inference providers in terms of pricing and performance. This role combines performance benchmarking from foundational principles with competitive intelligence. The position revolves around two key pillars. Performance Benchmarking: You will develop, execute, and sustain reproducible benchmarks that assess Cerebras inference performance for actual customer workloads. This includes metrics such as tokens per second, time to first token, latency under concurrency, and total cost of ownership (TCO). Competitive Analysis: You will analyze market trends and competitor offerings to position Cerebras effectively within the inference landscape.

Apr 13, 2026

DoorDash
Full-time|$130.6K/yr - $235K/yr|On-site|Sunnyvale, CA; San Francisco, CA; Seattle, WA

Join DoorDash's mission to revolutionize the way consumers connect with millions of merchants through our cutting-edge search platform. Our team is dedicated to building a robust, scalable, and high-performance system that enables seamless search indexing, retrieval, and ranking of billions of items. As part of the Core Consumer organization, you will play a pivotal role in enhancing the search experience across iOS, Android, and Web platforms. Collaborate with a team of talented engineers to explore innovative solutions in machine learning and improve search relevance, helping customers easily find what they want as we expand into new verticals like Grocery.

Feb 5, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale CA or Toronto Canada

Join Cerebras Systems as a Staff Frontend Engineer specializing in Inference. In this pivotal role, you will be instrumental in developing innovative solutions that push the boundaries of AI and machine learning. Your expertise will drive the design and implementation of user-friendly interfaces that enhance our cutting-edge technology.

Mar 30, 2026

intuitive
Full-time|On-site|Sunnyvale

Role overview: intuitive seeks a Program Manager to guide Growth Platforms initiatives in Sunnyvale. This role shapes strategy for growth and refines platform offerings to build stronger user engagement and satisfaction. What you will do: Lead cross-functional teams through all stages of growth platform projects, from initial concept to final delivery; manage project schedules and keep milestones and deliverables on track; analyze market trends and user feedback to recommend platform improvements; collaborate with stakeholders to ensure growth plans support company goals. Requirements: Experience managing programs or projects, preferably focused on platform or product growth; strong skills in collaboration and communication; ability to interpret market data and user insights; comfort working with multiple teams to achieve shared objectives. This position is located in Sunnyvale.

Apr 22, 2026
Director of New Platforms

Intuitive Surgical, Inc.

Full-time|On-site|Sunnyvale

Join Intuitive Surgical, a leader in minimally invasive robotic-assisted surgery, as the Director of New Platforms. In this pivotal role, you will spearhead the development and implementation of innovative platforms that enhance our surgical offerings and improve patient outcomes. We are looking for a visionary leader with a passion for technology and a commitment to excellence. You will collaborate with cross-functional teams to drive strategic initiatives, ensuring alignment with our mission to expand access to transformative surgical solutions.

Dec 1, 2025
