Inference Frontend Engineer

Cerebras Systems, Sunnyvale, CA

On-site Full-time

Experience Level

Entry Level

Qualifications

Key Responsibilities

- Collaborate with a team of elite engineers to tackle real-world challenges across the software stack.
- Design, implement, and test software that significantly improves system performance and user experience.
- Learn and contribute across multiple layers of a fully integrated AI-accelerated system.
- Gain practical experience with advanced hardware, compilers, distributed systems, and machine learning frameworks.

Required Qualifications

- A recent graduate or current student pursuing a degree in Computer Science, Computer Engineering, or a related field, with graduation expected in 2026. This is a new-graduate position.
- Demonstrated strong problem-solving abilities and excellent communication skills.
- Proficiency in one or more programming languages; experience with C++ is a plus.

About the job

Cerebras serves a diverse clientele that includes leading model labs, global corporations, and pioneering AI-focused startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, accelerating key workloads with ultra-fast inference.

Built on our wafer-scale architecture, Cerebras Inference is the fastest generative AI inference solution available, more than ten times faster than GPU-based hyperscale cloud inference services. This speedup is reshaping the user experience of AI applications, enabling real-time iteration and more capable results through additional agentic computation.

About Cerebras Systems

Cerebras Systems is at the forefront of AI technology, recognized for producing the largest AI chip in the world, dramatically enhancing computational capabilities and simplifying the programming landscape for machine learning applications.

Similar jobs

AI Inference Deployment Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale, CA or Toronto, Canada

Cerebras Systems is at the forefront of AI technology, developing the world's largest AI chip, which is 56 times larger than conventional GPUs. Our wafer-scale architecture delivers the computational power of numerous GPUs on a single chip while keeping programming as simple as targeting a single device. This approach enables Cerebras to achieve unmatched training and inference speeds, allowing machine learning practitioners to run large-scale ML applications without the complexity of managing large GPU or TPU fleets. Our clientele includes leading model labs, global corporations, and pioneering AI-centric startups. OpenAI recently entered a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, accelerating key workloads with exceptionally fast inference. Thanks to our wafer-scale architecture, Cerebras Inference is the fastest generative AI inference solution available today, running more than ten times faster than GPU-based hyperscale cloud inference services. This speedup is reshaping the user experience of AI applications, enabling real-time iteration and more capable results through additional agentic computation.

About The Role

We are looking for an exceptionally talented Deployment Engineer to design and manage our state-of-the-art inference clusters. In this role, you will work with the Wafer-Scale Engine (WSE) and the systems built to exploit its capabilities.

Feb 17, 2026

Principal Engineer, AI Inference Reliability

Cerebras Systems

Full-time|Remote|Remote Office; Sunnyvale, CA or Toronto, Canada

Cerebras Systems is at the forefront of AI innovation, manufacturing the largest AI chip in the world, which is 56 times bigger than conventional GPUs. Our wafer-scale architecture provides the computational power of dozens of GPUs on a single chip while keeping programming as simple as targeting a single device. This approach enables unmatched training and inference speeds, allowing machine learning practitioners to run large-scale ML applications without the complexity of managing numerous GPUs or TPUs. Our clientele includes leading model labs, major global corporations, and innovative AI-native startups. OpenAI recently partnered with Cerebras to deploy 750 megawatts of capacity, accelerating critical workloads with ultra-fast inference. Our wafer-scale architecture makes Cerebras Inference the fastest generative AI inference solution available, more than ten times faster than GPU-based hyperscale cloud inference services. This speedup is reshaping the user experience of AI applications, enabling real-time iteration and more capable results through additional agentic computation.

In late 2024, we launched Cerebras Inference, setting a new standard for generative AI inference speed. Since then, we have rapidly scaled the service to meet rising demand from AI labs, enterprises, and a vibrant developer community.

In October 2025, we closed our Series G funding round, raising $1.1 billion USD to accelerate the growth of our products and services and meet global AI demand.

About the Team

The Cerebras Inference team is dedicated to delivering the most efficient, secure, and reliable enterprise-grade AI service. We design and operate large distributed systems that serve AI inference with exceptional speed and efficiency. Join us in scaling our inference capabilities to new heights!

Feb 17, 2026

Inference Frontend Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale, CA

Cerebras Systems is revolutionizing the AI landscape with the world's largest AI chip, which is 56 times larger than traditional GPUs. Our wafer-scale architecture delivers the computational power of dozens of GPUs on a single chip while offering the programming simplicity of a single device. This approach enables unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications without the complexity of managing numerous GPUs or TPUs.

Cerebras serves a diverse clientele that includes leading model labs, global corporations, and pioneering AI-focused startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, accelerating key workloads with ultra-fast inference.

Built on our wafer-scale architecture, Cerebras Inference is the fastest generative AI inference solution available, more than ten times faster than GPU-based hyperscale cloud inference services. This speedup is reshaping the user experience of AI applications, enabling real-time iteration and more capable results through additional agentic computation.

Feb 17, 2026

Engineering Manager, Inference Platform

Cerebras Systems

Full-time|On-site|Sunnyvale, CA or Toronto, Canada

At Cerebras Systems, we are revolutionizing AI computing by developing the world's largest AI chip, which is 56 times larger than traditional GPUs. Our wafer-scale architecture provides the computational power of dozens of GPUs on a single chip while keeping programming as simple as targeting a single device. This approach enables unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications without the complexity of managing multiple GPUs or TPUs.

Our clientele includes leading model labs, prominent global enterprises, and forward-thinking AI-native startups. OpenAI has entered a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, accelerating critical workloads with ultra-fast inference.

With our wafer-scale architecture, Cerebras Inference delivers the fastest generative AI inference solution available, more than ten times faster than GPU-based hyperscale cloud inference services. This speedup is transforming how users experience AI applications, enabling real-time iteration and more capable results through additional agentic computation.

Location: Toronto / Sunnyvale

We are seeking a highly technical, hands-on engineering leader for our Inference Service Platform. You will lead a high-performing team tackling a critical challenge: scaling large language model (LLM) inference on Cerebras compute clusters and delivering a world-class on-premises solution for enterprise customers. You will set the technical vision while staying close to the code, with a focus on architecting highly reliable, low-latency distributed systems. If you have proven expertise in distributed systems and in scaling modern model-serving frameworks, we encourage you to apply.

Feb 17, 2026

Senior Inference Machine Learning Runtime Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale, CA or Toronto, Canada

Cerebras Systems is at the forefront of AI innovation, creating the world's largest AI chip, which is 56 times larger than traditional GPUs. Our wafer-scale architecture delivers the computational power of dozens of GPUs on a single chip, combined with the programming simplicity of a single device. This approach enables unparalleled training and inference speeds, so machine learning practitioners can run large ML applications without the complexity of managing multiple GPUs or TPUs.

Cerebras serves premier model labs, global corporations, and pioneering AI startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, accelerating critical workloads with ultra-fast inference.

Our wafer-scale architecture makes Cerebras Inference the fastest generative AI inference solution available, more than ten times faster than GPU-based hyperscale cloud inference services. This speedup is reshaping the user experience of AI applications, enabling real-time iteration and more capable results through additional computation.

About The Role

The Inference ML Engineering team at Cerebras Systems powers our fast generative inference solution through intuitive APIs, backed by a distributed runtime that runs on large clusters of our proprietary hardware. Our goal is to let enterprises, developers, and researchers take full advantage of the platform's performance, scalability, and flexibility.

The team works closely with cross-functional groups, including compiler developers, cluster orchestrators, ML scientists, cloud architects, and product teams, to deliver solutions that push the limits of ML performance and usability.

As a Senior Software Engineer on the Inference ML Engineering team, you will design and implement APIs, ML features, and tools for running state-of-the-art generative AI models on our custom hardware. You will architect solutions for seamless model translation and execution, delivering high throughput and low latency while keeping the platform easy to use. You will also lead technical initiatives and collaborate with other engineering teams to improve our solutions.

Feb 17, 2026

Staff Software Engineer, Inference Cloud

Cerebras Systems

Full-time|On-site|Sunnyvale, CA

Role Overview

Cerebras Systems is looking for a Staff Software Engineer focused on Inference Cloud. This position is based in Sunnyvale, CA.

What You Will Do

- Design, develop, and optimize software for inference products.
- Work closely with team members to improve performance and reliability.
- Apply advanced AI and machine learning methods to real-world challenges.

Collaboration

Work alongside experienced engineers on projects that shape the future of inference technology at Cerebras Systems.

Apr 14, 2026

Field Deployment Engineer at Applied Intuition | Sunnyvale, CA

Applied Intuition, Inc.

Full-time|On-site|Sunnyvale, California, United States

About Applied Intuition

Applied Intuition, Inc. is shaping the future of physical AI. Founded in 2017 and currently valued at $15 billion, the Silicon Valley company is building the digital infrastructure needed to bring intelligence to every moving machine in the world. Serving the automotive, defense, trucking, construction, mining, and agriculture sectors, Applied Intuition focuses on three core areas: tools and infrastructure, operating systems, and autonomy. Trusted by 18 of the top 20 global automakers, as well as the United States military and its allies, our solutions bring physical intelligence to a wide range of industries. Our headquarters is in Sunnyvale, California, with additional offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Learn more at applied.co.

We are an in-office company: employees are expected to work primarily from their Applied Intuition office five days a week. We also value flexibility and trust employees to manage their schedules responsibly, which may include occasional remote work, starting the day with meetings from home, or leaving early for family commitments.

About the Role

We are seeking highly motivated engineers dedicated to driving exceptional customer success. This role goes beyond traditional engineering, working at the intersection of innovative technology and significant customer impact. You will lead our most strategic customer engagements, driving high-stakes initiatives that advance the industry's shift toward safe, software-driven, AI-powered machines worldwide.

Note: This position may require relocating multiple times per year to various customer sites, with a significant focus on Australia for most of 2026.

Your Responsibilities at Applied Intuition:

Lead the complete field deployment of Applied's product lines at customer locations, overseeing system integration from initial setup to steady-state operation.

Mar 10, 2026

Senior Software Engineer I, Inference

CoreWeave

On-site|Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Senior Software Engineer I specializing in inference, where you will spearhead architectural designs, elevate engineering standards, and significantly enhance latency, throughput, and reliability across various services. Collaborate closely with product, orchestration, and hardware teams to advance our Kubernetes-native inference platform, ensuring we achieve stringent P99 SLAs at scale.

Feb 10, 2026

Technical Deployment Lead

CoreWeave

Full-time|$90K/yr - $102K/yr|On-site|Livingston, NJ / New York, NY / Sunnyvale, CA / San Francisco, CA / Bellevue, WA / Richmond, VA

About CoreWeave:

CoreWeave is The Essential Cloud for AI™. Designed by pioneers for pioneers, we provide a robust platform that empowers innovators to confidently develop and scale artificial intelligence solutions. Trusted by top AI labs, startups, and global enterprises, our infrastructure performance and deep technical expertise accelerate breakthroughs and turn compute into capability. Founded in 2017, we have grown significantly and became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.

Role Overview:

As a Technical Deployment Lead on our Tiger Team, you will play a pivotal role in expanding our data centers across the U.S. and Canada. You will collaborate with a diverse team to ensure the reliability and availability of our hybrid travel data center operations. Your responsibilities include infrastructure delivery across all CoreWeave data centers, training on-site teams, and running hardware and network diagnostics. This position is fully on-site at one of our East Coast or Central region U.S. data centers, with up to 60% travel on a rotational basis.

Key Responsibilities:

- Travel nationwide to data centers to build and deploy new and ongoing sites.
- Implement and document a global data center standard.
- Troubleshoot hardware and network issues.
- Conduct root cause analysis for hardware and software failures.
- Train internal teams on best practices and procedures.
- Perform on-site audits following our established QA/QC processes.
- Provide technical support to global data center teams.
- Develop tools and scripts for updating server and networking hardware.
- Maintain testing and tooling equipment.
- Provide follow-up project support as needed.

Mar 17, 2026

Senior Performance Analyst - Inference at Cerebras Systems | Sunnyvale, CA

Cerebras Systems

Full-time|On-site|Sunnyvale, CA

Cerebras Systems is at the forefront of AI innovation, creating the world's largest AI chip, which is 56 times larger than traditional GPUs. Our wafer-scale architecture delivers the computational power of numerous GPUs on a single chip, simplifying programming while providing unparalleled training and inference speeds. This approach lets users run large machine learning applications without the complexity of managing multiple GPUs or TPUs.

Cerebras serves a diverse clientele, including leading model labs, major global enterprises, and pioneering AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, accelerating key workloads with ultra-fast inference.

Our wafer-scale architecture makes Cerebras Inference the fastest generative AI inference solution available, more than ten times faster than GPU-based hyperscale cloud services. This speedup is transforming the user experience of AI applications, enabling real-time iteration and more capable results through additional computation.

About The Role

We are seeking a Senior Performance Analyst to join our Product team. As a specialist in state-of-the-art inference performance, you will be the go-to expert on how Cerebras compares with alternative inference providers on price and performance. The role combines first-principles performance benchmarking with competitive intelligence and is organized around two pillars:

Performance Benchmarking

You will develop, run, and maintain reproducible benchmarks that measure Cerebras inference performance on real customer workloads, covering metrics such as tokens per second, time to first token, latency under concurrency, and total cost of ownership (TCO).

Competitive Analysis

You will analyze market trends and competitor offerings to position Cerebras effectively within the inference landscape.

Apr 13, 2026

Engineering Manager, AI at Coram AI | Sunnyvale

Coram AI

Full-time|On-site|Sunnyvale

At Coram AI, we are transforming video security for the modern era. Our cloud-native platform uses advanced computer vision and artificial intelligence to help businesses improve safety, make smarter decisions, and operate more efficiently, with features such as real-time alerts, effortless clip sharing, and multi-site visibility.

Join our agile team, which prioritizes clarity, craftsmanship, and impact. Every team member plays a crucial role, delivering significant results and shaping the future of AI-driven security.

We are seeking an experienced Engineering Manager to lead our AI team at Coram. The team is small but exceptionally skilled and works at the forefront of real-time systems, computer vision, and generative AI.

In this hands-on leadership role, you will combine technical guidance, architectural oversight, recruiting, and team management. The ideal candidate has up-to-date knowledge of modern deep learning and generative AI, along with substantial experience building and leading high-performance teams.

Mar 3, 2026

Staff Frontend Engineer - Inference

Cerebras Systems

Full-time|On-site|Sunnyvale, CA or Toronto, Canada

Join Cerebras Systems as a Staff Frontend Engineer specializing in Inference. In this pivotal role, you will be instrumental in developing innovative solutions that push the boundaries of AI and machine learning. Your expertise will drive the design and implementation of user-friendly interfaces that enhance our cutting-edge technology.

Mar 30, 2026

AI Research Engineer in Robotics

Coram AI

Full-time|On-site|Sunnyvale

At Coram AI, we are revolutionizing video security for the modern era. Our cloud-native platform uses advanced computer vision and artificial intelligence to help businesses improve safety, make informed decisions, and accelerate operations, with features such as real-time alerts, effortless clip sharing, and visibility across multiple locations.

Joining our agile team means being part of a collaborative environment that prioritizes clarity, excellence, and impact. Every team member has a voice, delivers significant work, and plays a crucial role in shaping how AI can foster a safer and more connected world.

We are seeking engineers who thrive at the intersection of robotics, real-time systems, and deep learning. This position focuses on implementing high-performance vision and multimodal models on robotic platforms, where latency, reliability, and hardware constraints are paramount.

Mar 11, 2026

Software Engineer - Robotics at Coram AI | Sunnyvale

Coram AI

Full-time|On-site|Sunnyvale

At Coram AI, we are revolutionizing video security for the modern era. Our cloud-native platform uses computer vision and artificial intelligence to help businesses improve safety, make informed decisions, and accelerate operations, with features such as real-time alerts, effortless clip sharing, and multi-site visibility.

Joining our agile team means becoming part of a culture that prioritizes clarity, quality, and impact. Every team member has a voice, delivers significant work, and plays a crucial role in shaping how AI can foster a safer and more connected world.

We are seeking an exceptionally skilled software engineer to build high-performance, real-time software that runs on edge devices under strict latency and memory constraints. This position emphasizes deterministic execution, distributed system architecture, and low-level performance optimization. You will build the infrastructure and runtime systems that power real-time robotics applications.

Mar 11, 2026

Engineering Manager - Inference ML Runtime

Cerebras Systems

Full-time|On-site|Sunnyvale, CA or Toronto, Canada

Join Cerebras Systems as an Engineering Manager for Inference ML Runtime, where you will lead a dedicated team building groundbreaking machine learning solutions. You will guide the design and implementation of our inference runtime, ensuring efficiency and performance at scale.

As a leader in our innovative environment, you will collaborate with cross-functional teams to develop state-of-the-art algorithms and systems that push the boundaries of artificial intelligence.

Mar 24, 2026

Software Engineer, Inference AI/ML

CoreWeave

On-site|Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Software Engineer on our Inference team, where you'll play a vital role in enhancing the performance of our AI model serving platform. As an entry-level engineer, you will implement impactful features that improve latency, reliability, and cost-efficiency on our cutting-edge GPU-based infrastructure. This role offers a unique opportunity for hands-on learning and professional growth through mentorship from seasoned engineers.

Feb 10, 2026

Senior Staff AI Engineer | Technical Lead in AI Modeling

LinkedIn Corporation

Full-time|On-site|Sunnyvale

Join our dynamic team as a Senior Staff AI Engineer, where you will lead cutting-edge AI modeling initiatives that drive innovation and excellence. In this pivotal role, you will collaborate with cross-functional teams to architect, design, and implement state-of-the-art AI solutions. Your expertise will guide the development of robust algorithms and models that enhance user experiences and optimize performance.

Mar 25, 2026

Software Engineer - AI Engineering

Applied Intuition

Full-time|On-site|Sunnyvale, California, United States

Join Applied Intuition as a Software Engineer specializing in AI Engineering, where you'll have the opportunity to work on cutting-edge technology and contribute to innovative projects that shape the future of artificial intelligence. As part of our dynamic team, you will collaborate with talented professionals to design, develop, and implement AI solutions that address real-world challenges.

Mar 25, 2026

Staff AI Engineer - AI Privacy Specialist

LinkedIn Corporation

Full-time|On-site|Sunnyvale

Join our team as a Staff AI Engineer - AI Privacy Specialist, where you will play a crucial role in advancing our commitment to user privacy and data protection. You will apply cutting-edge artificial intelligence technologies to build solutions that strengthen privacy measures across our platforms.

Drawing on your expertise in AI and privacy, you will collaborate with cross-functional teams to ensure that our products meet the highest standards of data integrity and user trust.

Mar 25, 2026

AI Infrastructure Engineer at Meshy | Sunnyvale

Meshy

Full-time|On-site|Sunnyvale

Join Meshy as an AI Infrastructure Engineer

Located in the heart of Silicon Valley, Meshy is a pioneer in 3D generative AI. Our mission is to Unleash 3D Creativity by transforming the content creation process. We empower professional artists and hobbyists alike to create high-quality 3D assets, converting text and images into 3D models in minutes. What used to take weeks of effort and thousands of dollars now takes just 2 minutes and costs only $1.

Our team includes leading experts in computer graphics, AI, and art, with alumni of MIT, Stanford, and Berkeley alongside experienced professionals from Nvidia and Microsoft. With a diverse workforce across North America, Asia, and Oceania, we cultivate a culture of innovation aimed at solving global 3D challenges. We are backed by top-tier venture capital firms including Sequoia and GGV and have raised $52 million in funding.

Meshy is the market leader: ranked No. 1 in popularity among 3D AI tools (according to 2024 A16Z Games) and first in web traffic (per SimilarWeb, with 3 million monthly visits). Our platform serves more than 5 million users and has generated 40 million models.

Our founder and CEO, Yuanming (Ethan) Hu, earned his Ph.D. in graphics and AI from MIT, where he created the Taichi GPU programming language (27K stars on GitHub, used by more than 300 institutes). His work earned an honorable mention for the SIGGRAPH 2022 Outstanding Doctoral Dissertation Award and has more than 2,700 research citations.

Your Role

This position combines platform engineering, site reliability, and applied ML systems. You will be responsible for the reliability, scalability, and operability of Meshy's AI model serving stack and core engineering infrastructure. The team runs conventional production infrastructure (CI/CD, build systems, deployment, runtime environments) while developing a model-serving platform that connects the models built by our Research Team to product-facing backend systems.

This role is systems-heavy and production-focused, dedicated to turning experimental model artifacts into robust, observable, and cost-efficient services.

Key Responsibilities

- Ensure production reliability: manage availability, latency, error budgets, incident response, postmortems, and follow-ups.
- Develop and maintain observability frameworks: metrics, logs, traces, and alerting systems.

Feb 11, 2026
