Staff Frontend Engineer - Inference

Cerebras SystemsSunnyvale CA or Toronto Canada

On-site Full-time

Clicking Apply Now takes you to AutoApply where you can tailor your resume and apply.

Unlock Your Potential

Generate Job-Optimized Resume

One Click And Our AI Optimizes Your Resume to Match The Job Description.

Is Your Resume Optimized For This Role?

Find Out If You're Highlighting The Right Skills And Fix What's Missing

Experience Level

Entry Level

Qualifications

Proven experience in frontend development, particularly with frameworks such as React or Angular. Strong knowledge of web technologies including HTML, CSS, and JavaScript. Experience with API integration and working with backend services. Ability to collaborate effectively with cross-functional teams and communicate technical concepts clearly. A passion for creating seamless user experiences and optimizing performance.

About the job

Join Cerebras Systems as a Staff Frontend Engineer specializing in Inference. In this pivotal role, you will be instrumental in developing innovative solutions that push the boundaries of AI and machine learning. Your expertise will drive the design and implementation of user-friendly interfaces that enhance our cutting-edge technology.

About Cerebras Systems

Cerebras Systems is at the forefront of AI innovation, dedicated to delivering unparalleled computational power to tackle the most complex challenges in machine learning. Our team is composed of experts who are passionate about pushing the limits of technology, fostering an environment where creativity and collaboration thrive.

Similar jobs

1 - 20 of 525 Jobs

Search for Inference Frontend Engineer

525 results

Select all on this page (20)

Apply

Inference Frontend Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale, CA

Cerebras Systems is revolutionizing the AI landscape with the world's largest AI chip, which is 56 times more extensive than traditional GPUs. Our innovative wafer-scale architecture enables us to deliver the computational power of dozens of GPUs on a single chip, while offering the ease of programming like a single device. This groundbreaking approach empowers Cerebras to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications effortlessly without the complexities of managing numerous GPUs or TPUs.Cerebras serves a diverse clientele that includes leading model laboratories, global corporations, and pioneering AI-focused startups. Recently, OpenAI announced a multi-year collaboration with Cerebras to harness 750 megawatts of scale, significantly enhancing key workloads through ultra-fast inference capabilities.With our cutting-edge wafer-scale architecture, Cerebras Inference provides the fastest Generative AI inference solution globally, exceeding the speed of GPU-based hyperscale cloud inference services by over ten times. This extraordinary speed transformation is reshaping the user experience of AI applications, facilitating real-time iterations and boosting intelligence through enhanced agentic computation.

Feb 17, 2026

Apply

Staff Frontend Engineer - Inference

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Mar 30, 2026

Apply

Engineering Manager, Inference Platform

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

At Cerebras Systems, we are revolutionizing AI computing by developing the world’s largest AI chip, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This unique approach enables us to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications without the complexity of managing multiple GPUs or TPUs.Our esteemed clientele includes leading model laboratories, prominent global enterprises, and forward-thinking AI-native startups. Notably, OpenAI has entered a multi-year partnership with Cerebras to leverage 750 megawatts of scale, enhancing critical workloads with ultra-high-speed inference.With our groundbreaking wafer-scale architecture, Cerebras Inference delivers the fastest Generative AI inference solution globally, outperforming GPU-based hyperscale cloud inference services by over tenfold. This dramatic increase in speed is transforming how users experience AI applications, facilitating real-time iterations and enhancing intelligence through additional agentic computation.Location: Toronto / SunnyvaleWe are seeking a highly technical, hands-on engineering leader for our Inference Service Platform. In this role, you will guide a high-performing team to address a critical challenge: scaling large language model (LLM) inference on Cerebras’ advanced compute clusters and delivering a world-class, on-premise solution for enterprise customers. You will establish the technical vision while maintaining close engagement with the code, focusing on architecting highly reliable and low-latency distributed systems. If you possess proven expertise in distributed systems and scaling modern model-serving frameworks, we encourage you to apply.

Feb 17, 2026

Apply

AI Inference Deployment Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI technology, developing the world's largest AI chip that is 56 times greater than conventional GPUs. Our innovative wafer-scale architecture delivers the computational capabilities of numerous GPUs on a single chip, simplifying programming to the level of a single device. This groundbreaking approach enables Cerebras to achieve unmatched training and inference speeds, allowing machine learning practitioners to seamlessly execute large-scale ML applications without the complexities of managing extensive GPU or TPU resources. Our clientele includes leading model laboratories, global corporations, and pioneering AI-centric startups. Notably, OpenAI has recently entered into a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of capacity, revolutionizing key workloads with exceptionally rapid inference speeds. Thanks to our extraordinary wafer-scale architecture, Cerebras Inference provides the swiftest Generative AI inference solution available today, operating over ten times faster than GPU-based hyperscale cloud inference services. This significant boost in speed is reshaping the user experience in AI applications, facilitating real-time iterations and enhancing intelligence through advanced agentic computation. About The Role We are looking for an exceptionally talented Deployment Engineer to design and manage our state-of-the-art inference clusters. In this role, you will have the opportunity to work with the unparalleled Wafer-Scale Engine (WSE) and the systems that exploit its extraordinary capabilities.

Feb 17, 2026

Apply

Senior Inference Machine Learning Runtime Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI innovation, creating the world’s largest AI chip, which is 56 times larger than traditional GPUs. Our groundbreaking wafer-scale architecture delivers the computational power equivalent to dozens of GPUs on a single chip, combined with the programming simplicity of a unified device. This innovative approach allows us to offer unparalleled training and inference speeds, enabling machine learning practitioners to execute extensive ML applications seamlessly, without the complexities of managing multiple GPUs or TPUs.Cerebras boasts an impressive clientele, including premier model labs, global corporations, and pioneering AI startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aimed at deploying 750 megawatts of scale, revolutionizing critical workloads with ultra-fast inference capabilities.Our unique wafer-scale architecture enables Cerebras Inference to provide the fastest Generative AI inference solution globally, surpassing GPU-based hyperscale cloud inference services by more than tenfold. This remarkable enhancement in speed is reshaping the AI application user experience, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.About The RoleThe Inference ML Engineering team at Cerebras Systems is committed to empowering our rapid generative inference solution through intuitive APIs, supported by a distributed runtime that operates on extensive clusters of our proprietary hardware. Our goal is to enable enterprises, developers, and researchers to fully harness the capabilities of our platform, leveraging its exceptional performance, scalability, and flexibility. The team collaborates closely with cross-functional groups, including compiler developers, cluster orchestrators, ML scientists, cloud architects, and product teams, to deliver impactful solutions that redefine the limits of ML performance and usability.As a Senior Software Engineer on the Inference ML Engineering team, you will be instrumental in designing and implementing APIs, ML features, and tools that facilitate the execution of state-of-the-art generative AI models on our custom hardware. Your role will involve architecting solutions that allow for seamless model translation and execution, ensuring high throughput and minimal latency while maintaining user-friendliness. You will lead technical initiatives and collaborate with other engineering teams to enhance our solutions.

Feb 17, 2026

Apply

Principal Engineer, AI Inference Reliability

Cerebras Systems

Full-time|Remote|Remote Office; Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI innovation, manufacturing the largest AI chip in the world, which is 56 times bigger than conventional GPUs. Our cutting-edge wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This pioneering approach enables us to offer unmatched training and inference speeds, allowing machine learning practitioners to smoothly execute large-scale ML applications without the complexity of managing numerous GPUs or TPUs. Our clientele includes leading model laboratories, major global corporations, and innovative AI-native startups. Notably, OpenAI has recently partnered with Cerebras to leverage 750 megawatts of scale, revolutionizing critical workloads with ultra-high-speed inference. Our advanced wafer-scale architecture makes Cerebras Inference the fastest Generative AI inference solution available, outperforming GPU-based hyperscale cloud inference services by over tenfold. This remarkable speed enhancement is reshaping the user experience of AI applications, enabling real-time iterations and enhanced intelligence through additional agentic computation.In late 2024, we launched Cerebras Inference, setting a new standard for Generative AI inference speed. Since its launch, we have rapidly scaled our services to meet the rising demand from AI labs, enterprises, and a vibrant developer community.In October 2025, we celebrated our Series G funding round, successfully raising $1.1 billion USD to accelerate the growth of our product offerings and services to satisfy global AI demand.About the TeamThe Cerebras Inference team is dedicated to delivering the most efficient, secure, and reliable enterprise-grade AI service. We design and manage expansive distributed systems that facilitate AI inference with unparalleled speed and efficiency. Join us in scaling our inference capabilities to new heights!

Feb 17, 2026

Apply

Staff Software Engineer, Inference Cloud

Cerebras Systems

Full-time|On-site|Sunnyvale, CA

Role Overview Cerebras Systems is looking for a Staff Software Engineer focused on Inference Cloud. This position is based in Sunnyvale, CA. What You Will Do Design, develop, and optimize software for inference products Work closely with team members to improve performance and reliability Apply advanced AI and machine learning methods to real-world challenges Collaboration Work alongside experienced engineers on projects that shape the future of inference technology at Cerebras Systems.

Apr 14, 2026

Apply

Senior Software Engineer I, Inference

CoreWeave

On-site|On-site|Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Senior Software Engineer I specializing in inference, where you will spearhead architectural designs, elevate engineering standards, and significantly enhance latency, throughput, and reliability across various services. Collaborate closely with product, orchestration, and hardware teams to advance our Kubernetes-native inference platform, ensuring we achieve stringent P99 SLAs at scale.

Feb 10, 2026

Apply

Staff Software Engineer - Frontend

alten2

Contract|On-site|Sunnyvale

Join our dynamic team as a Staff Software Engineer specializing in Frontend development. We are seeking a talented individual with a robust background in building scalable e-commerce applications or mobile software. Your expertise in modern JavaScript frameworks and attention to detail will be instrumental in delivering high-quality web applications that enhance user experience.

Aug 17, 2018

Apply

Frontend UI/UX Engineer at cylake-inc | Sunnyvale

Cylake Inc.

Full-time|$150K/yr - $250K/yr|On-site|Sunnyvale

Your ImpactBecome an integral part of a dynamic team dedicated to developing cutting-edge cybersecurity solutions from inception to launch. Under the guidance of industry leaders with a history of success, you will have the chance to design, construct, and roll out innovative products that make a significant difference. This is a perfect opportunity for you to advance your career and enhance your skills alongside a world-class team from the very beginning.Role OverviewIn this pivotal role, you will design user experiences and implement user interfaces for a next-generation security product. This unique position merges both design and implementation of user experience, giving you the chance to utilize modern frontend technologies, explore the integration of AI for optimal user outcomes, and directly influence the success of an exciting new product line.

Mar 5, 2026

Apply

Senior Performance Analyst - Inference at Cerebras Systems | Sunnyvale, CA

Cerebras Systems

Full-time|On-site|Sunnyvale, CA

Cerebras Systems is at the forefront of AI innovation, creating the world's largest AI chip that is 56 times larger than traditional GPUs. Our unique wafer-scale architecture delivers the computational power of numerous GPUs on a single chip, simplifying programming while providing unparalleled training and inference speeds. This revolutionary approach enables users to run extensive machine learning applications effortlessly, eliminating the complexity of managing multiple GPUs or TPUs.Cerebras serves a diverse clientele, including leading model labs, major global enterprises, and pioneering AI-native startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of scale that will redefine key workloads with ultra-high-speed inference.Our groundbreaking wafer-scale architecture ensures that Cerebras Inference provides the fastest Generative AI inference solution globally, achieving speeds that are over ten times faster than GPU-based hyperscale cloud services. This significant enhancement in performance is transforming the user experience of AI applications, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.About The RoleWe are seeking a Senior Performance Analyst to join our dynamic Product team. As a specialist in state-of-the-art inference performance, you will be the go-to expert on how Cerebras measures up against alternative inference providers in terms of pricing and performance. This role combines performance benchmarking from foundational principles with competitive intelligence. The position revolves around two key pillars:Performance BenchmarkingYou will develop, execute, and sustain reproducible benchmarks that assess Cerebras inference performance for actual customer workloads. This includes metrics such as tokens per second, time to first token, latency under concurrency, and total cost of ownership (TCO).Competitive AnalysisYou will analyze market trends and competitor offerings to position Cerebras effectively within the inference landscape.

Apr 13, 2026

Apply

PHP Frontend Developer

Collabera

Full-time|On-site|Sunnyvale

Join our dynamic team at Collabera as a PHP Frontend Developer. We are seeking a skilled professional with a passion for crafting amazing user experiences. You will be responsible for designing and implementing user interfaces that are not only functional but also visually appealing. Your expertise in PHP and frontend technologies will be critical in driving our projects to success.

Nov 3, 2016

Apply

Engineering Manager - Inference ML Runtime

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Join Cerebras Systems as an Engineering Manager specializing in Inference ML Runtime, where you will lead a dedicated team in developing groundbreaking machine learning solutions. Your expertise will guide the design and implementation of our inference runtime, ensuring efficiency and performance at scale.As a pivotal leader in our innovative environment, you will collaborate with cross-functional teams, driving the development of state-of-the-art algorithms and systems that push the boundaries of artificial intelligence.

Mar 24, 2026

Apply

Frontend or Web UI Developer at sonsoftinc | Sunnyvale

Sonsoft Inc.

Full-time|On-site|Sunnyvale

Join Sonsoft Inc. as a talented Frontend or Web UI Developer. In this role, you will design and implement user-friendly interfaces, ensuring a seamless experience for our clients. You will collaborate with cross-functional teams to create innovative web solutions that meet client needs.

Nov 9, 2016

Apply

Security Software Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI technology, developing the largest AI chip in the world, which is 56 times the size of traditional GPUs. Our innovative wafer-scale architecture delivers the AI compute power equivalent to dozens of GPUs on a single chip, while maintaining the simplicity of programming a single device. This unique technology enables Cerebras to provide unparalleled training and inference speeds, allowing machine learning professionals to seamlessly execute large-scale ML applications without the complexity of managing numerous GPUs or TPUs. Our clientele includes leading model laboratories, multinational corporations, and pioneering AI-driven startups. Notably, OpenAI has recently partnered with Cerebras to leverage 750 megawatts of scale, revolutionizing critical workloads with ultra-fast inference capabilities. Thanks to our groundbreaking wafer-scale technology, Cerebras Inference offers the fastest Generative AI inference solution globally, outperforming GPU-based hyperscale cloud inference services by more than tenfold. This significant increase in speed enhances the user experience for AI applications, enabling real-time iterations and amplifying intelligence through enhanced computational capabilities.About The RoleIn this role as a Frontend Engineer on our AI cloud platform, you will be instrumental in developing our customer-facing inference, training, and administrative consoles as well as API experiences. You will design and implement responsive, user-friendly frontend interfaces that ensure an optimal experience for developers, efficiently managing high traffic and throughput.Your expertise in the latest web development frameworks and best practices, along with a strong focus on design and user experience, will be key to our team's success.

Mar 11, 2026

Apply

Senior Software Engineer II, Inference

CoreWeave

On-site|On-site|Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Senior Software Engineer II, where you'll play a pivotal role in shaping the future of AI infrastructure. As an area owner, you'll lead design initiatives and set engineering standards that enhance latency, throughput, and reliability across our advanced services. Collaborate closely with product, orchestration, and hardware teams to elevate our Kubernetes-native inference platform while ensuring we meet stringent P99 SLAs at scale. Your expertise will be integral in implementing cutting-edge optimizations such as micro-batch schedulers and KV-cache reuse, ultimately driving improvements across multiple services.

Feb 10, 2026

Apply

Software Engineer, Inference AI/ML

CoreWeave

On-site|On-site| Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Software Engineer on our Inference team, where you'll play a vital role in enhancing the performance of our AI model serving platform. As an entry-level engineer, you will implement impactful features that improve latency, reliability, and cost-efficiency on our cutting-edge GPU-based infrastructure. This role offers a unique opportunity for hands-on learning and professional growth through mentorship from seasoned engineers.

Feb 10, 2026

Apply

Senior RFIC Design Engineer - Silicon Engineering

SpaceX

Full-time|On-site|Sunnyvale, CA

Join SpaceX as a Senior RFIC Design Engineer in our Silicon Engineering team. In this pivotal role, you will be responsible for designing innovative RF integrated circuits that drive our next-generation space technologies. Collaborate with a team of experts to push the boundaries of technology while ensuring the highest standards of quality and performance.

Apr 1, 2026

Apply

Kernel Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems is revolutionizing artificial intelligence with the world's largest AI chip, 56 times larger than traditional GPUs. Our innovative wafer-scale architecture delivers unparalleled AI compute power, equating to dozens of GPUs on a single chip, all while maintaining the programming simplicity of a single device. This unique solution enables Cerebras to achieve unmatched training and inference speeds, allowing machine learning practitioners to seamlessly execute large-scale ML applications without the complexities of managing multiple GPUs or TPUs.We proudly serve a diverse clientele that includes leading model labs, multinational corporations, and pioneering AI-native startups. Notably, OpenAI has recently entered into a multi-year partnership with Cerebras, harnessing 750 megawatts of scale to transform critical workloads with ultra-high-speed inference.Our cutting-edge wafer-scale architecture powers the fastest Generative AI inference solution globally, boasting speeds over ten times faster than GPU-based hyperscale cloud inference services. This remarkable acceleration is reshaping the user experience of AI applications, facilitating real-time iterations and enhancing intelligence through advanced agentic computation.About The RoleAs a Kernel Engineer, you will be pivotal in crafting high-performance software solutions at the convergence of hardware and software. Your primary responsibility will be to implement, optimize, and scale deep learning operations that fully utilize our custom, massively parallel processor architecture.You will collaborate with a world-class team focused on designing, tuning for performance, and validating foundational ML and HPC kernels. This role includes building a comprehensive library of parallel and distributed algorithms aimed at maximizing compute utilization and enhancing training efficiency for state-of-the-art AI models. Your contributions will be crucial in unlocking the full capabilities of our hardware and accelerating the advancements in AI.

Feb 23, 2026

Apply

Senior RTL Design Engineer in Silicon Engineering

SpaceX

Full-time|$170K/yr - $235K/yr|On-site|Sunnyvale, CA

Founded on the vision of making humanity a multi-planetary species, SpaceX is pioneering the technologies to enable human life on Mars. Our mission extends beyond the stars; we are also transforming global connectivity through Starlink, the most advanced broadband internet system in the world.SENIOR RTL DESIGN ENGINEER (SILICON ENGINEERING)At SpaceX, we leverage our extensive experience in rocket and spacecraft development to successfully deploy Starlink, the world's largest satellite constellation. This initiative is providing high-speed, reliable internet to millions across the globe. We design, build, test, and operate all components of the system, from thousands of satellites to consumer receivers that enable users to connect with ease, and the software that integrates it all. As we continue to expand Starlink's global reach, we seek exceptional engineers to enhance its utility for communities and businesses worldwide.We are in search of a proactive and intellectually curious Senior RTL Design Engineer to collaborate with our elite, cross-disciplinary teams, including systems, firmware, architecture, design, validation, product engineering, and ASIC implementation. In this role, you will be at the forefront of developing next-generation FPGAs and ASICs that will be deployed in both space and terrestrial infrastructures. Your contributions will facilitate connectivity in areas that have previously lacked affordable and reliable access, thereby enhancing the capabilities of the Starlink network.

Mar 10, 2026

Create account — see all 525 results