
Engineering Manager, Inference Platform

Cerebras Systems · Sunnyvale, CA or Toronto, Canada
On-site · Full-time


Experience Level

Manager

Qualifications

Key Responsibilities:
- Lead the engineering team in developing and scaling LLM inference solutions.
- Architect low-latency distributed systems to support enterprise-grade applications.
- Set the technical direction while remaining engaged with hands-on coding.
- Collaborate with cross-functional teams to deliver exceptional products.

Required Qualifications:
- Proven experience in distributed systems and modern model-serving frameworks.
- Strong programming skills and architectural knowledge.
- Excellent leadership and communication abilities.

About the job

At Cerebras Systems, we are revolutionizing AI computing with the world’s largest AI chip, 56 times larger than traditional GPUs. Our wafer-scale architecture packs computational power equivalent to dozens of GPUs onto a single chip, simplifying programming to the level of a single device. This unique approach enables unparalleled training and inference speeds, letting machine learning practitioners run large-scale ML applications without the complexity of managing multiple GPUs or TPUs.

Our clientele includes leading model laboratories, prominent global enterprises, and forward-thinking AI-native startups. Notably, OpenAI has entered into a multi-year partnership with Cerebras to leverage 750 megawatts of compute, bringing ultra-high-speed inference to critical workloads.

With our groundbreaking wafer-scale architecture, Cerebras Inference delivers the fastest generative AI inference solution globally, more than ten times faster than GPU-based hyperscale cloud inference services. This speed is transforming how users experience AI applications, enabling real-time iteration and deeper intelligence through additional agentic computation.

Location: Toronto / Sunnyvale

We are seeking a highly technical, hands-on engineering leader for our Inference Service Platform. In this role, you will guide a high-performing team to address a critical challenge: scaling large language model (LLM) inference on Cerebras’ advanced compute clusters and delivering a world-class, on-premise solution for enterprise customers. You will establish the technical vision while maintaining close engagement with the code, focusing on architecting highly reliable and low-latency distributed systems. If you possess proven expertise in distributed systems and scaling modern model-serving frameworks, we encourage you to apply.

About Cerebras Systems

Cerebras Systems is at the forefront of AI innovation, creating the largest AI chip in the world. Our technology is designed to simplify the complexities of machine learning, enabling businesses to harness the full potential of AI without the burden of managing extensive computing resources. With a growing list of partnerships and clients, we are committed to transforming the AI landscape.
