About the job
Cerebras Systems is at the forefront of AI technology, creating the world’s largest AI chip, roughly 56 times larger than a traditional GPU. Our innovative wafer-scale architecture combines the computational power of dozens of GPUs into a single chip while keeping programming as simple as for a single device. This unique approach enables Cerebras to offer unmatched training and inference speeds, allowing machine learning practitioners to run large-scale ML applications seamlessly, without the complexity of managing numerous GPUs or TPUs.
Cerebras’ clientele includes leading model labs, global enterprises, and pioneering AI-native startups. OpenAI recently forged a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of compute and revolutionize critical workloads through ultra-fast inference.
Our groundbreaking wafer-scale architecture powers Cerebras Inference, the fastest generative AI inference solution available—more than 10 times faster than GPU-based hyperscale cloud inference services. This leap in speed is transforming the user experience of AI applications, enabling real-time iteration and unlocking greater intelligence through additional agentic computation.
About the Role
We are looking for a dynamic, skilled engineer to join our Training Platform team. This team is dedicated to rapidly bringing up state-of-the-art open-source models (such as LLaMA and Qwen) and customer-specific proprietary models on our Cerebras CSX systems. To excel in this position, you should be a systems-minded generalist who thrives in fast-paced environments and is comfortable working across the entire Cerebras software stack.
Your contributions will be vital in achieving extraordinary levels of performance, efficiency, and scalability for AI applications.