About the job
Cerebras Systems is at the forefront of AI technology, developing the largest AI chip in the world, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture delivers the computational power of dozens of GPUs on a single chip, while ensuring programming is as simple as working with a single device. This revolutionary approach enables Cerebras to provide unmatched training and inference speeds, facilitating seamless execution of large-scale machine learning applications without the complexities of managing multiple GPUs or TPUs.
Cerebras proudly serves a diverse clientele, including leading model labs, global corporations, and pioneering AI startups. Recently, OpenAI announced a multi-year collaboration with Cerebras, aiming to harness 750 megawatts of power for transformative workloads through ultra high-speed inference.
Our groundbreaking wafer-scale architecture allows Cerebras Inference to offer the fastest generative AI inference solution in the world, more than ten times faster than GPU-based hyperscale cloud services. This leap in speed is reshaping the user experience for AI applications, enabling real-time iteration and amplifying intelligence through advanced agentic computation.
About the Role
Join us in building the next generation of large-scale AI systems designed to handle training and inference workloads with unmatched efficiency and scale. As a Senior Runtime Engineer, you will architect and develop high-performance distributed software that orchestrates extensive compute and data pipelines across diverse clusters. Your work will push the boundaries of concurrency, throughput, and scalability, enabling models to run efficiently at massive scale. This position sits at the intersection of systems engineering and machine learning performance, requiring both deep architectural insight and hands-on low-level implementation skills. You will play a crucial role in optimizing how models are executed and fine-tuned, from data ingestion through distributed execution across cutting-edge hardware platforms. We are actively recruiting for runtime roles in both Training and Inference.

