About the job
Cerebras Systems is at the forefront of AI technology, creating the largest AI chip in the world, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture provides AI compute power equivalent to dozens of GPUs on a single chip, while ensuring the programming simplicity of a single device. This unique approach enables Cerebras to achieve industry-leading training and inference speeds, empowering machine learning practitioners to run extensive ML applications without the complexities of managing multiple GPUs or TPUs.
Our customers include leading model labs, global enterprises, and pioneering AI-native startups. Recently, OpenAI announced a multi-year collaboration with Cerebras to deploy 750 megawatts of compute, transforming important workloads with ultra-high-speed inference.
With our groundbreaking wafer-scale architecture, Cerebras Inference delivers the fastest generative AI inference available, more than ten times faster than GPU-based hyperscale cloud inference services. This speed advantage is reshaping the user experience of AI applications, enabling real-time iteration and more capable agentic workflows through greater inference-time computation.
About The Role
As a Compute / Server Platform Architect on the Cluster Architecture Team, you will own the server-side platform architecture behind Cerebras CS-3-based AI clusters for both training and inference, ensuring predictable performance, scalability, and reliability. Our accelerators are network-attached, so the x86 server fleet is an integral part of the end-to-end system: it hosts critical runtime functions such as orchestration, prompt caching, and IO/control services, and must be co-designed with software to optimize token-level latency, throughput, and cost efficiency. You will translate workload behaviors into requirements for CPU, memory, IO, PCIe, and host networking; lead platform evaluations with vendors; and provide technical direction through qualification and production adoption, working closely with other leaders and technical project managers.

