About the job
Cerebras Systems is pioneering the future of artificial intelligence with the world's largest AI chip, roughly 56 times larger than conventional GPUs. Our wafer-scale architecture delivers compute equivalent to dozens of GPUs on a single chip while retaining the programming simplicity of a single device. This approach allows us to deliver unparalleled training and inference speeds, enabling machine learning practitioners to run large-scale ML applications without the complexity of managing multiple GPUs or TPUs.
We proudly serve a diverse clientele, including leading model labs, multinational corporations, and AI-native startups. Notably, OpenAI recently announced a multi-year partnership with Cerebras encompassing 750 megawatts of scale, accelerating critical workloads with ultra-high-speed inference.
Built on this wafer-scale architecture, Cerebras Inference is the fastest generative AI inference solution available, delivering speeds more than ten times faster than GPU-based hyperscale cloud inference services. This leap in speed is transforming the user experience of AI applications, unlocking real-time iteration and richer intelligence through greater computational capability.
About the Role
As a Network Architect on the Cluster Architecture Team, you will work closely with vendors, internal networking teams, and industry experts to design best-in-class interconnect architectures for current and future generations of Cerebras AI clusters. You will develop proof-of-concept designs for new network features that make the network resilient and reliable under AI workloads. The role demands cross-functional collaboration across a variety of hardware, including network devices and the Wafer-Scale Engine, and software at multiple layers of the stack, from host-side networking to cluster-level coordination. A strong understanding of network monitoring systems and debugging methodologies is essential.
Responsibilities
- Design AI/ML and HPC clusters.
- Identify and mitigate performance or efficiency bottlenecks, ensuring optimal resource utilization, low latency, and high-throughput communication.
- Lead technical projects involving multiple teams and diverse software and hardware components to realize advanced network solutions.