Performance & Reliability Engineer

Cerebras Systems
Sunnyvale, CA; Toronto, Ontario, Canada
On-site, Full-time

Qualifications
  • Proven experience in performance analysis and optimization of machine learning systems.
  • Strong understanding of hardware architecture and how it interacts with software.
  • Familiarity with power and thermal management strategies.
  • Excellent problem-solving skills and the ability to work collaboratively in a fast-paced environment.
  • Strong programming skills in Python, C++, or similar languages.

About the job

Cerebras Systems is at the forefront of AI technology, building the world's largest AI chip, 56 times larger than conventional GPUs. Our wafer-scale architecture delivers the computational power of dozens of GPUs on a single chip, which simplifies programming and improves performance. As a result, Cerebras offers unparalleled training and inference speeds, letting machine learning practitioners run large-scale ML applications without the complexity of managing extensive GPU or TPU infrastructure.

Cerebras serves a diverse clientele, including top-tier model labs, global enterprises, and pioneering AI-native startups. OpenAI recently partnered with Cerebras for 750 megawatts of compute, accelerating key workloads with ultra-high-speed inference.

Our wafer-scale architecture has made Cerebras Inference the fastest generative AI inference solution in the world, more than ten times faster than GPU-based hyperscale cloud inference services. This speed is transforming the user experience of AI applications, enabling real-time iteration and greater model intelligence through additional computation.

About The Role
We invite you to join Cerebras as a Performance & Reliability Engineer on our Co-Design and Next Generation Team. Our groundbreaking CS-3 system has set the benchmark for high-performance ML training and inference. It is built around a chip the size of a dinner plate with 44 GB of on-chip memory, far beyond what traditional hardware provides. In this role, you will characterize and optimize the performance and reliability of state-of-the-art AI models running on Cerebras hardware.

Responsibilities
  • Characterize and enhance the performance and reliability of advanced ML hardware/software systems, focusing on minimizing power and thermal fluctuations.
  • Analyze ML workloads, software kernels, and hardware architecture for their power and performance impacts, synthesizing high-level insights across these layers.
  • Develop innovative software solutions to enhance system performance and efficiency.

About Cerebras Systems

Cerebras Systems is revolutionizing the AI landscape with its state-of-the-art technology, including the world's largest AI chip. Our innovative solutions empower users to achieve unprecedented performance in machine learning applications.
