Machine Learning Software Tool Development Engineer

Cerebras Systems | Sunnyvale, CA or Toronto, Canada
On-site, Full-time

Qualifications:

  • Proven experience in software development, particularly in ML environments.
  • Strong understanding of debugging and validation processes.
  • Proficiency in programming languages relevant to software development and ML.
  • Experience collaborating with multidisciplinary teams.
  • Excellent problem-solving skills and a proactive attitude toward innovative solutions.

About the job

Cerebras Systems is at the forefront of AI technology, creating the world's largest AI chip that is 56 times the size of traditional GPUs. Our innovative wafer-scale architecture combines the compute power of dozens of GPUs into a single chip, simplifying the programming experience. This unique design enables us to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run extensive ML applications seamlessly without the complexities of managing numerous GPUs or TPUs.

Our clientele includes premier model laboratories, multinational corporations, and pioneering AI-driven startups. Notably, OpenAI has recently formed a multi-year collaboration with Cerebras, aiming to harness 750 megawatts of computational scale to revolutionize key workloads through ultra-high-speed inference.

Thanks to our wafer-scale architecture, Cerebras Inference is the fastest generative AI inference solution in the world, running more than ten times faster than GPU-based hyperscale cloud inference services. That speed transforms the user experience of AI applications, enabling real-time iteration and greater intelligence through additional agentic computation.

Responsibilities:

  • Lead the design and implementation of advanced system-level debugging, validation, and observability platforms.
  • Develop automated systems for collecting and analyzing numerical data and execution anomalies.
  • Create visualization and analysis tools to facilitate efficient root-cause investigations.
  • Build frameworks for failure classification, regression detection, and anomaly monitoring.
  • Enhance compilers, runtimes, and programming interfaces to support sophisticated profiling and instrumentation.
  • Improve workflows related to system bring-up, low-level debugging, and validation.
  • Collaborate cross-functionally with teams in compiler, hardware, firmware, runtime, and infrastructure domains.
  • Establish best practices to ensure debuggability, reliability, and operational excellence.
  • Lead impactful initiatives and support incident response while driving long-term corrective solutions.

About Cerebras Systems

Cerebras Systems is a leading technology company specializing in AI hardware and software solutions. Our mission is to revolutionize the field of artificial intelligence with groundbreaking innovations that enhance processing capabilities and simplify the developer experience.
