Performance & Reliability Engineer

Cerebras SystemsSunnyvale, CA; Toronto, Ontario, Canada

On-site Full-time

Clicking Apply Now takes you to AutoApply where you can tailor your resume and apply.

Unlock Your Potential

Generate Job-Optimized Resume

One Click And Our AI Optimizes Your Resume to Match The Job Description.

Is Your Resume Optimized For This Role?

Find Out If You're Highlighting The Right Skills And Fix What's Missing

Experience Level

Experience

Qualifications

QualificationsProven experience in performance analysis and optimization of machine learning systems. Strong understanding of hardware architecture and software interactions. Familiarity with power and thermal management strategies. Excellent problem-solving skills and the ability to work collaboratively in a fast-paced environment. Strong programming skills in languages such as Python, C++, or similar.

About the job

Cerebras serves a diverse clientele, including top-tier model labs, global enterprises, and pioneering AI-native startups. OpenAI has recently partnered with Cerebras to leverage 750 megawatts of power, significantly enhancing key workloads through ultra high-speed inference.

Our cutting-edge wafer-scale architecture has made Cerebras Inference the fastest Generative AI inference solution globally, achieving speeds over ten times faster than GPU-based hyperscale cloud inference services. This revolutionary speed is transforming the user experience of AI applications, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.

About The Role

We invite you to join Cerebras as a Performance & Reliability Engineer within our dynamic Co-Design and Next Generation Team. Our groundbreaking CS-3 system has established benchmarks for high-performance ML training and inference solutions, utilizing a chip the size of a dinner plate with 44GB of on-chip memory that exceeds traditional hardware capabilities. In this role, you will focus on characterizing and optimizing the performance and reliability of state-of-the-art AI models operating on Cerebras' revolutionary hardware.

Responsibilities

Characterize and enhance the performance and reliability of advanced ML hardware/software systems, focusing on minimizing power and thermal fluctuations.
Analyze ML workloads, software kernels, and hardware architecture for their power and performance impacts, synthesizing high-level insights across these layers.
Develop innovative software solutions to enhance system performance and efficiency.

About Cerebras Systems

Cerebras Systems is revolutionizing the AI landscape with its state-of-the-art technology, including the world's largest AI chip. Our innovative solutions empower users to achieve unprecedented performance in machine learning applications.

Similar jobs

1 - 20 of 616 Jobs

Search for High Performance Computing Software Engineer Supercomputing

616 results

Select all on this page (20)

Apply

High Performance Computing Software Engineer - Supercomputing

Institute of Foundation Models

Full-time|On-site|Sunnyvale, CA

Join Our Innovative Team at the Institute of Foundation ModelsAt IFM, we are pioneers in developing, understanding, and managing foundation models. Our mission is to advance research, cultivate the next generation of AI innovators, and contribute significantly to a knowledge-driven economy. As a member of our esteemed team, you will engage in the forefront of cutting-edge foundation model training, collaborating with top-tier researchers, data scientists, and engineers. Together, we will address the most significant and impactful challenges in AI development. You will play a crucial role in creating revolutionary AI solutions that have the potential to transform entire industries. Your strategic and innovative problem-solving abilities will be essential in establishing MBZUAI as a global leader in high-performance computing for deep learning, facilitating discoveries that will inspire future AI pioneers. The Role IFM is developing the foundational compute infrastructure that will drive future breakthroughs in AI and computational science. We are seeking a High Performance Computing Software Engineer to collaborate in designing, developing, and operating the software systems that manage our extensive AI workloads. In this position, you will work at the crossroads of high-performance computing and machine learning. You will be part of a dedicated team focused on creating the software stack that supports the training of advanced ML models using over 1000 GPUs, while ensuring our infrastructure remains robust, efficient, and user-friendly.

Apr 3, 2026

Apply

Senior Network Engineer - Supercomputing

ifm-us

Full-time|On-site|Sunnyvale, CA

Join the Institute of Foundation ModelsAs a leading research laboratory, we are devoted to building, understanding, utilizing, and managing foundation models effectively. Our mission is to propel research forward, cultivate the future generation of AI innovators, and contribute significantly to a knowledge-driven economy.In this role, you will engage with cutting-edge foundation model training, collaborating with top-tier researchers, data scientists, and engineers to address the most crucial and impactful challenges in AI development. You will play a pivotal role in crafting revolutionary AI solutions capable of transforming entire industries. Your strategic and innovative problem-solving abilities will be vital in establishing MBZUAI as a global leader in high-performance computing for deep learning, fostering groundbreaking discoveries that will inspire the next wave of AI pioneers.Position OverviewAs a member of IFM’s Supercomputing team, you will be tasked with designing, optimizing, and maintaining high-performance, low-latency networking solutions that support some of the world’s largest GPU supercomputing clusters. You will work on both network software and systems that facilitate AI training and inference processes, utilizing state-of-the-art technologies such as NVIDIA’s RDMA-capable solutions, InfiniBand, RoCE, and GPUDirect RDMA. Our comprehensive product stack encompasses the entire lifecycle of network management—from metric gathering and configuration deployment to zero-touch provisioning, real-time monitoring, alerting, and auto-remediation. Additionally, you will be responsible for troubleshooting, diagnosing, and swiftly resolving any network-related issues in collaboration with cross-functional teams, ensuring optimal reliability and performance.

May 15, 2025

Apply

Software Engineer - System Performance for Robot Software

Wayve

Full-time|On-site|Sunnyvale

Join Wayve, a pioneering company at the forefront of robotic software development, as a Software Engineer specializing in System Performance. In this role, you will be instrumental in optimizing our advanced robotic systems to enhance their efficiency and reliability. Collaborate with a talented team to push the boundaries of what is possible in the field of robotics.

Mar 30, 2026

Apply

Senior Distributed Systems Engineer

Institute of Foundation Models

Full-time|On-site|Sunnyvale, CA

About the Institute of Foundation ModelsThe Institute of Foundation Models (IFM) specializes in designing and operating large-scale GPU supercomputing systems aimed at training cutting-edge foundation models. Our philosophy places emphasis on the interdependence of performance, fault tolerance, and scalability across various components, including model architecture, communication systems, runtime, and hardware topology.This position is pivotal to our mission — enhancing communication performance, distributed reliability, and cross-layer optimization for extensive training workloads.The MissionWe seek a highly skilled engineer to collaboratively design and optimize the communication stack for large-scale distributed training, with a focus on hybrid parallelism and Mixture-of-Experts (MoE) workloads. This is a systems-level engineering role centered on performance enhancement, distributed debugging, and communication-runtime co-design.· Design and optimize expert-parallel and hybrid-parallel communication patterns· Drive high-performance hierarchical collectives for MoE workloads· Co-design runtime orchestration with communication topology awareness· Mitigate tail latency and enhance determinism across thousands of GPUs· Architect fault-tolerant distributed execution that withstands real-world cluster failuresCore Technical Scope· Communication-compute overlap and topology-aware collective optimization· In-depth debugging of NCCL, RDMA, and custom communication layers· Implementing hybrid expert parallel strategies in modern large-scale MoE systems· Developing elastic and resilient distributed job orchestration concepts· Conducting congestion analysis and routing optimization across InfiniBand/RoCE fabrics· Executing microbenchmarking and performance modeling for communication-intensive workloadsExpected Technical Depth· Expertise in hybrid expert parallel communication strategies

Mar 3, 2026

Apply

Sr GPU Performance Software Engineer II

CoreWeave

On-site|On-site|Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Senior GPU Performance Software Engineer II, where you will take the lead in transforming our GPU performance testing platform. As an influential member of our engineering team, you'll design and implement scalable solutions that enhance the reliability and performance of our global infrastructure. Collaborate with cross-functional teams to deliver measurable improvements in latency and throughput, ensuring an exceptional experience for our customers.

Feb 10, 2026

Apply

R&D Engineer - Advanced Technology in AI/ML and HPC

Cerebras Systems

Full-time|On-site|Sunnyvale, CA; Toronto, Ontario, Canada; Vancouver, British Columbia, Canada

Join Cerebras Systems as an R&D Engineer specializing in Advanced Technology, focusing on Artificial Intelligence (AI) and Machine Learning (ML) within High-Performance Computing (HPC). In this pivotal role, you will contribute to cutting-edge projects that drive innovation in AI and ML technologies.As part of our dynamic team, you will collaborate with top-tier engineers and researchers to develop revolutionary solutions that enhance computing capabilities. Your expertise will be instrumental in shaping the future of AI and HPC technologies.

Apr 7, 2026

Apply

Performance & Reliability Engineer

Cerebras Systems

Full-time|On-site|Sunnyvale, CA; Toronto, Ontario, Canada

Cerebras Systems is at the forefront of AI technology, developing the world’s largest AI chip that is 56 times larger than conventional GPUs. Our innovative wafer-scale architecture delivers the computational power of dozens of GPUs within a single chip, simplifying programming and enhancing performance. This unique capability enables Cerebras to provide unparalleled training and inference speeds, allowing machine learning practitioners to execute large-scale ML applications seamlessly without the complexities of managing extensive GPU or TPU infrastructures.Cerebras serves a diverse clientele, including top-tier model labs, global enterprises, and pioneering AI-native startups. OpenAI has recently partnered with Cerebras to leverage 750 megawatts of power, significantly enhancing key workloads through ultra high-speed inference.Our cutting-edge wafer-scale architecture has made Cerebras Inference the fastest Generative AI inference solution globally, achieving speeds over ten times faster than GPU-based hyperscale cloud inference services. This revolutionary speed is transforming the user experience of AI applications, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.About The RoleWe invite you to join Cerebras as a Performance & Reliability Engineer within our dynamic Co-Design and Next Generation Team. Our groundbreaking CS-3 system has established benchmarks for high-performance ML training and inference solutions, utilizing a chip the size of a dinner plate with 44GB of on-chip memory that exceeds traditional hardware capabilities. In this role, you will focus on characterizing and optimizing the performance and reliability of state-of-the-art AI models operating on Cerebras' revolutionary hardware.ResponsibilitiesCharacterize and enhance the performance and reliability of advanced ML hardware/software systems, focusing on minimizing power and thermal fluctuations.Analyze ML workloads, software kernels, and hardware architecture for their power and performance impacts, synthesizing high-level insights across these layers.Develop innovative software solutions to enhance system performance and efficiency.

Feb 17, 2026

Apply

Sr. Software Engineer - Perf and Benchmarking

CoreWeave

On-site|On-site| Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Senior Engineer on our Benchmarking & Performance team, where you will play a vital role in our expansive performance data warehouse. You will be responsible for ingesting, storing, transforming, and analyzing performance events across our global infrastructure. Your work will contribute to publishing industry-leading end-to-end performance benchmarks like MLPerf. As an owner of your projects, you will drive designs, elevate engineering standards, and deliver tangible enhancements in latency, throughput, and reliability across numerous services. Collaborating with product, orchestration, and hardware teams, you will help evolve our Kubernetes-native platform to meet stringent P99 SLAs at scale.

Feb 10, 2026

Apply

Senior Software Engineer - Source Control and Governance

CoreWeave

Full-time|On-site|Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA

CoreWeave is hiring a Senior Software Engineer focused on Source Control and Governance. This position is based in Livingston, NJ, New York, NY, Sunnyvale, CA, or Bellevue, WA. Role overview This role centers on leading improvements in software development processes, with a strong emphasis on source control and governance. The Senior Software Engineer will work closely with teams across the company to strengthen codebase integrity and security. What you will do Lead initiatives to integrate advanced source control systems and governance frameworks. Collaborate with engineers and other departments to implement effective version control practices. Promote best practices that support secure and reliable code management. Impact Your work will help streamline development workflows and support innovation within CoreWeave's engineering teams.

Apr 29, 2026

Apply

Software Engineer - Robotics at Coram AI | Sunnyvale

Coram AI

Full-time|On-site|Sunnyvale

At Coram AI, we are revolutionizing video security for the contemporary landscape. Our innovative cloud-native platform leverages computer vision and artificial intelligence to empower businesses to enhance safety, make informed decisions, and accelerate operations, featuring real-time alerts, effortless clip sharing, and multi-site visibility.Joining our dynamic and agile team means becoming part of a culture that prioritizes clarity, quality, and impactful contributions. Every team member has a voice, delivers significant work, and plays a crucial role in shaping how AI can foster a safer and more interconnected world.We seek an exceptionally skilled software engineer to develop high-performance, real-time software that operates on edge devices while adhering to stringent latency and memory limitations. This position emphasizes deterministic execution, distributed system architecture, and low-level performance enhancements. You will focus on constructing the infrastructure and runtime systems that enable real-time robotics applications.

Mar 11, 2026

Apply

Staff Software Engineer, Inference Cloud

Cerebras Systems

Full-time|On-site|Sunnyvale, CA

Role Overview Cerebras Systems is looking for a Staff Software Engineer focused on Inference Cloud. This position is based in Sunnyvale, CA. What You Will Do Design, develop, and optimize software for inference products Work closely with team members to improve performance and reliability Apply advanced AI and machine learning methods to real-world challenges Collaboration Work alongside experienced engineers on projects that shape the future of inference technology at Cerebras Systems.

Apr 14, 2026

Apply

Staff ML Performance Engineer - Training Efficiency

Wayve Technologies

Full-time|On-site|Sunnyvale

Join Wayve Technologies as a Staff Machine Learning Performance Engineer, specializing in Training Efficiency. In this pivotal role, you will be responsible for enhancing the performance of our machine learning models and algorithms, ensuring they operate at peak efficiency. You will collaborate with cross-functional teams to develop innovative solutions that improve training processes, optimize model performance, and drive impactful results in autonomous vehicle technology.

Feb 27, 2026

Apply

Machine Learning Engineer - Robotics and Computer Vision

Intuitive Surgical, Inc.

Full-time|On-site|Sunnyvale

Join our innovative team at Intuitive Surgical, Inc., where we are redefining healthcare through advanced robotics and computer vision. As a Machine Learning Engineer, you will be at the forefront of developing cutting-edge algorithms that enhance robotic performance and improve patient outcomes. This role offers an exciting opportunity to work on transformative technology that makes a real impact in the medical field.

Feb 11, 2026

Apply

Senior Software Engineer

Ceribell

Full-time|$141K/yr - $190K/yr|On-site|Sunnyvale, CA

About CeribellCeribell is at the forefront of medical technology, dedicated to revolutionizing the diagnosis and management of patients with serious neurological conditions. Our innovative Ceribell System is a cutting-edge, point-of-care electroencephalography (EEG) platform that meets the critical needs of patients in acute care settings. Already in use at hundreds of community hospitals, large academic institutions, and major integrated delivery networks across the nation, our team shares a collective mission to enhance critical care with our rapid seizure detection technology. Join us in making a difference!Position Overview:We are seeking a talented Senior Software Engineer with a strong backend focus to join our dynamic team in developing the next generation of EEG web applications that cater to vital medical use cases. In this role, you will be instrumental in designing, maintaining, and enhancing the backend systems for our EEG Portal web application, which is essential for healthcare providers, researchers, and clinical teams to access, monitor, and analyze EEG data. You will collaborate closely with fellow engineers, product managers, and stakeholders to ensure that our backend systems are robust, secure, and scalable within a medical environment.Key Responsibilities:Backend Development & Maintenance:Design, develop, and maintain backend systems to support the EEG Portal application, ensuring dependable performance and adherence to healthcare standards.Implement new features and enhancements to meet clinical and research demands, prioritizing efficiency and scalability.Troubleshoot, debug, and optimize backend systems to guarantee maximum uptime and reliability for users.Database Management:Write optimized database queries and execute data migration strategies.Monitor and fine-tune database performance, including indexing, replication, and backup processes.API Development & Integration:Develop and maintain RESTful APIs that interact with the frontend and other systems.Ensure APIs are secure, well-documented, and capable of handling large volumes of sensitive medical data.Integrate third-party services and platforms as needed to enhance functionality.Ensure backend services comply with regulatory standards, including data encryption, authentication, and auditing.

Mar 2, 2026

Apply

Application Software Engineer - Embedded Software Integration

Wayve

Full-time|On-site|Sunnyvale

Join Wayve as a skilled Application Software Engineer specializing in software integration and embedded systems. In this role, you will be integral in developing innovative software solutions that empower autonomous vehicles. You will work closely with cross-functional teams to design, implement, and optimize software that facilitates seamless integration of complex systems.

Mar 25, 2026

Apply

Software Engineer - Sensor Systems and Robotics Software

Wayve

Full-time|On-site|Sunnyvale

At Wayve, we are dedicated to fostering a diverse, equitable, and inclusive culture that values each individual's unique skills and perspectives, irrespective of their sex, race, religion, ethnic or national origin, disability, age, citizenship, marital status, sexual orientation, gender identity, veteran status, pregnancy, or any other legally protected status.About UsFounded in 2017, Wayve is at the forefront of developing Embodied AI technology. Our cutting-edge AI software and foundation models empower vehicles to perceive, comprehend, and navigate complex environments, enhancing the safety and usability of automated driving systems.Our vision is to create autonomous solutions that move the world forward. Our intelligent, mapless, and hardware-agnostic AI products are tailored for automakers, driving the transition from assisted to fully automated driving. In our dynamic environment, we tackle significant challenges with enthusiasm, embracing complexity to unlock innovative solutions. We strive for excellence while remaining humble, consistently learning and evolving towards a smarter and safer future.At Wayve, your contributions are essential. We cherish diversity, welcome new insights, and promote an inclusive work atmosphere where we support one another to make a meaningful impact.Join Wayve and embark on a career-defining journey!The RoleAs a Software Engineer, you will engage with Wayve’s next-generation compute and sensor platform and contribute to all aspects of the software development lifecycle.As a key member of the Robot Software team, you will collaborate to develop software for edge devices that provide critical data and facilitate autonomy across a large fleet of vehicles. You will have a crucial role in ensuring that the software you develop operates reliably at scale, while also working closely with our Embodied AI and Science teams to provide them with the necessary data and interfaces for model training, experimentation, and performance feedback.

Feb 17, 2026

Apply

Senior Software Engineer in Platform Engineering

Intuitive Surgical, Inc.

Full-time|On-site|Sunnyvale

Intuitive Surgical, Inc. seeks a Senior Software Engineer to join the Platform Engineering team in Sunnyvale. This role centers on developing and maintaining the foundational software that powers advanced surgical technologies. Key responsibilities Design and build core platform software for surgical systems Collaborate with other engineering teams to create reliable and scalable solutions Drive ongoing enhancements that support improvements in surgical procedures and patient care Role focus This position emphasizes both architecture and hands-on development for the software platform. Work will directly impact the reliability and capabilities of surgical technologies used in healthcare settings.

Apr 24, 2026

Apply

Software Engineer

Mindlance

Full-time|On-site|Sunnyvale

Join our innovative team at Mindlance as a Software Engineer. In this role, you will be instrumental in developing cutting-edge software solutions that enhance user experience and streamline processes. Collaborate with a talented team of engineers and contribute to various stages of software development, from concept to deployment.Your role will involve coding, testing, and debugging applications while ensuring optimal performance and responsiveness. You will have the opportunity to work with the latest technologies and tools in a dynamic environment that fosters growth and creativity.

Apr 28, 2015

Apply

Manufacturing Software Engineer

Intuitive Surgical

Full-time|On-site|Sunnyvale

Primary Function of Position:Become part of Intuitive Surgical, a pioneering team committed to leveraging advanced technology to enhance patient outcomes through improved surgical precision and reduced invasiveness, with patient safety as our foremost concern.As a member of the Automation, Equipment and Test (AET) Team, you will contribute to creating the robotics that manufacture robotics. Your role will involve the design, development, and maintenance of equipment, fixtures, and tooling that optimize and enhance the manufacturing processes of surgical instruments and accessories.This position is pivotal in advancing the design and production of new surgical robotic systems and related instruments. You will develop software and algorithms for custom semi-automated electro-mechanical systems, ensuring product performance, reliability, and safety. Close collaboration with product development teams, systems analysts, electrical and mechanical engineers, manufacturing engineers, and quality engineers will be essential to establish a coherent diagnostic strategy and implement effective software solutions.Key Responsibilities:Design, develop, and implement software solutions for manufacturing equipment that constructs and tests medical devices, including robotic systems and accessories.Construct and sustain software infrastructures that facilitate value extraction from generated data.Analyze and refine manufacturing processes to boost efficiency, lower costs, and elevate productivity.Comprehend product operations and controls, and create methods to ensure their integrity during high-volume production.Document, direct, and execute IQOQPQ and DQ validation activities on manufacturing equipment.Establish, document, and adhere to best practices in software development.Independently navigate challenges with minimal supervision.Assume ownership of manufacturing software and collaborate with cross-functional teams to drive projects to completion.Support and upgrade existing production software.

Feb 24, 2026

Apply

Robotics Software Engineer

Intuitive Surgical, Inc.

Full-time|On-site|Sunnyvale

Join our team as a Robotics Software Engineer at Intuitive Surgical, where we are dedicated to transforming minimally invasive care through innovative technologies. You will be instrumental in designing, developing, and testing cutting-edge software solutions that enhance robotic systems. Collaborate with multidisciplinary teams to ensure the seamless integration of software with hardware while maintaining the highest standards of quality.

Mar 10, 2026

Create account — see all 616 results