About Us
Graphcore stands at the forefront of innovation in Artificial Intelligence computing, developing advanced hardware, software, and systems infrastructure. Our mission is to enable the next wave of AI breakthroughs and to accelerate the adoption of AI solutions across diverse industries.
As a proud member of the SoftBank Group, Graphcore is part of an elite group of companies shaping transformative technologies. Together, we envision a future where Artificial Super Intelligence is accessible and beneficial for all.
Our teams bring together talented individuals from a wide range of backgrounds, creating a rich blend of skills and viewpoints. Our culture thrives on continuous learning and innovation, and we are home to AI research experts, silicon designers, software engineers, and systems architects.
Position Overview
We are looking for a Staff Systems Engineer to provide expert operational, diagnostic, and engineering support for Graphcore’s Arm-based hardware platforms in both lab and data center settings.
This role focuses on supporting the hardware bring-up, validation, and troubleshooting of complex AI compute platforms, including server blades, racks, and rack-scale infrastructure. The ideal candidate will work closely with engineering, platform, and data center teams to ensure the reliability and performance of our next-generation AI systems.
The Team
The Systems Engineering and Hardware Engineering teams play a pivotal role in enabling the bring-up, validation, and operational reliability of Graphcore’s AI infrastructure platforms.
The two teams work closely together, enabling rapid problem-solving and continuous improvement of Graphcore’s hardware platforms from early development through production deployment.
Key Responsibilities
- Lead advanced troubleshooting for server blades, motherboards, power systems, and rack-scale infrastructure.
- Assist in engineering bring-up activities, including component validation and firmware interaction testing.
- Diagnose system-level failures related to thermal performance, power irregularities, network configurations, and BIOS/BMC issues.
- Collaborate with server engineering teams to conduct root cause analyses and recommend corrective actions or design improvements.
- Facilitate the deployment and rollout of next-generation hardware platforms through structured processes.

