About the job
About Us
Graphcore stands at the forefront of innovation in Artificial Intelligence computing, dedicated to transforming the landscape of AI technology.
We are pioneering the development of hardware, software, and systems infrastructure that will catalyze the next wave of AI breakthroughs, facilitating the widespread adoption of AI solutions across diverse industries.
As a proud member of the SoftBank Group, Graphcore joins a distinguished family of companies that are driving some of the world’s most revolutionary technologies. Together, we share an ambitious vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to all.
Our team is a rich tapestry of talents and perspectives, comprising AI research specialists, silicon designers, software engineers, and systems architects, all thriving in a culture focused on continuous learning and innovation.
Job Summary
We invite applications for the position of Graduate Software Engineer to join our dynamic team, which is at the cutting edge of developing high-performance machine learning (ML) kernels tailored for next-generation AI hardware.
In this role, you will be instrumental in designing optimized compute kernels that implement a broad spectrum of ML operators, enabling applications ranging from convolutional neural networks (CNNs) to large language models (LLMs).
Your work will involve leveraging low-level programming and hardware-aware optimization techniques to maximize performance and efficiency on modern accelerators. This position offers a unique chance to operate at the nexus of ML, numerical computing, and scalable systems.
The Team
Join our expanding Kernel Engineering team, which is dedicated to delivering a high-performance compute library that empowers customers to extract maximum performance from their AI hardware.
Responsibilities and Duties
- Assist in the design and implementation of kernels in C++ for linear algebra and tensor operations (e.g., GEMM, batched GEMM, convolutions, reductions, and elementwise and fused operations).
- Profile and optimize kernels for next-generation AI hardware, focusing on threading, cache locality, memory layout, and kernel launch efficiency.
- Support performance and correctness by implementing microbenchmarks, regression tests, and numerical validation.
- Debug and resolve issues, and improve the quality and functionality of the product.
About You
You are collaborative and open-minded, with a keen interest in performance optimization and memory-efficient design, and you are eager to join a team that values innovation and excellence.

