About the job
Why Choose Nebius?
Nebius is at the forefront of a transformative era in cloud computing, dedicated to empowering the global AI economy. We provide our clients with tools and resources designed to address real-world challenges and transform industries, while minimizing infrastructure costs and removing the need for large in-house AI/ML teams. Our team works at the cutting edge of AI cloud infrastructure, collaborating with some of the most skilled and pioneering leaders and engineers in the industry.
Our Work Environment
Based in Amsterdam and publicly traded on Nasdaq, Nebius boasts a global presence with R&D hubs located across Europe, North America, and Israel. Our team comprises over 1,400 professionals, including more than 400 highly specialized engineers with extensive expertise in both hardware and software engineering, complemented by a dedicated in-house AI R&D team.
The Role
We are seeking a Senior HPC Cluster Engineer to join our team and help advance our hyperscaler platform. As a member of the GPU & InfiniBand team, you will enhance and optimize the core components of our cloud platform, with an emphasis on GPU computing, InfiniBand networks, and the KVM/QEMU stack. You will work closely with hardware virtualization and device emulation technologies to ensure high performance and security in multi-GPU HPC environments. You will analyze, troubleshoot, and refine our infrastructure to support new hardware, optimize system performance, and automate fault detection and resolution in complex systems.
