About the job
Why Join Nebius?
At Nebius, we are building cloud infrastructure tailored for the global AI economy. Our mission is to give our clients the tools and resources to solve real-world problems and transform industries, while minimizing infrastructure costs and removing the need for large in-house AI/ML teams. Our people work at the forefront of AI cloud infrastructure alongside some of the most skilled and visionary leaders and engineers in the industry.
Our Work Environment
Headquartered in Amsterdam and publicly traded on Nasdaq, Nebius boasts a global presence with R&D hubs located throughout Europe, North America, and Israel. Our diverse team of over 1,400 professionals includes more than 400 expert engineers specializing in both hardware and software engineering, complemented by a dedicated in-house AI R&D team.
About the Role
We are looking for a Hardware Quality Assurance Engineer focused on servers and server components. This role is critical to ensuring the quality, performance, reliability, and compatibility of modern server platforms. You will work closely with hardware, firmware, system, services, and application teams to validate server architectures and components across the entire product lifecycle, from early prototypes to production deployment.
Key Responsibilities
System & Component Validation
• Validate the complete server system architecture, including compute, memory, storage, networking, and accelerators.
• Test and confirm interactions between server components: CPU, DRAM, NVMe devices, Network Interface Cards (NICs), and GPUs/accelerators.
• Conduct individual component testing to ensure performance, stability, and reliability before and after system integration.
• Perform system-level validation under various configurations, workloads, and use cases.
• Assess benchmark results against technical specifications, business and application requirements, historical baselines, and prior generations.
• Identify performance regressions, instability, and reliability issues under sustained load and stress conditions.
• Ensure that hardware components and firmware align with business needs, performance objectives, and specifications.
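To give candidates a feel for the baseline comparison and regression detection described above, here is a minimal sketch in Python. It is an illustration only, not Nebius tooling: the metric names, the sample values, and the 5% tolerance are all made-up assumptions.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    value: float
    higher_is_better: bool = True  # False for latency-style metrics


def find_regressions(baseline, current, tolerance=0.05):
    """Return names of metrics that got worse than baseline by more than `tolerance`.

    `tolerance` is a fraction of the baseline value (0.05 means 5%).
    Metrics missing from `current` or with a zero baseline are skipped.
    """
    regressions = []
    for name, base in baseline.items():
        cur = current.get(name)
        if cur is None or base.value == 0:
            continue
        change = (cur.value - base.value) / base.value
        if not base.higher_is_better:
            change = -change  # for latency, an increase counts as a loss
        if change < -tolerance:
            regressions.append(name)
    return regressions


# Hypothetical numbers, for illustration only.
baseline = {
    "mem_bandwidth_gbps": Metric(200.0),
    "nvme_read_latency_us": Metric(80.0, higher_is_better=False),
}
current = {
    "mem_bandwidth_gbps": Metric(185.0),   # 7.5% lower: flagged
    "nvme_read_latency_us": Metric(82.0, higher_is_better=False),  # 2.5% higher: within tolerance
}

regressed = find_regressions(baseline, current)
```

In practice the tolerance would likely be set per metric and informed by run-to-run variance rather than fixed at a single value.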
Firmware, BIOS & BMC Validation
• Test and validate BMC, BIOS, and FPGA firmware.
• Validate BMC functionality including IPMI/Redfish, sensors, power management, thermal management, and logging.
• Conduct firmware upgrade, downgrade, and regression testing.
• Evaluate firmware impact on performance, stability, and reliability across various hardware configurations.
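As an illustration of the kind of BMC sensor check this involves, the sketch below scans a Redfish Thermal payload for temperature sensors at or above their critical threshold. The field names (Temperatures, ReadingCelsius, UpperThresholdCritical) follow the Redfish Thermal schema; the sample payload is invented, and a real BMC may expose sensors differently (for example through the newer ThermalSubsystem resources), so treat this as an assumption-laden example rather than a description of Nebius tooling.

```python
def sensors_over_threshold(thermal):
    """Return (name, reading, limit) for each sensor at or above its critical limit.

    `thermal` is a dict parsed from a Redfish Thermal resource.
    Sensors with no reading or no critical threshold are skipped.
    """
    flagged = []
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        limit = sensor.get("UpperThresholdCritical")
        if reading is None or limit is None:
            continue  # sensor absent or threshold not reported
        if reading >= limit:
            flagged.append((sensor.get("Name", "unknown"), reading, limit))
    return flagged


# Invented payload, for illustration only.
sample = {
    "Temperatures": [
        {"Name": "CPU0 Temp", "ReadingCelsius": 71, "UpperThresholdCritical": 95},
        {"Name": "Inlet Temp", "ReadingCelsius": 48, "UpperThresholdCritical": 45},
        {"Name": "DIMM A1 Temp", "ReadingCelsius": None, "UpperThresholdCritical": 85},
    ]
}

flagged = sensors_over_threshold(sample)
```

Running the same check before and after a firmware upgrade or downgrade is one simple way to spot sensors that disappear or start reporting implausible values.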
