Site Reliability Engineer

Nebius
Remote - United States
Full-time

About the job

Why Join Nebius?
At Nebius, we are pioneering a transformative approach to cloud computing tailored for the global AI economy. Our mission is to equip our clients with innovative tools and resources that address real-world challenges, all while minimizing infrastructure costs and eliminating the need for extensive in-house AI/ML teams. Here, you will work at the forefront of AI cloud infrastructure alongside some of the industry's most talented leaders and engineers.

About Us
Based in Amsterdam and publicly traded on Nasdaq, Nebius boasts a diverse presence with R&D centers across Europe, North America, and Israel. Our team comprises over 1,400 professionals, including more than 400 highly skilled engineers with profound expertise in hardware and software engineering, complemented by a dedicated in-house AI R&D team.

The Role

Nebius is currently seeking a Site Reliability Engineer to join our Hardware Infrastructure team. While there is an opportunity to work from our Amsterdam office, this position is also available remotely within the United States.

The Hardware Infrastructure team is responsible for designing, developing, and supporting systems integral to the data center lifecycle, including:

  • Functional and load testing systems.
  • Monitoring engineering equipment in our data centers (power supply, air and water cooling, etc.).
  • Monitoring IT equipment: racks, servers, JBODs, JBOGs, power shelves, and network devices.
  • Asset tracking.
  • Managing hardware repair tasks.
  • Server production oversight.

Key Responsibilities:

  • Ensure fault tolerance, scalability, and continuous operation of our services.
  • Employ cutting-edge technologies to resolve a variety of infrastructure challenges.
  • Implement and enhance CI/CD processes.

Qualifications:

  • Strong proficiency in Linux systems, with expertise in Python and Bash scripting for automation purposes.
  • Proven ability to troubleshoot complex system issues, covering hardware, software, and networking problems.
  • Excellent analytical and problem-solving abilities, with a focus on optimizing system performance.
  • Fluent working proficiency in English.

Preferred Qualifications:

  • A keen interest in backend development.
  • Experience in designing, developing, and maintaining hardware infrastructure.

About Nebius

Nebius is redefining cloud computing for the AI economy, providing tools that help customers tackle real challenges without incurring hefty infrastructure costs. Our culture fosters innovation, collaboration, and professional growth.
