
Senior Network Site Reliability Engineer at Nebius | Amsterdam, Netherlands

Nebius | Amsterdam, Netherlands; Remote - Europe
Remote | Full-time


Experience Level

Senior

Qualifications

Proven experience in network engineering or a related field, with a strong focus on reliability and operational excellence. Familiarity with SRE principles, incident management, and network design is essential. Excellent analytical skills with the ability to troubleshoot complex issues effectively. Experience with observability tools and automation frameworks is highly desirable.

About the job

Why Join Nebius
Nebius is pioneering a transformative era in cloud computing, tailored to meet the demands of the global AI economy. We provide the essential tools and resources that empower our clients to address real-world challenges and revolutionize their industries without incurring substantial infrastructure costs or assembling large in-house AI/ML teams. Our workforce is engaged at the forefront of AI cloud infrastructure, collaborating with some of the most talented and innovative leaders and engineers in the industry.

Our Work Environment
Headquartered in Amsterdam and publicly traded on Nasdaq, Nebius boasts a worldwide presence with R&D centers across Europe, North America, and Israel. Our diverse team of over 1400 professionals includes more than 400 highly skilled engineers, well-versed in both hardware and software engineering, complemented by an in-house AI R&D team.

The Role

We are seeking a Network Site Reliability Engineer (NetSRE) to play a critical role in developing and maintaining the network, the foundational infrastructure of Nebius on which all other services depend. In this engineering-centric SRE position, you will define clear reliability objectives, implement the tooling and automation needed to achieve them, and improve the operational safety of the network as we scale rapidly.

Your Responsibilities Will Include:

  • Establish and oversee reliability benchmarks for network services and critical pathways (including SLIs/SLOs, availability targets, and error budgets as applicable).

  • Enhance reliability across the entire network, focusing not just on services, but also on site readiness, inter-site connectivity (DCI), and operational protocols.

  • Lead incident response efforts in your areas, directing investigations/postmortems and transforming failures into sustainable solutions rather than recurring issues.

  • Develop and refine observability tools including actionable metrics, logs, traces, alerting systems, and expedited debugging processes.

About Nebius

At Nebius, we are at the forefront of cloud computing innovation, providing groundbreaking solutions for the AI economy. Our commitment to reducing infrastructure costs while empowering businesses to leverage AI/ML technologies sets us apart as a leader in the industry.
