
Operations Engineering Manager - Fleet Reliability

CoreWeave, Dublin, Ireland
On-site, Full-time

Experience Level

Manager

Qualifications

  • Proven experience in operations management within a technology or engineering environment.
  • Strong leadership skills with a focus on team development and performance management.
  • Experience with server infrastructure, cloud technologies, and automation tools.
  • Excellent problem-solving skills and the ability to drive process improvements.
  • Strong verbal and written communication skills.

About the job

CoreWeave is at the forefront of AI infrastructure, providing the essential cloud computing services tailored for innovators. Our platform equips AI pioneers with the necessary technology, tools, and expert teams to confidently build and scale their AI solutions. Trusted by top AI labs, startups, and global enterprises, CoreWeave combines unparalleled infrastructure performance with extensive technical expertise to drive breakthroughs and transform compute capabilities. Established in 2017, CoreWeave made its public debut on Nasdaq (CRWV) in March 2025. Discover more at www.coreweave.com.
 
We take pride in being a Living Wage-accredited employer.

Your Role

The Fleet Reliability Operations Team serves as the core of CoreWeave’s capacity delivery and maintenance initiatives. This team is tasked with provisioning, updating, and managing server nodes, along with executing the processes and tools that configure and validate our server fleet. As the first responders to hardware issues in production, this team is empowered to drive automation and observability design throughout our server fleet lifecycle.

We are on the lookout for an Operations Engineering Manager to join the Fleet Reliability Operations team. This role will be pivotal in maintaining and enhancing our delivery volume as we expand our fleet tenfold. You will cultivate a robust talent pipeline, oversee onboarding and training, provide leadership in processes, and advocate for reliability and customer satisfaction. As the manager of this team, you will have the chance to:

  • Establish and lead a 24/7 team of process-oriented engineers focused on reliability and observability.
  • Facilitate the development and documentation of clear, consistent processes for provisioning, validating, and troubleshooting nodes in our server fleet.
  • Critically assess and champion process and automation improvements, prioritizing event-driven automated remediation.
  • Provide a 24/7 engineering support function for critical, time-sensitive node delivery and maintenance.
  • Enhance our onboarding, documentation, enablement, and performance management programs to elevate team members' growth and capabilities.
  • Foster a culture of accountability and performance measurement within your team.

About CoreWeave

CoreWeave is the essential cloud platform for AI, empowering innovators with cutting-edge technology and expert support. We are committed to delivering exceptional performance and driving the future of AI infrastructure.
