Major Incident Lead - Site Reliability

InterSystems, Dublin (Remote)
Remote Full-time

Experience Level

Manager

Qualifications

Key Responsibilities

Lead the end-to-end management of P1 and P2 major incidents affecting our managed services customers, acting as the primary coordination point.
Communicate promptly and accurately with customers, partners, internal leadership, and operational teams.
Provide executive-level updates during extended or critical incidents.
Guide post-incident reviews and root cause analysis (RCA).
Define and track corrective and preventative actions, identifying trends and opportunities for automation and improved tooling.
Contribute to the ongoing improvement of incident management processes, tools, and documentation.
Ensure incident management aligns with SLAs and regulatory standards, and maintain thorough incident documentation.
Enhance major incident playbooks and escalation paths.
Participate in on-call duties and incident simulations.

About the job

Join our dynamic Managed Services team as a Major Incident Lead specializing in Site Reliability. In this critical role, you will spearhead the response to significant, customer-impacting incidents across InterSystems’ managed services platforms. As the Incident Commander, you will ensure swift service restoration, maintain clear and confident communication with stakeholders, and coordinate effectively across SRE, engineering, support, cloud, and service delivery teams.

Operating within a service model aligned with SRE principles, you will prioritize service reliability using service level indicators (SLIs) and service level objectives (SLOs). During live incidents, the focus is on reducing customer impact first, with root cause analysis to follow. Beyond immediate incident management, you will lead post-incident reviews that turn operational failures into actionable reliability improvements and reduce repeat incidents.

This position is vital for preserving customer trust, ensuring platform resilience, and achieving operational excellence in a 24x7, mission-critical, and highly regulated environment.

About InterSystems

InterSystems is a global leader in data management solutions, committed to enhancing operational efficiency and reliability across managed services. We pride ourselves on innovation and excellence, ensuring our clients can trust in our platforms and services.
