About the job
Join our dynamic Managed Services team as a Major Incident Lead – Site Reliability. In this pivotal role, you will spearhead the management of high-severity incidents that impact our customers across InterSystems' managed services platforms. As the Incident Commander, you will be responsible for ensuring swift service restoration, effective communication with stakeholders, and coordinated efforts across Site Reliability Engineering (SRE), engineering, support, cloud, and service delivery teams.
Working within an SRE-aligned service model, your primary focus will be on preserving service reliability, using service level indicators and objectives to guide decisions. During live incidents, you will prioritize minimizing customer impact over root cause analysis. Beyond incident management, you will lead post-incident reviews, turning operational setbacks into measurable reliability improvements and preventing recurrence.
This role is essential for upholding customer trust, platform resilience, and operational excellence in a 24/7, mission-critical, and highly regulated environment.

