Qualifications
Key Responsibilities
Reliability Engineering & Architecture
Design and refine reliability architecture for distributed and cloud-hosted systems.
Establish and implement SRE best practices, encompassing SLIs, SLOs, error budgets, and capacity planning.
Collaborate with platform and application teams to develop systems that prioritize reliability, scalability, and operability.
Identify and address systemic reliability risks across infrastructure and services.
Operations & Incident Management
Lead incident response procedures, including on-call rotations, escalation protocols, and post-incident reviews.
Perform root cause analysis for intricate production incidents and drive long-term enhancements.
Boost operational readiness through runbooks, automation, and resilience testing.
Minimize operational toil through effective tooling, automation, and process optimizations.
Observability & Performance
Design and maintain observability systems for metrics, logging, tracing, and alerting.
Ensure services and data pipelines are observable, debuggable, and performant in production.
About the job
Join Our Team:
At HavocAI, we are at the forefront of collaborative autonomy, leading the way in the development of autonomous surface vessels for a variety of defense and commercial maritime operations. Our mission is to rapidly expand and innovate solutions that address complex human challenges, while prioritizing life-saving technologies. We are in search of passionate individuals committed to pushing boundaries and making a meaningful impact.
Role Overview
We are looking for a Senior Site Reliability Engineer (SRE) with a minimum of 7 years of experience in designing, operating, and scaling robust distributed systems. In this pivotal role, you will serve as a technical leader in our Cloud Platform team, ensuring the reliability, performance, and resilience of critical services that support autonomy, simulation, and data-heavy workloads.
You will collaborate with various teams, including Cloud Platform, DevOps, Data Engineering, and Autonomy, to define reliability standards, enhance operational maturity, and create systems that effectively scale under real-world conditions. The ideal candidate will possess deep technical expertise, demonstrate composure under pressure, and be experienced in managing end-to-end reliability outcomes.
About HavocAI
HavocAI is a pioneering company specializing in collaborative autonomy, setting benchmarks in the development of advanced autonomous surface vessels tailored for diverse maritime missions. Our innovative solutions are dedicated to addressing critical human challenges through cutting-edge technology and rapid growth.