About Us
At HavocAI, we are pioneering the frontier of collaborative autonomy, where self-tasking teams of machines tackle complex human challenges. As a leader in this space, we set the benchmark for autonomous surface vessels, serving a diverse range of defense and commercial maritime operations. Our mission is driving rapid growth, and we invite passionate people who are eager to solve difficult problems, innovate, and contribute to conflict prevention and life-saving work to join our team.
About the Role
We are on the lookout for a dynamic Cloud Platform Technical Lead Engineer to spearhead the architecture, reliability, and evolution of our engineering platform. You will guide a multidisciplinary team encompassing Cloud Platform, DevOps, Site Reliability Engineering (SRE), and Data Engineering, establishing the framework for how our infrastructure, services, and data systems are developed, deployed, and managed at scale.
Your responsibilities will include shaping the technical vision and execution of our cloud platform, ensuring that it underpins mission-critical systems, scales with the company, and lets engineers move quickly while maintaining reliability, security, and cost-efficiency. The position is both highly technical and strategic, making it ideal for a hands-on leader who thrives in fast-paced settings and enjoys building systems that support other engineers.
What You’ll Do
Lead Platform Architecture
- Set and steer the technical direction for the cloud platform.
- Establish best practices across infrastructure, DevOps, SRE, and data systems.
- Make architectural choices that balance speed, scalability, and maintainability.
Build and Operate Core Infrastructure
- Design and manage foundational platform services, such as shared runtimes, service frameworks, and data infrastructure.
- Architect and maintain Kubernetes-based environments for stable multi-service systems.
- Ensure that our infrastructure is secure, scalable, and cost-effective.
Drive Reliability and DevOps Excellence
- Lead the development of CI/CD systems and deployment processes.
- Implement SRE practices including Service Level Indicators (SLIs), Service Level Objectives (SLOs), and incident response protocols.
- Define observability standards and metrics to ensure system reliability.

