About the job
Join DoubleZero as a Site Reliability Engineer (SRE) and apply your systems architecture mindset to improving reliability through advanced automation. If you thrive in large-scale production environments and have a strong aversion to manual processes, we want to hear from you!
Key Responsibilities
Develop and implement automation-centric reliability systems using Golang.
Treat infrastructure as pipelines that cover provisioning, observability, deployment, and failover.
Create internal tools and reliability infrastructure, including control planes.
Take ownership of production readiness, alerting mechanisms, and operational maturity.
Collaborate across protocol, networking, and hardware layers.
Desired Qualifications
Experience in a similar SRE or Production Engineering role within a large tech organization.
Proficiency in Golang development within production settings.
Strong systems thinking capabilities, with an emphasis on scalability, reliability, and operational efficiency.
Experience in building internal platforms, not just managing existing systems.
Familiarity with distributed systems, networking protocols, and low-level system behavior.

