About the job
Archer Aviation is pioneering the future of sustainable air travel by developing an all-electric vertical takeoff and landing aircraft. Based in San Jose, California, we aim to advance the benefits of sustainable air mobility while minimizing noise pollution.
We are committed to tackling complex challenges and believe that a diverse workforce fosters creativity and drives success. Our inclusive culture celebrates individuality and strives to create an equitable environment for all team members.
Staff Site Reliability Engineer
The Role
We are seeking a Staff Site Reliability Engineer to join our SRE team. In this position, you will engineer resilient systems by developing custom internal tools, enhancing our observability stack, and ensuring the robustness of our SLO/SLI frameworks. Working closely with our existing SRE teams, you will automate operational tasks and help move our operations toward programmable infrastructure.
Key Responsibilities
- Standardize SRE Procedures: Develop and implement standardized technical procedures for incident management, error budget tracking, and production readiness across our services.
- Engineer SLOs & SLIs: Instrument our services to accurately capture SLIs. You will own the backend logic that calculates error budgets and triggers automated alerts based on burn rates.
- Build Special-Purpose Tooling: Write production-quality code (in Go, Python, or similar languages) to create internal tools that address specific infrastructure challenges, such as custom Kubernetes operators, automated remediation scripts, and deployment safety gates.
- Build Executive & Operational Dashboards: Develop a cohesive observability layer, including in-depth Grafana/Datadog dashboards for real-time debugging and high-level aggregate views that let executives monitor SLA compliance.
- Toil Reduction: Identify and automate repetitive operational tasks to improve efficiency.
- Collaborative Engineering: Work alongside the SRE group to foster a culture of collaboration and innovation.

