About the job
Join Our Mission at Helion
Helion is pioneering the future of energy as a leading fusion power company headquartered in Everett, WA. Our ambitious goal is to construct the world's first fusion power plant, paving the way for limitless clean electricity. We envision a future where energy is not only clean and reliable but also affordable for all.
Since our inception in 2013, Helion has successfully raised over $1 billion from esteemed investors, including Sam Altman, Mithril, Capricorn Investment Group, SoftBank, and Lightspeed, to advance our innovative efforts. Our latest prototype, Trenta, has demonstrated remarkable capabilities, achieving 10,000 high-power pulses and plasma temperatures of 100 million degrees Celsius (9 keV). Currently, we are operating Polaris, our next-generation prototype, on the journey towards the first fusion power plant.
This is an extraordinary time to join Helion. As a Site Reliability Engineer, you will confront real-world challenges alongside a dedicated team that values urgency, rigor, ownership, and transparency. Your contributions will be vital in transforming the energy landscape, because the world cannot afford to wait.
Your Responsibilities:
Partner with engineering teams to design scalable, fault-tolerant systems aligned with performance and reliability objectives.
Establish and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs); implement error budgets and reliability metrics.
Lead incident responses, perform root cause analyses, and facilitate postmortem procedures.
Develop and maintain automation for deployment, monitoring, and infrastructure management utilizing tools such as Terraform, Kubernetes, and CI/CD pipelines.
Create and sustain observability platforms to monitor system health in real-time and enable proactive alerting.
Anticipate system demands and enhance performance through load testing and optimization strategies.
Collaborate with security teams to ensure infrastructure adherence to compliance and security standards.
Champion best practices and foster a culture of reliability and continuous improvement.

