About the job
Join our dynamic Engineering chapter team as a Site Reliability Engineer (SRE) specializing in the energy sector. This position is crucial for ensuring the reliability, scalability, monitoring, and performance of our on-premises services in a product-oriented environment. You will be instrumental in applying industry best practices, enhancing our infrastructure, and collaborating with cross-functional teams to ensure the stability, observability, and high availability of our services.
Key Responsibilities
- Develop and maintain robust monitoring infrastructure.
- Design and implement dashboards, alerts, and visualization tools.
- Set up distributed tracing and log aggregation systems.
- Define monitoring standards, SLI/SLO frameworks, and best practices.
- Ensure on-premises monitoring tools comply with security protocols.
- Automate deployment and configuration processes.
- Work closely with development teams to enhance instrumentation.
- Participate in on-call rotations for 24/7 incident support.

