Qualifications
Key ResponsibilitiesDesign, develop, and sustain scalable infrastructure to facilitate real-time analytics and machine learning workloads. Enhance system reliability and performance through automation, observability, and proactive capacity planning. Lead the evolution of CI/CD pipelines, deployment automation, rollback mechanisms, and configuration management. Establish and maintain monitoring, alerting, and incident response protocols, including SLOs, runbooks, and on-call rotations. Foster collaboration across engineering and data science teams to promote a culture of performance and reliability. Ensure security, compliance, and operational readiness of our cloud infrastructure. Drive post-incident analyses and continuous improvement efforts.
About the job
About the Role
Join alembic as a Senior Site Reliability Engineer (SRE) and become an integral part of our mission to enhance platform reliability, observability, and operational excellence. In this pivotal role, you will collaborate with engineers and data scientists to architect, automate, and maintain the robust infrastructure that drives our platform, including data pipelines, machine learning workloads, and real-time analytics systems.
This hands-on position offers significant visibility across the technology stack and provides you with the opportunity to shape the future of our infrastructure and operations.