About the job
Empowering Every Employee.
Our mission is to be the world's most utilized AI employee experience platform by transforming the way frontline employees operate.
At Flip, we aim to revolutionize the world of frontline workers and give them a voice! Become a Flip Game Changer and join an unbeatable team dedicated to ensuring that all employees, regardless of their work location, have access to their internal company information. Are you ready to transform the work lives of millions? Join us!
Job Description
As a Site Reliability Engineer in our Platform Squad, you will play a pivotal role in keeping Flip's infrastructure fast, resilient, and ready to scale. You will shape the reliability culture, tools, and practices that empower our engineering teams to release with confidence, at scale and without compromising availability. This role is ideal for an engineer who is passionate about building high-throughput, highly available systems and who wants to influence how a rapidly growing SaaS platform operates in production.
What Awaits You
Enable Scaling: Expand and optimize our Azure cloud infrastructure and Kubernetes clusters, which are designed for high throughput and maximum availability, to support Flip's rapid global growth.
Ensure Resilience & Security: Design and implement zero-downtime deployments, rollback mechanisms, and disaster recovery strategies that keep our platform available around the clock.
Create Observability: Enhance our LGTM stack (Loki, Grafana, Tempo, Mimir) to provide every team with the necessary visibility and leverage it to...

