About the job
Flip develops an AI-powered employee experience platform designed for frontline workers. The company’s mission is to make internal information easily accessible to every employee, wherever they work. Flip is expanding quickly and aims to change how millions of frontline employees stay connected with their organizations.
Role overview
The Site Reliability Engineer (m/w/d) joins the Platform Squad to keep Flip’s infrastructure fast, resilient, and ready for growth. This role focuses on shaping reliability practices, building internal tools, and fostering a culture where engineering teams can deploy confidently at scale while maintaining high uptime. The position suits people who enjoy designing high-throughput, highly available systems and want to influence how a growing SaaS platform runs in production.
Key responsibilities
- Enable scaling: Expand and optimize Azure cloud infrastructure and Kubernetes clusters to support Flip’s global growth, prioritizing high throughput and availability.
- Ensure resilience & security: Design and implement zero-downtime deployments, effective rollback mechanisms, and disaster recovery strategies to keep the platform available at all times.
- Improve observability: Strengthen the LGTM stack (Loki, Grafana, Tempo, Mimir) so teams have clear insight into system health and performance.
Location
This position can be based in Berlin or Stuttgart, Germany, or performed remotely from anywhere in Europe.

