About the job
Join Our Journey
At Snapp, we are transforming urban mobility. Our innovative ride-hailing and mobility platform connects millions of riders and drivers daily, providing safe, reliable, and efficient transportation solutions. Driven by real-time data and a strong infrastructure, we enhance urban travel, making it faster, simpler, and more sustainable.
With the agility of a startup and the perspective of a global tech leader, we develop services that scale across markets while responding to local demands.
Your Role and Impact
As an Infrastructure Observability Engineer on our Platform team, you will work across observability platforms, infrastructure monitoring, and DevOps automation to ensure comprehensive visibility and high system reliability. Your responsibilities will include maintaining and enhancing monitoring and logging frameworks, analyzing infrastructure events, and implementing proactive improvements to performance and resilience. This critical role focuses on automation and continuous optimization rather than purely reactive support.
Key Responsibilities
- Develop, manage, and optimize monitoring and logging systems (Prometheus, Grafana, ELK, Zabbix, etc.).
- Ensure complete observability across infrastructure, networks, and services.
- Manage alerting rules, dashboards, SLO/SLA metrics, and anomaly detection.
- Analyze logs and metrics to detect patterns and potential risks.
- Oversee infrastructure health across compute, storage, virtualization, and network layers.
- Conduct root cause analysis on network-related incidents (routing and switching, load balancing, DNS, firewalls).
- Collaborate with network and data center teams on incident follow-ups.
- Maintain a solid understanding of network topologies and protocols.
