About the job
At Doctolib, we pride ourselves on fostering a dynamic engineering environment where innovation thrives. Our mission is to enhance the lives of healthcare professionals and patients alike. We are seeking a Senior Site Reliability Engineer to ensure our production systems operate seamlessly, playing a crucial role in supporting the rapid expansion of Doctolib's services.
Your Responsibilities
As a Senior Site Reliability Engineer within the Core Reliability & Observability team, you will be instrumental in defining the company's observability strategy and maintaining the reliability, debuggability, and scalability of our platform. This position bridges infrastructure, developer experience, and product engineering, focusing on developing and enhancing the core elements of logging, metrics, tracing, and alerting across our organization.
- Lead the implementation of an observability strategy across the platform, emphasizing scalable, developer-friendly logging and tracing solutions.
- Identify and spearhead cross-functional reliability initiatives to enhance incident detection, response, and postmortem analysis capabilities.
- Participate in the on-call rotation and actively work on improving our on-call experience by optimizing alerting, minimizing noise, and providing actionable telemetry.
Who You Are
You could be our next teammate if you possess:
- A minimum of 3 years of hands-on experience with large-scale production platforms.
- Demonstrated proficiency with cloud platforms such as AWS, Azure, or Google Cloud.
- A strong understanding of containerization and orchestration technologies (Docker and Kubernetes).
- Deep knowledge of Helm for managing Kubernetes manifests and of ArgoCD for GitOps workflows.
- Extensive expertise in observability tooling and architecture, including:
  - Logging: Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Logstash, Vector.
  - Tracing: OpenTelemetry or proprietary APMs.
  - Metrics: Prometheus, Thanos, Datadog, or equivalent.
- Proficiency in at least one programming language (e.g., Ruby, Python, Go, Java) and a strong grasp of infrastructure as code principles.
- Experience with monitoring and observability tools.

