Qualifications
Responsibilities
- Establish the standards and vision for our essential observability platform utilized by all engineering sectors.
- Collaborate with stakeholders to design, architect, build, and deliver core components of our observability services.
- Implement and troubleshoot comprehensive monitoring solutions that span multiple global cloud providers.
- Ensure reliability by designing services and infrastructure that are resilient, fault-tolerant, and self-healing.
- Determine and configure key metrics to effectively detect incidents and assess service health, availability, and performance.
- Participate in on-call rotations and contribute to a blameless post-mortem process.
- Enhance our observability capabilities, focusing on cost optimization, usability, and maintainability.
Requirements
- Proven experience managing mission-critical services at scale.
- Expertise in observing large-scale distributed systems.
- Strong understanding of information security principles.
- Proficiency in at least one modern programming language beyond basic scripting.
- In-depth knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.).
- Bachelor’s degree in Computer Science or a related field, or equivalent experience.
Preferred Qualifications
- Experience with major cloud providers such as AWS, Google Cloud, or Microsoft Azure.
- Familiarity with Kubernetes environments and cluster management.
About the job
Join our dynamic SRE Observability team within the Platform Engineering division at MongoDB. We focus on creating and maintaining a robust observability stack, encompassing metrics, logging, and tracing, to support all engineering teams in delivering reliable services. Our responsibilities extend to managing essential services like our telemetry pipeline and monitoring and alerting infrastructure. Our technology stack includes cutting-edge tools such as VictoriaMetrics, Splunk, QuickWit, Jaeger, Fluentbit, and Vector. As a member of our team, you will collaborate closely with other Software Engineers (SWE) and Site Reliability Engineers (SRE) to advocate for and implement best practices in service instrumentation and monitoring. This is a highly collaborative position where you will play a vital role in maintaining MongoDB's critical internal infrastructure.
We welcome candidates based in Dublin as part of our hybrid work model.
About MongoDB
MongoDB is a pioneering database platform that empowers developers and businesses by providing a powerful, flexible, and scalable data solution. As an innovator in the tech industry, we prioritize collaboration and growth, offering our engineers the opportunity to work with the latest technologies in a supportive and dynamic environment. Join us in shaping the future of data management.