About the job
The Security Incident Response team, a crucial part of our Resilience Engineering organization, plays a pivotal role in safeguarding Datadog. Our mission is to prepare for and efficiently address security incidents, ensuring that any threats to our systems and data are swiftly contained. We also collaborate with various teams post-incident, viewing these moments as opportunities for growth and learning. By focusing on our adaptability and addressing systemic issues, we foster a culture of resilience within our team and our systems.
As an Engineering Manager, you will be at the forefront of this mission, leading a skilled team of engineers dedicated to enhancing Datadog’s incident response capabilities. You will develop tools and automation to boost our efficiency, collaborating with key stakeholders to ensure our focus is directed appropriately and that we are measuring our improvements effectively. As part of the leadership team, your influence will help shape our organizational strategy and cultural landscape.
At Datadog, we cherish our office culture, valuing the relationships and collaboration it fosters, as well as the creativity it inspires. We embrace a hybrid workplace model, allowing our Datadogs to achieve a work-life balance that suits them best.
What You’ll Do:
- Lead and mentor a passionate team of incident responders, cultivating a security-focused and resilient culture at Datadog. Provide opportunities for professional growth and development.
- Act as a hands-on leader during incidents, making decisive choices in uncertain situations and collaborating with multiple teams towards resolution. Participate in an on-call rotation with other leaders for critical decision support.
- Oversee the triage of alerts and signals in Datadog Cloud SIEM, ensuring a consistent, high-quality response to emerging threats. Collaborate with the Threat Detection team to optimize signal performance.
- Develop tools, systems, and processes to advance Datadog’s capabilities in security incident response. Communicate operational metrics clearly to stakeholders.
- Lead post-incident analyses to ensure that learnings from security incidents are captured, promoting blameless and actionable postmortems. Ensure follow-up actions address systemic issues and prevent future occurrences.

