Site Reliability Engineer (SRE) - Incident Response

XBOW - Europe (remote)

Remote, full-time

Qualifications

The ideal candidate will have a strong foundation in cloud technologies, automation, and incident response methodologies. Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation is essential. Proficiency with monitoring and logging tools and familiarity with service level management are highly desirable. A collaborative mindset and excellent problem-solving skills are crucial for success in this role.

About the job

Join XBOW and help shape the future of offensive security. In an era where attackers leverage AI to outpace defenders, we are at the forefront of creating a security platform that ensures organizations stay one step ahead. Our cutting-edge, AI-driven system autonomously identifies, validates, and exploits vulnerabilities, providing proof-backed results in mere hours instead of weeks.

Founded by Oege de Moor, the visionary behind GitHub Copilot, and supported by top-tier investors such as Sequoia and Altimeter, XBOW is tackling one of the most pressing challenges in cybersecurity. Over the past year, our exceptional team of leading AI experts and renowned security researchers has discovered thousands of real-world zero-days in the software that billions depend on, securing the top position on HackerOne’s global leaderboard.

We are a dynamic group of innovators, hackers, and researchers who thrive on addressing complex challenges. If you are eager to explore the limits of AI, redefine cybersecurity, and be part of a team that is paving the way for a new era of defense, we would love to hear from you.

Your Role: Site Reliability Engineer (SRE) focused on Automation and Incident Response

As a Site Reliability Engineer at XBOW, you will play a crucial role in maintaining the stability, observability, and resilience of our production systems as we scale. You will be responsible for developing and maintaining automated reliability tools that encompass monitoring, alerting, and self-healing capabilities, while also setting and tracking service level objectives for both production and development environments.

This position requires close collaboration with infrastructure and feature teams to manage cloud systems through Infrastructure as Code (IaC), assess architectural changes for their impact on reliability and capacity, and respond to incidents during local working hours as part of a “follow the sun” model.

When incidents arise, you will lead or assist in root cause investigations, analyze incident trends across the organization, and implement improvements to mitigate future risks. Additionally, you will help maintain internal and customer-facing status dashboards to effectively communicate system health and uptime.

Responsibilities:

Automating site reliability infrastructure, monitoring, and self-healing systems.
Defining and owning Service Level Objectives for production and development deployments.
Implementing Infrastructure as Code for production and development systems in collaboration with the infrastructure engineering team.

About XBOW

XBOW is at the cutting edge of offensive security, leveraging AI technology to create innovative solutions that address critical vulnerabilities in software. With a focus on building a safer digital environment, we are committed to pioneering advancements in cybersecurity and leading the industry into a new era of defense.