Major Incident Lead Site Reliability jobs in Boston – Browse 220 openings on RoboApply Jobs

Open roles matching “Major Incident Lead Site Reliability” with location signals for Boston. 220 active listings on RoboApply Jobs.

InterSystems
Full-time|$87K/yr - $116K/yr|On-site|Boston, MA

Join our Managed Services team as a Major Incident Lead – Site Reliability. In this pivotal role, you will lead the management of high-severity incidents that affect customers across InterSystems' managed services platforms. As Incident Commander, you will be responsible for swift service restoration, clear communication with stakeholders, and coordination across Site Reliability Engineering (SRE), engineering, support, cloud, and service delivery teams. Within an SRE-aligned service model, your primary focus will be preserving service reliability, guided by service level indicators (SLIs) and service level objectives (SLOs). During live incidents, you will prioritize minimizing customer impact over root cause analysis. Beyond incident management, you will lead post-incident reviews, turning operational setbacks into measurable reliability improvements that prevent recurrence. This role is essential to upholding customer trust, platform resilience, and operational excellence in a 24/7, mission-critical, highly regulated environment.

Feb 11, 2026
MongoDB, Inc.
Full-time|On-site|Boston; Charlotte; New York City; Philadelphia; Pittsburgh; Washington DC

Join MongoDB as a Team Lead for our Site Reliability Engineering (SRE) team focused on the Storage Layer Service. In this pivotal role, you will drive the reliability, availability, and performance of MongoDB's storage systems, collaborating closely with cross-functional teams to enhance our infrastructure and ensure optimal service delivery.

Mar 25, 2026
dev2
Full-time|On-site|Boston

As a Site Reliability Engineer at dev2, you will play a crucial role in ensuring the reliability and performance of our services. You will work closely with development and operations teams to build and maintain scalable systems, troubleshoot issues, and implement best practices in reliability engineering. Your expertise will help us deliver exceptional service and maintain our commitment to quality.

Dec 11, 2023
Site Reliability Engineer

Beacon Biosignals

Full-time|Remote|Boston, MA - Remote

Role Overview
Beacon Biosignals is hiring a Site Reliability Engineer. This role focuses on improving the reliability and performance of the company’s systems. The position is open to candidates in Boston, MA or remote locations.
What You Will Do
Work with teams across engineering, product, and operations to support scalable infrastructure.
Design, implement, and maintain systems that prioritize uptime and smooth user experiences.
Help ensure high availability for Beacon Biosignals’ platforms and services.

Apr 17, 2026
Anduril Industries
Full-time|$166K/yr - $220K/yr|On-site|Boston, Massachusetts, United States

Anduril Industries is at the forefront of defense technology, dedicated to revolutionizing military capabilities for the U.S. and its allies through cutting-edge innovations. By integrating the expertise, technology, and business models of the most pioneering companies of the 21st century into the defense sector, Anduril is transforming the design, construction, and sale of military systems. Our advanced family of systems is driven by Lattice OS, an AI-enhanced operating system that synthesizes vast data streams into real-time, 3D command and control environments. In this era of strategic competition, we are committed to delivering state-of-the-art autonomy, AI, computer vision, sensor fusion, and networking technologies to the military in a matter of months rather than years.
ABOUT THE TEAM
The Corporate Technology Engineering team plays a crucial role in developing and enhancing the various systems that empower Anduril to achieve its mission. Our technology solutions are vital for the supply chain, accounting, sales and growth, engineering, modeling and simulation, field maintenance, manufacturing, and more. We collaborate across the organization to ensure that our teams have the necessary tools and capabilities for mission success.
ABOUT THE JOB
We are in search of an experienced Senior Site Reliability Engineer to join our dynamic team. In this role, you will be responsible for the design, deployment, scaling, and maintenance of the pivotal infrastructure that supports our systems. You will engage with a diverse array of stakeholder teams to facilitate swift and secure progress on their respective technology roadmaps.
WHAT YOU'LL DO
Provision, manage, and scale intricate infrastructure for all Business Systems.
Continuously optimize and refine CI/CD pipelines to improve the efficiency, reliability, and speed of software delivery.
Promote a culture of observability and reliability, advocating for best practices and tools that enhance system visibility and resilience.
Collaborate with cross-functional engineering teams to understand their needs and translate them into effective cloud solutions using industry best practices.
Possess a deep understanding of the company’s business goals and objectives to design and implement infrastructure solutions that align with them.
Strengthen systems and evaluate workload demands, planning resource capacity to guarantee optimal performance and cost-effectiveness.

Mar 31, 2026
Axon Enterprise, Inc.
Full-time|$134.3K/yr - $214.8K/yr|Hybrid|Boston, Massachusetts, United States

Become a Catalyst for Positive Change at Axon.
At Axon, our mission is to Protect Life. We are innovators dedicated to addressing society's most pressing safety and justice challenges through our suite of devices and cloud software solutions. Collaboration is at the heart of our success; we engage with transparency and empathy, valuing diverse perspectives from our customers, communities, and each other. Working at Axon is dynamic, rewarding, and impactful. Here, you will take the lead and create substantial change while continually evolving in your role at a company that values your contributions.
Your Contribution
As a Senior Site Reliability Engineer in the APX SRE organization, you will be instrumental in implementing efficient, scalable solutions that enhance the reliability and performance of our global cloud-native Kubernetes platform and its services. You are passionate about maintaining system stability, producing clear documentation, and developing tools that enrich the developer experience.
Location: This position is located in our Boston, MA office, with a hybrid working model. We encourage in-person collaboration from Tuesday to Friday, allowing for remote work on Mondays unless otherwise accommodated. We believe that strong connections drive innovation, and our office culture is designed to promote meaningful teamwork, mentorship, and collective achievement.

Mar 27, 2026
Axon
Full-time|$134.3K/yr - $214.8K/yr|Hybrid|Boston, Massachusetts, United States

Become a Force for Good at Axon.
At Axon, we are dedicated to our mission of protecting life. We tackle society's most pressing safety and justice challenges through our innovative ecosystem of devices and cloud software. Collaboration is at the heart of what we do; we connect with transparency and empathy, valuing diverse perspectives from our customers, communities, and team members. Life at Axon is dynamic, challenging, and impactful. Here, you will take initiative and make a real difference. Continuously evolve as you contribute to a mission that matters at a company where your contributions are valued.
Your Impact
As a Senior Site Reliability Engineer within the APX SRE CloudOps team, you will architect and build the cloud infrastructure and automation platforms critical to Axon's product engineering teams. You will design solutions for multi-cloud environments (Azure, AWS), ensure FedRAMP compliance, and oversee large-scale Kubernetes platforms managing production workloads across various regions. A significant aspect of your role will involve coding: developing services, APIs, and internal tools using languages like Go and Python. Additionally, you will participate in on-call rotations and incident response, leveraging operational insights to enhance reliability and guide platform investments. This position merges software engineering expertise with cloud architecture at scale and production ownership.
Location: This role is based in our Atlanta, Seattle, or Boston office and operates on a hybrid schedule. We prioritize in-person collaboration, requiring team members to work on-site from Tuesday to Friday, with the option to work remotely on Mondays, unless a workplace accommodation is approved. We believe that connection fosters innovation, and our in-office culture is designed to promote meaningful teamwork, mentorship, and shared success.

Apr 10, 2026
Axon
Full-time|$180K/yr - $225K/yr|Hybrid|Boston, Massachusetts, United States

Become a Force for Good at Axon.
At Axon, our mission is to protect life through innovative solutions that address society's most pressing safety and justice challenges. We are a team of explorers, working collaboratively to develop a comprehensive ecosystem of devices and cloud-based software. We value connection, transparency, and diverse perspectives from our customers, communities, and each other. Life at Axon is both fast-paced and rewarding. Here, you will take charge and make a meaningful impact while continuously growing in a mission-driven environment that values your contributions.
Your Impact
As a Senior Site Reliability Engineer, you will play a pivotal role in shaping how Axon constructs and manages its core platforms, specifically focusing on Zero Touch (a compliant execution framework) and the surrounding identity and security infrastructures. Instead of manually provisioning infrastructure or managing tickets, you will design and develop the platforms, tools, and policies that empower hundreds of engineers to operate safely and efficiently at scale.
Your expertise in infrastructure and platform engineering, along with your extensive experience in distributed systems, will guide your efforts toward automation, self-service, and enforcing best practices. You will prioritize APIs, workflows, and standardized processes over manual tasks, ensuring security, identity, and compliance are fundamental to your work, especially in regulated environments where precision and traceability are paramount.
This position is highly collaborative, requiring you to work alongside senior engineers across product and platform teams to enhance the building, deployment, security, and operation of Axon’s cloud systems within a modern, AI-driven landscape.
Location: This role is situated in our Boston office on a hybrid schedule. We emphasize in-person collaboration, with team members expected to work onsite from Tuesday to Friday, while enjoying the flexibility to work remotely on Mondays, unless a workplace accommodation has been approved. We believe that connections inspire innovation, and our office culture is designed to promote teamwork, mentorship, and collective success.

Apr 10, 2026
Xometry
Full-time|On-site|Boston, MA

Join Xometry as a Site Reliability Engineer II (SRE) and be part of a dynamic team that drives innovation in the realm of automated manufacturing solutions. In this role, you will ensure the reliability, availability, and performance of our systems while collaborating closely with other engineering teams.

Mar 21, 2026
PathAI
Full-time|$165.8K/yr - $224.4K/yr|Hybrid|Boston, MA or Remote

Who We Are
At PathAI, we are dedicated to revolutionizing patient outcomes through the power of AI-driven pathology. Our commitment to advancing traditional pathology methodologies into innovative technologies is at the forefront of our mission. By leveraging these advancements, we aim to expedite drug development, enhance diagnostic accuracy, and deliver life-saving treatments to patients with urgency. Join our diverse and talented team, united in solving intricate challenges and making a substantial impact in healthcare.
Where You Fit
We are seeking a highly skilled Senior Staff Site Reliability Engineer who will play a pivotal role in designing, constructing, and managing our hybrid cloud and on-premises environment.
What You’ll Do
In this role, you will harness your extensive skills and develop new ones as you:
Elevate our operational practices by implementing Site Reliability Engineering (SRE) best practices focused on user satisfaction, monitoring, and automation.
Engineer robust infrastructure patterns for our cloud environments using Amazon Web Services, emphasizing security, reliability, and scalability.
Design, construct, and manage our data center to support our rapidly expanding Machine Learning team.
Integrate on-premises datacenter environments with our existing cloud infrastructure to create a seamless hybrid cloud solution.
Enhance the reliability and resilience of our infrastructure through thorough root-cause analysis and identifying design gaps.
Engage in platform on-call rotations and provide assistance during critical incident responses.

Jan 20, 2026
Veeva Systems Inc.
Full-time|Remote|Massachusetts - Boston

Join Veeva Systems, a groundbreaking organization at the forefront of the industry cloud, dedicated to accelerating the delivery of therapies to patients worldwide. As one of the fastest-growing SaaS companies in history, we have achieved over $2 billion in revenue last fiscal year, with abundant growth opportunities on the horizon.
At Veeva, we operate based on our core values: Do the Right Thing, Customer Success, Employee Success, and Speed. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the needs of our customers, employees, society, and investors.
As a Work Anywhere company, we empower you to choose your ideal work environment, whether from home or in the office, to help you thrive.
Be a part of our mission to transform the life sciences industry and positively impact our customers, employees, and communities.
The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be responsible for ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your deep knowledge of Java and modern open-source technologies to make a significant impact on our production systems.
Ideal candidates will have extensive experience working with Java applications and the latest open-source technologies, preferably gained in enterprise software development or a rapidly growing tech environment. As a Senior SRE, you will need to be innately curious and possess strong problem-solving skills. Additionally, you will bring a unique engineering perspective, understanding how systems integrate in production to function at a global scale for hundreds of customers across North America, Europe, and Asia.

Oct 7, 2025
MongoDB
Full-time|$127K/yr - $249K/yr|Remote|Boston; Miami; New York City; Pittsburgh; Raleigh; United States

Join MongoDB’s innovative Storage Layer Services (SLS) team as we redefine the MongoDB cloud storage layer. This dynamic team is at the forefront of developing high-performance, multi-tenant distributed storage solutions that not only enhance our existing Atlas storage framework but also empower our customers' workloads to operate with remarkable efficiency. In this pivotal role, you will collaborate closely with teams dedicated to building these storage services, defining Service Level Objectives (SLOs), shaping capacity plans, and ensuring the reliability, durability, and operational safety of the foundational storage layer that supports Atlas. As one of the founding members of this small but experienced team of Site Reliability Engineers (SREs), you will play a vital role in executing a multi-year vision for MongoDB’s cloud storage architecture. This position offers flexibility in location, allowing you to work from our offices in Boston, New York City, Raleigh, Miami, or Pittsburgh, or remotely from anywhere in the United States, provided you are based in the Eastern or Central time zones.

Apr 8, 2026
Veeva Systems Inc.
Full-time|Hybrid|Massachusetts - Boston

At Veeva Systems, we are dedicated to our mission and are recognized as trailblazers in the industry cloud, empowering life sciences companies to expedite the delivery of therapies to patients. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue in our previous fiscal year, with immense growth opportunities on the horizon.
Our core values (Do the Right Thing, Customer Success, Employee Success, and Speed) are the foundation of our culture. Distinctively, we made history in 2021 by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors.
As a Work Anywhere company, we offer the flexibility to choose between working from home or in the office, allowing you to thrive in your preferred environment.
Join us in our mission to transform the life sciences industry and make a positive impact on our customers, employees, and communities.
The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be pivotal in ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your extensive knowledge in Java and modern open-source technologies to significantly enhance our production systems.
The ideal candidate will possess substantial experience with Java applications and the latest open-source technologies, particularly from enterprise software development or high-growth technology firms. As a Senior SRE, you should be naturally inquisitive and possess exceptional problem-solving skills. You will bring a unique engineering mindset, comprehending how systems integrate in production to function seamlessly for hundreds of customers across North America, Europe, and Asia.

Oct 7, 2025
WHOOP, Inc.
Full-time|On-site|Boston, MA

At WHOOP, we are dedicated to enhancing human performance and extending healthspan through innovative wearable technology. Our cutting-edge devices deliver personalized insights that empower millions of members to gain a deeper understanding of their bodies and make informed decisions regarding training, recovery, and lifestyle choices. We are looking for a proactive and technically adept Incident Response Lead to spearhead security incident response across our organization. In this pivotal role, you will act as the primary internal escalation point and hands-on responder for security incidents. You will collaborate closely with WHOOP’s 24x7 Security Operations Center (SOC) provider and various cross-functional stakeholders to efficiently investigate, contain, and remediate emerging threats. This role demands a highly technical individual contributor with substantial ownership and visibility within Security, IT, Governance, Risk, and Compliance (GRC), as well as Legal.

Mar 12, 2026
Datadog
Full-time|$165K/yr - $180K/yr|Hybrid|Boston, Massachusetts, USA; New York, New York, USA

The Datadog Major Accounts Team is dedicated to fostering new business opportunities and expanding our footprint within the largest and most strategic clients we serve. This pivotal role not only influences the trajectory of Datadog's success but also comes with ambitious revenue targets. We are on the lookout for experienced and accomplished Account Executives who thrive in a fast-paced sales environment and possess a genuine passion for technology. Since its inception in 2021, the Major Accounts Team has seen remarkable success, prompting our ongoing expansion. This role presents a lucrative opportunity for driven, competitive sales professionals.
At Datadog, we value our office culture, which fosters strong relationships, collaboration, and creativity. We operate in a hybrid workplace to help our Datadogs achieve a work-life balance that suits their individual needs.

Feb 24, 2026
SonarSource
Full-time|Remote|Boston

SonarSource is a leader in agent-centric software development, focusing on AI-driven code review and verification. The company addresses a critical challenge: making sure software created by AI-assisted developers or autonomous agents is reliable, secure, and easy to maintain. SonarSource tools work with platforms like Claude Code, Codex, Cursor, GitHub Copilot, Gemini, and Devin. More than 75% of Fortune 100 companies use these solutions to help ensure their software meets compliance and reliability standards. Clients using Sonar report 44% fewer outages caused by AI-generated code. Code verification is positioned as the missing link in the Agent-Centric Development Cycle. Major organizations such as Nvidia, ServiceNow, Booking.com, Goldman Sachs, AstraZeneca, and Ford Motor Company trust SonarSource for independent and consistent code review and governance of AI-generated code.
Key Products
SonarQube: A widely used platform for AI code review and verification.
SonarQube Foundation Agent: Focused on agentic software repair.
SonarSweep & Sonar Context Augmentation: Tools that deliver enterprise-level context and constraints for effective agent operation.
The company operates across global hubs, including Austin, Bochum, Dubai, Geneva, London, Singapore, Tokyo, and Washington D.C. SonarSource works with a philosophy known as CODE.

Apr 29, 2026
Flock Safety
Full-time|Remote|Boston, MA

Join Flock Safety
At Flock Safety, we are at the forefront of safety technology, dedicated to empowering communities through innovative crime prevention and security solutions. Our comprehensive hardware and software ecosystem connects cities, law enforcement, businesses, schools, and neighborhoods into a nationwide public-private safety network. With the trust of over 5,000 communities, 4,500 law enforcement agencies, and 1,000 businesses, Flock delivers real-time intelligence while upholding privacy and responsible innovation.
We pride ourselves on being a high-performance, low-ego team propelled by urgency, collaboration, and bold thinking. Working at Flock means embracing challenges, acting swiftly, and striving for continuous improvement. The environment can be intense, yet it is profoundly rewarding for those eager to make a meaningful impact.
Backed by nearly $700M in venture funding and a valuation of $7.5B, we are deliberately scaling and in search of exceptional talent to help us achieve the extraordinary. If you value teamwork, ownership, and solving complex problems, Flock could be your ideal workplace.
Your Role
Are you a driven and seasoned Strategic Major Accounts Executive eager to make a difference by promoting cutting-edge technology aimed at crime reduction? If you thrive in a competitive, fast-paced, mission-driven atmosphere, this role is a game-changing opportunity for you. Flock is looking to enhance our growing Enterprise Public Sector team with a Major Accounts Executive who will play a pivotal role in propelling our company growth by concentrating on a specific market, managing demand generation, fostering partnerships, and overseeing the entire sales cycle.
This position is entirely remote, with candidates required to reside in the Boston/NYC/NJ/PA area. It also involves regional travel of up to 50% during peak seasons.
Your Impact
Oversee the complete sales process from prospecting to closing and client onboarding.
Achieve or surpass monthly, quarterly, and annual sales targets.
Establish and cultivate strong relationships with local law enforcement agencies within your territory.
Develop a robust pipeline of leads across various sectors.

Mar 10, 2026
Site Coordinator

Highbar Physical Therapy

Full-time|On-site|Boston, MA - Kenmore Square

Join Highbar Physical Therapy - A Pioneer in Outpatient Care!
At Highbar, we are not just a physical therapy practice; we are innovators striving to transform the industry. Our expanding network across New England embodies our commitment to exceptional care, making us stand out in the field.
We blend the latest scientific advancements in musculoskeletal health with personalized patient care, consistently achieving remarkable outcomes for those we serve.
Position Overview: Site Coordinator
We are looking for a dynamic Site Coordinator who thrives in a collaborative environment and enjoys taking on diverse responsibilities. This role involves providing administrative support, managing daily clinic operations, and working closely with the Clinic Director to ensure seamless service delivery. This is a full-time position with immediate openings, and we are eager to welcome a new team member!

Apr 13, 2026
CoServe Global Solutions
Contract|On-site|Boston

Join Our Team Across the Northeastern United States!
We are seeking highly skilled and principled Site Supervisors to spearhead the deployment of Cisco 2900 routers within enterprise environments throughout the Northeast. In this role, you will oversee the installation of Cisco 2900 routers and Aruba access points.
You will manage a team of two installers, liaise with the end client, coordinate logistics, and supervise the entire deployment process. Each job site will involve 1-2 nights of work before transitioning to the next location. This position also entails acting as the regional manager to ensure that all maintenance tickets are promptly addressed. Please note, this position will require night work when business operations are closed.
The project is expected to last approximately one year, with possibilities for extension.

Apr 16, 2015
Tagup
Full-time|On-site|Boston, MA

Join Tagup, a pioneering defense technology firm established at MIT, as we revolutionize logistics superiority through cutting-edge AI solutions. We are rapidly expanding and seeking innovators who are driven to implement transformative technologies to tackle some of the most pressing challenges in high-stakes environments. This is a unique opportunity to contribute to vital work that enhances national security and supports the success of U.S. and allied forces. Be part of shaping the future of defense logistics for a safer world.
At Tagup, curiosity is an integral part of our culture. If you thrive on understanding complex systems, sharing knowledge, and learning from brilliant colleagues, you will feel right at home. Our team of engineers and data scientists is on a mission to enhance the safety, reliability, and efficiency of the machines and processes that drive the world. Our AI technology directly optimizes large-scale industrial equipment and logistics systems, ensuring top-tier performance for our clients.
We are looking for a Cloud/DevOps Engineer who can adopt a Site Reliability Engineering (SRE) approach to our platform: automate using Infrastructure as Code (IaC), orchestrate with Kubernetes, and optimize PostgreSQL-backed services for heightened performance and availability. You will develop secure, auditable CI/CD pipelines, enforce a least-privilege access model by default, and maintain compliance across diverse, multi-region environments.

Oct 29, 2025
