Qualifications
Proficient in programming languages such as Python, Go, or Java. Experience with cloud platforms like AWS, Azure, or GCP. Strong understanding of CI/CD pipelines and DevOps principles. Ability to monitor system performance and troubleshoot complex issues. Excellent communication and collaboration skills.
About the job
As a Site Reliability Engineer at dev2, you will play a crucial role in ensuring the reliability and performance of our services. You will work closely with development and operations teams to build and maintain scalable systems, troubleshoot issues, and implement best practices in reliability engineering. Your expertise will help us deliver exceptional service and maintain our commitment to quality.
About dev2
dev2 is a leading technology company headquartered in Boston, specializing in innovative software solutions that enhance operational efficiency. Our commitment to reliability and performance drives our success, and we are looking for talented individuals to join our dynamic team.
Similar jobs
Search for Senior Staff Site Reliability Engineer Data Center
Full-time|$165.8K/yr - $224.4K/yr|Hybrid|Boston, MA or Remote
Who We Are
At PathAI, we are dedicated to revolutionizing patient outcomes through the power of AI-driven pathology. Our commitment to advancing traditional pathology methodologies into innovative technologies is at the forefront of our mission. By leveraging these advancements, we aim to expedite drug development, enhance diagnostic accuracy, and deliver life-saving treatments to patients with urgency. Join our diverse and talented team, united in solving intricate challenges and making a substantial impact in healthcare.
Where You Fit
We are seeking a highly skilled Senior Staff Site Reliability Engineer who will play a pivotal role in designing, constructing, and managing our hybrid cloud and on-premises environment.
What You’ll Do
In this role, you will harness your extensive skills and develop new ones as you:
Elevate our operational practices by implementing Site Reliability Engineering (SRE) best practices focused on user satisfaction, monitoring, and automation.
Engineer robust infrastructure patterns for our cloud environments using Amazon Web Services, emphasizing security, reliability, and scalability.
Design, construct, and manage our data center to support our rapidly expanding Machine Learning team.
Integrate on-premises datacenter environments with our existing cloud infrastructure to create a seamless hybrid cloud solution.
Enhance the reliability and resilience of our infrastructure through thorough root-cause analysis and identifying design gaps.
Engage in platform on-call rotations and provide assistance during critical incident responses.
Full-time|$166K/yr - $220K/yr|On-site|Boston, Massachusetts, United States
Anduril Industries is at the forefront of defense technology, dedicated to revolutionizing military capabilities for the U.S. and its allies through cutting-edge innovations. By integrating the expertise, technology, and business models from the most pioneering companies of the 21st century into the defense sector, Anduril is transforming the design, construction, and sale of military systems. Our advanced family of systems is driven by Lattice OS, an AI-enhanced operating system that synthesizes vast data streams into real-time, 3D command and control environments. In this era of strategic competition, we are committed to delivering state-of-the-art autonomy, AI, computer vision, sensor fusion, and networking technologies to the military in a matter of months rather than years.
ABOUT THE TEAM
The Corporate Technology Engineering team plays a crucial role in developing and enhancing the various systems that empower Anduril to achieve its mission. Our technology solutions are vital for the supply chain, accounting, sales and growth, engineering, modeling and simulation, field maintenance, manufacturing, and more. We collaborate across the organization to ensure that our teams have the necessary tools and capabilities for mission success.
ABOUT THE JOB:
We are in search of an experienced Senior Site Reliability Engineer to join our dynamic team. In this role, you will be responsible for the design, deployment, scaling, and maintenance of the pivotal infrastructure that supports our systems. You will engage with a diverse array of stakeholder teams to facilitate swift and secure progress on their respective technology roadmaps.
WHAT YOU'LL DO:
Provision, manage, and scale intricate infrastructure for all Business Systems.
Continuously optimize and refine CI/CD pipelines to improve the efficiency, reliability, and speed of software delivery.
Promote a culture of observability and reliability, advocating for best practices and tools that enhance system visibility and resilience.
Collaborate with cross-functional engineering teams to understand their needs and translate them into effective cloud solutions using industry best practices.
Possess a deep understanding of the company’s business goals and objectives to design and implement infrastructure solutions that align with them.
Strengthen systems and evaluate workload demands, planning resource capacity to guarantee optimal performance and cost-effectiveness.
Full-time|$134.3K/yr - $214.8K/yr|Hybrid|Boston, Massachusetts, United States
Become a Catalyst for Positive Change at Axon.
At Axon, our mission is to Protect Life. We are innovators dedicated to addressing society's most pressing safety and justice challenges through our suite of devices and cloud software solutions. Collaboration is at the heart of our success; we engage with transparency and empathy, valuing diverse perspectives from our customers, communities, and each other.
Working at Axon is dynamic, rewarding, and impactful. Here, you will take the lead and create substantial change while continually evolving in your role at a company that values your contributions.
Your Contribution
As a Senior Site Reliability Engineer in the APX SRE organization, you will be instrumental in implementing efficient, scalable solutions that enhance the reliability and performance of our global cloud-native Kubernetes platform and its services. You are passionate about maintaining system stability, producing clear documentation, and developing tools that enrich the developer experience.
Location: This position is located in our Boston, MA office, with a hybrid working model. We encourage in-person collaboration from Tuesday to Friday, allowing for remote work on Mondays unless otherwise accommodated. We believe that strong connections drive innovation, and our office culture is designed to promote meaningful teamwork, mentorship, and collective achievement.
Full-time|$134.3K/yr - $214.8K/yr|Hybrid|Boston, Massachusetts, United States
Become a Force for Good at Axon.
At Axon, we are dedicated to our mission of protecting life. We tackle society's most pressing safety and justice challenges through our innovative ecosystem of devices and cloud software. Collaboration is at the heart of what we do; we connect with transparency and empathy, valuing diverse perspectives from our customers, communities, and team members.
Life at Axon is dynamic, challenging, and impactful. Here, you will take initiative and make a real difference. Continuously evolve as you contribute to a mission that matters at a company where your contributions are valued.
Your Impact
As a Senior Site Reliability Engineer within the APX SRE CloudOps team, you will architect and build the cloud infrastructure and automation platforms critical to Axon's product engineering teams. You will design solutions for multi-cloud environments (Azure, AWS), ensure FedRAMP compliance, and oversee large-scale Kubernetes platforms managing production workloads across various regions. A significant aspect of your role will involve coding: developing services, APIs, and internal tools using languages like Go and Python. Additionally, you will participate in on-call rotations and incident response, leveraging operational insights to enhance reliability and guide platform investments. This position merges software engineering expertise with cloud architecture at scale and production ownership.
Location: This role is based in our Atlanta, Seattle, or Boston office and operates on a hybrid schedule. We prioritize in-person collaboration, requiring team members to work on-site from Tuesday to Friday, with the option to work remotely on Mondays, unless a workplace accommodation is approved. We believe that connection fosters innovation, and our in-office culture is designed to promote meaningful teamwork, mentorship, and shared success.
Role Overview
Beacon Biosignals is hiring a Site Reliability Engineer. This role focuses on improving the reliability and performance of the company’s systems. The position is open to candidates in Boston, MA or remote locations.
What You Will Do
Work with teams across engineering, product, and operations to support scalable infrastructure.
Design, implement, and maintain systems that prioritize uptime and smooth user experiences.
Help ensure high availability for Beacon Biosignals’ platforms and services.
Join Veeva Systems, a groundbreaking organization at the forefront of the industry cloud, dedicated to accelerating the delivery of therapies to patients worldwide. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue last fiscal year, with abundant growth opportunities on the horizon.
At Veeva, we operate based on our core values: Do the Right Thing, Customer Success, Employee Success, and Speed. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the needs of our customers, employees, society, and investors.
As a Work Anywhere company, we empower you to choose your ideal work environment, whether from home or in the office, to help you thrive.
Be a part of our mission to transform the life sciences industry and positively impact our customers, employees, and communities.
The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be responsible for ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your deep knowledge of Java and modern open-source technologies to make a significant impact on our production systems.
Ideal candidates will have extensive experience working with Java applications and the latest open-source technologies, preferably gained in enterprise software development or a rapidly growing tech environment. As a Senior SRE, you will need to be innately curious and possess strong problem-solving skills. Additionally, you will bring a unique engineering perspective, understanding how systems integrate in production to function at a global scale for hundreds of customers across North America, Europe, and Asia.
Full-time|$180K/yr - $225K/yr|Hybrid|Boston, Massachusetts, United States
Become a Force for Good at Axon.
At Axon, our mission is to protect life through innovative solutions that address society's most pressing safety and justice challenges. We are a team of explorers, working collaboratively to develop a comprehensive ecosystem of devices and cloud-based software. We value connection, transparency, and diverse perspectives from our customers, communities, and each other.
Life at Axon is both fast-paced and rewarding. Here, you will take charge and make a meaningful impact while continuously growing in a mission-driven environment that values your contributions.
Your Impact
As a Senior Site Reliability Engineer, you will play a pivotal role in shaping how Axon constructs and manages its core platforms, specifically focusing on Zero Touch—a compliant execution framework—and the surrounding identity and security infrastructures. Instead of manually provisioning infrastructure or managing tickets, you will design and develop the platforms, tools, and policies that empower hundreds of engineers to operate safely and efficiently at scale.
Your expertise in infrastructure and platform engineering, along with your extensive experience in distributed systems, will guide your efforts toward automation, self-service, and enforcing best practices. You will prioritize APIs, workflows, and standardized processes over manual tasks, ensuring security, identity, and compliance are fundamental to your work, especially in regulated environments where precision and traceability are paramount.
This position is highly collaborative, requiring you to work alongside senior engineers across product and platform teams to enhance the building, deployment, security, and operation of Axon’s cloud systems within a modern, AI-driven landscape.
Location - This role is situated in our Boston office on a hybrid schedule. We emphasize in-person collaboration, with team members expected to work onsite from Tuesday to Friday, while enjoying the flexibility to work remotely on Mondays, unless a workplace accommodation has been approved. We believe that connections inspire innovation, and our office culture is designed to promote teamwork, mentorship, and collective success.
At Veeva Systems, we are dedicated to our mission and are recognized as trailblazers in the industry cloud, empowering life sciences companies to expedite the delivery of therapies to patients. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue in our previous fiscal year, with immense growth opportunities on the horizon.
Our core values—Do the Right Thing, Customer Success, Employee Success, and Speed—are the foundation of our culture. Distinctively, we made history in 2021 by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors.
As a Work Anywhere company, we offer the flexibility to choose between working from home or in the office, allowing you to thrive in your preferred environment.
Join us in our mission to transform the life sciences industry and make a positive impact on our customers, employees, and communities.
The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be pivotal in ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your extensive knowledge in Java and modern open-source technologies to significantly enhance our production systems.
The ideal candidate will possess substantial experience with Java applications and the latest open-source technologies, particularly from enterprise software development or high-growth technology firms. As a Senior SRE, you should be naturally inquisitive and possess exceptional problem-solving skills. You will bring a unique engineering mindset, comprehending how systems integrate in production to function seamlessly for hundreds of customers across North America, Europe, and Asia.
Full-time|$127K/yr - $249K/yr|Remote|Boston; Miami; New York City; Pittsburgh; Raleigh; United States
Join MongoDB’s innovative Storage Layer Services (SLS) team as we redefine the MongoDB cloud storage layer. This dynamic team is at the forefront of developing high-performance, multi-tenant distributed storage solutions that not only enhance our existing Atlas storage framework but also empower our customers' workloads to operate with remarkable efficiency. In this pivotal role, you will collaborate closely with teams dedicated to building these storage services, defining Service Level Objectives (SLOs), shaping capacity plans, and ensuring the reliability, durability, and operational safety of the foundational storage layer that supports Atlas. As one of the founding members of this small but experienced team of Site Reliability Engineers (SREs), you will play a vital role in executing a multi-year vision for MongoDB’s cloud storage architecture. This position offers flexibility in location, allowing you to work from our offices in Boston, New York City, Raleigh, Miami, or Pittsburgh, or remotely from anywhere in the United States, provided you are based in the Eastern or Central time zones.
Join Xometry as a Site Reliability Engineer II (SRE) and be part of a dynamic team that drives innovation in the realm of automated manufacturing solutions. In this role, you will ensure the reliability, availability, and performance of our systems while collaborating closely with other engineering teams.
Join DigitalOcean as a Senior Data Center Engineer II, where you will play a crucial role in maintaining and optimizing our data center operations. This position offers the opportunity to work on challenging projects that impact our global infrastructure, ensuring reliability and efficiency.
Full-time|On-site|Boston; Charlotte; New York City; Philadelphia; Pittsburgh; Washington DC
Join MongoDB as a Team Lead for our Site Reliability Engineering (SRE) team focused on the Storage Layer Service. In this pivotal role, you will drive the reliability, availability, and performance of MongoDB's storage systems, collaborating closely with cross-functional teams to enhance our infrastructure and ensure optimal service delivery.
Join our dynamic team at DigitalOcean as a Senior Data Center Engineer II, where you will play a critical role in managing and optimizing our infrastructure. Your expertise will ensure the highest level of performance and reliability, driving our mission to simplify cloud computing.
As a key member of our engineering team, you will be responsible for designing, implementing, and maintaining scalable data center solutions. Your proactive approach will help us innovate and improve our systems continuously.
Join our dynamic Managed Services team as a Major Incident Lead – Site Reliability. In this pivotal role, you will spearhead the management of high-severity incidents that impact our customers across InterSystems' managed services platforms. As the Incident Commander, you will be responsible for ensuring swift service restoration, effective communication with stakeholders, and coordinated efforts across Site Reliability Engineering (SRE), engineering, support, cloud, and service delivery teams. Working within an SRE-aligned service model, your primary focus will be on preserving service reliability by utilizing service level indicators and objectives. You will prioritize minimizing customer impact over root cause analysis during live incidents. In addition to incident management, you will lead post-incident reviews, transforming operational setbacks into quantifiable reliability enhancements and preventing future occurrences. This role is essential for upholding customer trust, platform resilience, and operational excellence in a 24/7, mission-critical, and highly regulated environment.
The Data Center Operations Manager for Managed Services is pivotal in overseeing the daily operations and management of our U.S.-based data center environments, which support both internal systems and clients of InterSystems. This role is crucial for ensuring high availability, operational excellence, and adherence to compliance standards across third-party facilities while maintaining close collaboration with our internal infrastructure, platform, and application teams.
The manager will lead the operations staff and vendor teams, manage relationships with colocation providers, and support cutting-edge infrastructure platforms, including software-defined storage, converged infrastructure, and containerized environments.
Key Responsibilities
Data Center Operations
Oversee daily operations at multiple colocation data center facilities, acting as the primary contact for operational issues.
Coordinate remote assistance, scheduled maintenance, and incident responses with colocation providers.
Ensure the reliability and availability of power, cooling, space, network connectivity, and physical security.
Lead incident management efforts and conduct root cause analyses (RCA) involving both internal teams and external providers.
Maintain and enforce standard operating procedures (SOPs), runbooks, access protocols, and escalation paths.
Infrastructure & Platform Support
Support enterprise-level infrastructure platforms, including:
  Dell PowerFlex and other software-defined storage systems.
  Nutanix and VMware converged infrastructure.
  Kubernetes platforms (e.g., Spectro Cloud-managed environments).
Collaborate with Systems, Platform, and Cloud teams to facilitate hardware lifecycle events, upgrades, and capacity expansions.
Ensure operational readiness for containerized and virtualized workloads deployed in colocation facilities.
Team Leadership
Lead and develop the data center operations staff and contractors supporting colocation environments.
Define staffing models, on-call rotations, and escalation coverage.
Encourage operational discipline, enhance documentation quality, and promote continuous improvement.
Vendor & Provider Management
Manage relationships with colocation providers, hardware vendors, network carriers, and service partners.
Oversee service-level agreements (SLAs), maintenance coordination, access controls, and service delivery.
Assist in contract reviews and renewals.
Full-time|$190K/yr - $300K/yr|Remote|Boston or Remote
Senior/Staff Software Engineer, Applications
AcuityMD is an innovative software and data platform that accelerates the accessibility of medical technologies. We empower MedTech companies to gain insights into product usage, understand customer variations, and pinpoint opportunities that enable physicians to enhance patient care. Annually, the FDA approves approximately 6,000 new medical devices, and our solution streamlines the process of getting these products into the hands of physicians, ultimately improving patient outcomes. Supported by prominent investors like Benchmark, Redpoint, ICONIQ Growth, and Ajax Health, we are a rapidly expanding SaaS company. We are seeking a talented Senior/Staff Engineer to join our dynamic Customer Intelligence Team. This role offers the chance to engage in all facets of current and future product development. Collaborate with other passionate engineers on challenging projects that introduce cutting-edge technologies to patients.
Team Mission
Within the Application Team, our goal extends beyond engineering software; we are transforming how medical technologies are delivered to those in need. The Customer Intelligence Team empowers MedTech companies to expedite the adoption of state-of-the-art medical technology by optimizing their sales processes through seamless and context-rich workflows. Assist our clients in strategizing their sales efforts to both existing and prospective users by integrating market data and AI-driven recommendations into their daily operations. We collaborate with top-tier MedTech companies and are on a journey of rapid feature development to guarantee their success. Your contributions will be pivotal in delivering impactful, high-quality, and scalable solutions that truly matter. You will collaborate with Product Managers, Designers, and fellow Engineers to build comprehensive features across both web and mobile platforms. We prioritize delivering excellent work while fostering a fun, collaborative, and supportive team environment, as we believe strong relationships enhance our collective performance! Advance your career while working on meaningful projects across various tech stacks, including JavaScript, React, TypeScript, GraphQL, Node.js, Python, and SQL.
Join our dynamic team as a Senior Staff Engineer specializing as a Salesforce Technical Lead. In this pivotal role, you will spearhead innovative Salesforce solutions, guiding projects from conception through implementation. Your expertise will drive technical excellence and enhance our product offerings, ensuring a seamless user experience for our clients.
ezCater stands at the forefront of the food for work technology sector in the United States, seamlessly connecting businesses in need of workplace catering with a vast network of over 100,000 restaurants nationwide. Our innovative platform delivers flexible and scalable solutions, catering to everything from regular employee meals to special one-off events, all supported by our dedicated 24/7 customer service team. Additionally, we empower companies to efficiently manage their food budgets through a single, customizable platform. For our restaurant partners, ezCater drives growth by increasing their order volume and attracting high-value customers. We are proud to be backed by prominent investors including Insight, Iconiq, Lightspeed, GIC, SoftBank, and Quadrille.
Are you enthusiastic about harnessing the power of data? Are you ready to influence a rapidly expanding two-sided marketplace through data-driven insights? Do you have innovative ideas on how to empower data scientists within a billion-dollar organization to optimize workforce planning or predict real-time customer lifetime value? If so, we want to connect with you!
The Data Technology team at ezCater is on an exciting growth trajectory! As we prepare for the future, data will remain a core strategic asset across the company—encompassing everything from advanced machine learning to business intelligence and data governance. Data is poised to be our key differentiator, driving significant impact within the $100 billion catering industry.
We are looking for a Senior Data Engineer to join our dynamic team and tackle complex data and platform challenges that will help accelerate our business growth. The ideal candidate is a data aficionado who champions best practices in systems and architecture, and is dedicated to creating robust infrastructure that underpins accurate and efficient data. You will collaborate directly with executive stakeholders as we embark on an organization-wide data modeling initiative, making adaptability and the ability to translate business needs into technical solutions essential.
Join Tagup, a pioneering defense technology firm established at MIT, as we revolutionize logistics superiority through cutting-edge AI solutions. We are rapidly expanding and seeking innovators who are driven to implement transformative technologies to tackle some of the most pressing challenges in high-stakes environments. This is a unique opportunity to contribute to vital work that enhances national security and supports the success of U.S. and allied forces. Be part of shaping the future of defense logistics for a safer world.
At Tagup, curiosity is an integral part of our culture. If you thrive on understanding complex systems, sharing knowledge, and learning from brilliant colleagues, you will feel right at home. Our team of engineers and data scientists is on a mission to enhance the safety, reliability, and efficiency of the machines and processes that drive the world. Our AI technology directly optimizes large-scale industrial equipment and logistics systems, ensuring top-tier performance for our clients.
We are looking for a Cloud/DevOps Engineer who can adopt a Site Reliability Engineering (SRE) approach to our platform: automate using Infrastructure as Code (IaC), orchestrate with Kubernetes, and optimize PostgreSQL-backed services for heightened performance and availability. You will develop secure, auditable CI/CD pipelines, enforce a least-privilege access model by default, and maintain compliance across diverse, multi-region environments.
Oct 29, 2025