Staff Site Reliability Engineer jobs in Dublin – Browse 447 openings on RoboApply Jobs

Staff Site Reliability Engineer jobs in Dublin

Open roles matching “Staff Site Reliability Engineer” with location signals for Dublin. 447 active listings on RoboApply Jobs.

1 - 20 of 447 Jobs

MongoDB, Inc.
Full-time|Hybrid|Dublin

The Team: The Storage Layer Services (SLS) team at MongoDB is pioneering the re-architecture of our cloud storage layer, fundamentally enhancing the core of our next-generation cloud storage architecture. This innovative team is dedicated to developing high-performance, multi-tenant distributed storage services that elevate the current Atlas storage stack and facilitate the efficient execution of diverse customer workloads. As a member of this team, you will collaborate closely with engineers responsible for building these storage services. Your role will involve defining Service Level Objectives (SLOs), shaping capacity plans, and ensuring the reliability, durability, and operational safety of the storage layer that supports Atlas. You will be part of a select group of senior Site Reliability Engineers (SREs), playing a vital role in the execution of a strategic multi-year roadmap for MongoDB's cloud storage architecture. We are particularly eager to connect with candidates located in Dublin, as this role follows a hybrid working model.

Apr 10, 2026

Klaviyo
On-site|On-site|Dublin, IE

Join Klaviyo as a Site Reliability Engineer II in Dublin, where you'll play a pivotal role in ensuring the reliability, scalability, and sustainability of our critical platforms. Our approach treats reliability as a core product feature, leveraging your engineering skills to tackle complex operational challenges. You'll collaborate with a dynamic team to enhance our infrastructure, security, and software engineering practices, ensuring our systems perform optimally at scale. Your contributions will directly influence how our engineering teams build software and how our customers engage with our platform daily.

Jan 31, 2026

MongoDB, Inc.
Full-time|Hybrid|Dublin

MongoDB, Inc. supports organizations as they build and operate modern applications. The company’s flagship product, MongoDB Atlas, is a multi-cloud database platform available across AWS, Google Cloud, and Microsoft Azure in more than 115 regions. Atlas enables customers to run applications both on-premises and in the cloud. Each month, over 175,000 new developers join the MongoDB community. Companies such as Samsung and Toyota rely on MongoDB for next-generation, AI-driven applications. Role overview: The Site Reliability Engineer III joins a team responsible for designing and maintaining the infrastructure that powers MongoDB services, with a particular focus on the Atlas platform. As customer requirements and regulations change, the SRE team works to deliver low-latency responses and address data sovereignty needs. The goal is to build complex systems that are reliable, straightforward to operate, and easy to monitor. Infrastructure-as-code and self-healing systems are core values for the team. Collaboration with other engineering groups is a regular part of the role, ensuring shared knowledge and responsibility for system health. Location: This position is based in Dublin and follows a hybrid work model.

Apr 21, 2026

StepStone
Full-time|On-site|Dublin

Join StepStone as a Site Reliability Engineer and play a critical role in ensuring the stability and performance of our innovative platforms. In this position, you will collaborate with cross-functional teams to enhance system reliability, improve the scalability of our applications, and automate operations processes. Your expertise in monitoring, incident response, and cloud technologies will be invaluable as you work on enhancing our infrastructure and delivering top-notch solutions.

Apr 10, 2026

airapps
Full-time|On-site|Dublin

airapps is looking for a Site Reliability Engineer (SRE) in Dublin. This role centers on keeping services reliable, available, and performing well. Working side by side with software development teams, the SRE will help strengthen system architecture and support ongoing improvements. Role overview: The Site Reliability Engineer focuses on supporting the stability and efficiency of airapps’ systems. The position involves regular collaboration with developers to address system challenges and refine processes. Key responsibilities: Monitor and maintain the reliability and uptime of core services. Work with development teams to improve system design and architecture. Apply new technologies and methods to boost operational efficiency. Location: This position is based in Dublin.

Apr 28, 2026

Arista Networks
Full-time|On-site|Dublin

Join Arista Networks as a Senior Site Reliability Engineer, where you will play a crucial role in ensuring the reliability, performance, and scalability of our systems. You will collaborate with cross-functional teams to implement best practices in software development and operational excellence.

Apr 1, 2026

Arista Networks
Full-time|On-site|Dublin

Collaboration and Innovation Await You. Join Arista Networks as a talented Site Reliability Engineer within our Engineering Productivity (EngProd) team, where you will play a crucial role in maintaining and enhancing our rapidly expanding infrastructure. We seek a versatile and adaptable professional who is eager to explore new technologies. As part of our software engineering team, you will collaborate with peers to design, build, and manage secure, scalable, and fault-tolerant tools and infrastructure in a hybrid cloud environment. In the EngProd group, you will engage with fellow engineers to architect, scale, and operate the systems that support Arista’s product development teams. Our technology stack includes industry standards such as Ansible, Artifactory, Gerrit, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, ElasticSearch, Google Cloud, Varnish, and Perforce, alongside custom-built internal systems designed to automate CI/CD, testing, analysis, and visualization. Your Responsibilities: Safely and incrementally build, deploy, and manage critical production systems with an emphasis on scalability, reliability, observability, performance, and security. Enhance and monitor the developer experience across various services. Automate processes to eliminate toil and enhance operational efficiency of production systems. Proactively monitor and respond to alerts while setting up automated alert handling mechanisms. Develop and maintain incident response runbooks. Triage platform and infrastructural issues, assisting Arista software engineers and collaborating with third-party vendor support. Document postmortems and create solutions to prevent recurring incidents. Communicate and plan maintenance windows for production systems. Work closely with Arista’s product development teams to identify and resolve infrastructural bottlenecks affecting their workflows. Research and implement best practices around infrastructure and platforms to ensure secure, scalable, and fault-tolerant systems. Analyze and understand the design and implementation details of open-source systems to improve triage and resolution processes.

Mar 12, 2026

Crusoe
Full-time|On-site|Dublin - IE

Crusoe is on a mission to revolutionize the way we access and utilize energy and intelligence. We are building the infrastructure that empowers a future where ambitious AI-driven projects can thrive without compromising on scale, speed, or sustainability. Join us at Crusoe and be part of the AI revolution through sustainable technology. Here, you will spearhead significant innovations, create a lasting impact, and collaborate with a team committed to delivering responsible and transformative cloud infrastructure. About This Role: As a Site Reliability Engineer (SRE) at Crusoe, you will be integral in maintaining the reliability and performance of our cutting-edge infrastructure. Our SRE team focuses on identifying, analyzing, and mitigating issues to uphold high Service Level Agreements (SLAs) through effective Service Level Indicators (SLIs) and Service Level Objectives (SLOs). By automating processes and proactively addressing potential problems, you will help ensure that our systems run seamlessly, advising engineering teams on best practices for resilient coding. Your role will involve anticipating issues before they affect our customers, conducting comprehensive post-mortems, and promoting continuous improvement to uphold the highest reliability standards for Crusoe's AI platform. The ideal candidate possesses a solid foundation in SRE practices, distributed systems, networking, and Linux, along with a passion for automation and problem-solving. This is a full-time position. What You’ll Be Working On: Automation and Tool Development: Streamline routine processes and enhance Crusoe’s internal infrastructure platform, allowing software teams to operate effectively without needing in-depth knowledge of the operating system, hardware, or network. Collaboration and Planning: Engage in daily stand-up meetings with the team to review projects, recent incidents, and daily priorities. Collaborate on strategies for launching new data centers or upgrading existing ones. Work closely with software engineers to ensure the adoption of resilient coding practices and review modifications prior to deployment. System Monitoring and Alerting: Analyze overnight alerts and performance metrics to guarantee optimal system operation. Evaluate system logs and develop innovative tools to enhance our monitoring capabilities. Incident Response and Problem Solving: Participate in incident response simulations, post-mortems, and root cause analysis sessions to extract valuable lessons from past issues.

Jan 14, 2026

Tenable, Inc.
Full-time|On-site|Ireland - Office - Dublin

About Tenable: Tenable is a global leader in Exposure Management, trusted by over 44,000 organizations to help understand and reduce cyber risk. The company supports 65% of the Fortune 500, 45% of the Global 2000, and many government agencies. Team and Culture: Tenable’s people are at the heart of its success. Teams work together to build cybersecurity solutions and maintain a culture rooted in respect and excellence. Employees collaborate with industry experts and have the tools and support to make a measurable difference. Role Overview (Senior Site Reliability Engineer): This Dublin-based role sits within the SRE Infrastructure Management team. The team’s mission is to keep Tenable’s cloud-centric exposure management platform reliable, scalable, and secure. The focus is on reducing manual operational work by building advanced automation, especially using AI. What You Will Do: Design and build AI-powered agentic workflows to automate complex SRE tasks, including incident investigation and deployment reliability. Develop evaluation frameworks, prompt engineering methods, retrieval strategies, and structured output validation to improve the accuracy and observability of agent pipelines. Write production code, create agentic workflows, and integrate observability and infrastructure platforms. Analyze the impact of automation efforts using real toil data. What Sets This Role Apart: This position is not limited to operations with minor automation. Most of the work involves hands-on development: designing, coding, and deploying intelligent systems that replace manual SRE workflows. The team uses large language models, agentic architectures, and deep SRE knowledge to drive results. Location: Office-based in Dublin, Ireland.

Apr 20, 2026

Anthropic
On-site|On-site|Dublin, IE

About Anthropic: At Anthropic, we are on a mission to develop AI systems that are not only reliable and interpretable but also steerable. Our primary goal is to ensure that AI technology is safe and advantageous for all users and society at large. Our rapidly expanding team consists of dedicated researchers, engineers, policy experts, and business leaders, all working collaboratively to create beneficial AI solutions. Role Overview: At Anthropic, we believe in the strength of collaboration. Our AI Reliability Engineering (AIRE) team plays a crucial role in maintaining the robustness of Claude, our flagship AI, ensuring it remains reliable for everyone who relies on it. We work closely with various teams within Anthropic to enhance reliability across our essential service paths, from the SDK, through our network, API layers, serving infrastructure, and accelerators, and back again. Our hands-on approach allows us to make impactful improvements during incidents and in collaborative projects. Reliability is an emergent quality that extends beyond individual teams. Our role involves taking a comprehensive view of the systems, offering a unique opportunity for dynamic, cross-functional engagement with the most critical aspects of our operations.

Feb 9, 2026

Crusoe
Full-time|On-site|Dublin - IE

At Crusoe, we are on a mission to drive the future of energy and intelligence. Our innovative platform empowers individuals to harness the full potential of artificial intelligence without compromising on scalability, speed, or sustainability. Join the forefront of the AI revolution with Crusoe's sustainable technology. Here, you'll be instrumental in pioneering transformative innovations, making a significant impact, and collaborating with a team that is redefining responsible cloud infrastructure. About the Role: As a Software Engineering Intern, you will be part of a dedicated team shaping the future of distributed systems technology. This 12-week, full-time internship in our Dublin office offers a unique opportunity to contribute to the development of a robust cloud infrastructure that supports groundbreaking advancements in fields such as artificial intelligence, graphics rendering, and computational biology. You won't just observe; you'll take on real responsibilities, tackle production-level challenges, and play a key role in Crusoe's vision for sustainable and ethical high-performance computing. Throughout your internship, you will engage in impactful projects that extend beyond traditional classroom learning. Benefit from one-on-one mentorship from industry veterans and collaborate with a diverse group of engineers to construct fault-tolerant systems utilized by customers across the globe. We are looking for motivated, inquisitive, and proactive students ready to forge valuable connections and launch their careers by addressing today's most challenging computational problems. Your Responsibilities: System Development: Design, implement, and maintain scalable, highly available, and fault-tolerant distributed systems to support demanding computational workloads. Product Development: Innovate and create cutting-edge products and tools from inception that will be leveraged by a global user base. Production Support: Identify, troubleshoot, and resolve complex issues in production environments to maintain platform reliability. Feature Development: Collaborate with product owners and stakeholders to design, test, and iterate on new features that enhance platform capabilities. Team Collaboration: Work closely with senior engineers and peers to ensure technical tasks align with broader organizational objectives. Mentorship Opportunities: Engage in dedicated mentorship sessions to accelerate your growth and deepen your technical expertise.

Jan 29, 2026

Veeva Systems Inc.
Full-time|Hybrid|Ireland - Dublin

Veeva Systems is a purpose-driven leader in cloud solutions for the life sciences industry, dedicated to accelerating the delivery of therapies to patients. As one of the fastest-growing SaaS companies globally, we achieved over $2 billion in revenue last year and are poised for continued growth. Our core values (Do the Right Thing, Customer Success, Employee Success, and Speed) guide our operations. We made history in 2021 by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors. At Veeva, we embrace flexibility through our Work Anywhere philosophy, enabling you to thrive in your preferred work environment, whether from home or in the office. Be a part of our mission to transform the life sciences sector, making a meaningful impact on our customers, employees, and communities. The Role: We are looking for a Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be responsible for maintaining the scalability and reliability of our enterprise applications, addressing complex challenges on a global scale. Your expertise in Java and modern open-source technologies will be critical in enhancing our production systems. The ideal candidate will possess a wealth of experience with Java applications and the latest open-source technologies, ideally gained from enterprise software development or a rapidly growing tech environment. As a Senior SRE, you should be innately curious and proficient in problem-solving. You will also offer a unique engineering perspective, understanding how systems integrate to function effectively for hundreds of customers across North America, Europe, and Asia.

Aug 10, 2021

MongoDB, Inc.
Full-time|On-site|Dublin

Role Overview: MongoDB is hiring a Team Lead for Site Reliability Engineering, with a focus on the Storage Layer Service. This position is based in Dublin. What You Will Do: Lead efforts to improve the reliability and performance of the Storage Layer Service. Work closely with teams across the company to deliver solutions that support both user experience and operational goals. Guide and support engineers as they address technical challenges in the storage layer. Collaboration: This role involves regular collaboration with other engineering groups and stakeholders to identify opportunities for improvement and implement changes that make a measurable impact.

Apr 15, 2026

InterSystems
Full-time|Remote|Dublin (Remote)

Overview: Join our dynamic Managed Services team as a Major Incident Lead specializing in Site Reliability. In this critical role, you will spearhead the response to significant, customer-impacting incidents across InterSystems’ managed services platforms. As the Incident Commander, you will ensure swift service restoration, maintain clear and confident communication with stakeholders, and coordinate effectively across SRE, engineering, support, cloud, and service delivery teams. Operating within a service model aligned with SRE principles, you will prioritize service reliability by leveraging service level indicators and objectives, focusing on reducing customer impact during live incidents over root cause analysis. Beyond immediate incident management, you will lead post-incident reviews to transform operational failures into actionable reliability enhancements and minimize repeat incidents. This position is vital for preserving customer trust, ensuring platform resilience, and achieving operational excellence in a 24x7, mission-critical, and highly regulated environment.

Mar 26, 2026

Site Engineer

XYZ Reality

Full-time|On-site|Dublin, Ireland

About XYZ Reality: XYZ Reality is at the forefront of innovation, offering the world's first engineering-grade Augmented Reality solution specifically designed for the construction industry. Our groundbreaking technology integrates seamlessly into The Atom, a smart, site-safe headset/hardhat, enabling us to implement AR solutions that enhance project delivery while adhering to timelines and budget constraints. With a rapidly expanding team of over 100 professionals across the UK, US, and Europe, we partner with critical organizations and construction firms to realize major projects successfully. Role Overview: As a Site Engineer at XYZ Reality, you will play a pivotal role in executing our core services on construction projects. Your responsibilities will include monitoring construction progress in relation to BIM models, conducting quality inspections on-site, and delivering findings to clients through our innovative platform. This position is ideal for individuals with hands-on construction experience who are eager to embrace XYZ Reality’s advanced technology and methodologies.

Mar 31, 2026

Starling Bank
Full-time|Hybrid|Dublin, County Dublin, Ireland

At Starling Bank, we are on a transformative mission to redefine the banking experience. As the UK’s first digital bank, our vision centers around leveraging cutting-edge technology to deliver fast, fair, and transparent banking services that empower our customers to manage their finances effortlessly. Our organization marries the core principles of being a fully licensed bank with the dynamic pace of a tech innovator. With a workforce of over 3,000 professionals across our offices in London, Southampton, Cardiff, and Manchester, we emphasize a culture that fosters innovation, collaboration, and ownership. As a Database Reliability Engineer, you will be integral to our tech team, contributing to a work environment that encourages creativity and the use of advanced technologies. Your role will encompass building, optimizing, and maintaining reliable database systems that are crucial for our banking operations. We believe in a flat organizational structure that empowers every team member to make impactful decisions. Our core values (Listen, Keep It Simple, Do The Right Thing, Own It, and Aim For Greatness) guide our mission to create a better banking experience. Hybrid Working: Our hybrid working model encourages collaboration while allowing flexibility, requiring attendance at the office at least once a week. Data Environment: Our Data teams work across various divisions, focusing on delivering insights that positively impact our business and customers. We invite talented data professionals at all levels to be part of our journey.

Apr 8, 2026

InterSystems
Full-time|Remote|Dublin (Remote)

Overview: We are looking for a skilled Kubernetes Engineer to become a vital part of our global infrastructure team. In this role, you will play an essential part in scaling, automating, and securing our container orchestration environments across both on-premises and public cloud platforms. As a Kubernetes expert, you will collaborate closely with DevOps, Site Reliability Engineering (SRE), and security teams to deliver dependable, self-service, and production-ready Kubernetes clusters that support our mission-critical applications. Key Responsibilities: Cluster Management: Deploy, manage, and upgrade Kubernetes clusters utilizing tools like kubeadm, EKS, AKS, GKE, or Rancher. Implement comprehensive RBAC, network policies, ingress controllers, and security frameworks within Kubernetes. Automation and Infrastructure as Code (IaC): Automate cluster provisioning and application deployment pipelines using technologies such as Terraform, Helm, and ArgoCD. Create reusable modules to ensure consistent infrastructure delivery across staging and production environments. CI/CD Integration: Integrate Kubernetes within modern CI/CD workflows to enable rapid and secure application delivery. Promote GitOps practices and automate continuous deployment. Monitoring, Logging, and Troubleshooting: Establish observability for Kubernetes using tools like Prometheus, Grafana, Loki, and Fluentd/Fluent Bit. Troubleshoot performance issues, failed pods, memory leaks, and cluster degradation events. Cloud and Hybrid Deployments: Manage Kubernetes workloads across AWS, Azure, GCP, and hybrid/on-premise environments. Utilize tools like Velero, Kasten, or Stash for backup and restore strategies in Kubernetes. Collaboration and Support: Collaborate with application developers, SREs, and security teams to implement best practices. Act as a technical advisor on cloud-native architectures and containerization.

Mar 26, 2026

CoreWeave
Full-time|On-site|Dublin, Ireland

CoreWeave is at the forefront of AI infrastructure, providing the essential cloud computing services tailored for innovators. Our platform equips AI pioneers with the necessary technology, tools, and expert teams to confidently build and scale their AI solutions. Trusted by top AI labs, startups, and global enterprises, CoreWeave combines unparalleled infrastructure performance with extensive technical expertise to drive breakthroughs and transform compute capabilities. Established in 2017, CoreWeave made its public debut on Nasdaq (CRWV) in March 2025. Discover more at www.coreweave.com. We take pride in being a Living Wage accredited Employer. Your Role: The Fleet Reliability Operations Team serves as the core of CoreWeave’s capacity delivery and maintenance initiatives. This team is tasked with provisioning, updating, and managing server nodes, along with executing the processes and tools that configure and validate our server fleet. As the first responders to hardware issues in production, this team is empowered to drive automation and observability design throughout our server fleet lifecycle. We are on the lookout for an Operations Engineering Manager to join the Fleet Reliability Operations team. This role will be pivotal in maintaining and enhancing our delivery volume as we expand our fleet tenfold. You will cultivate a robust talent pipeline, oversee onboarding and training, provide leadership in processes, and advocate for reliability and customer satisfaction. As the manager of this team, you will have the chance to: Establish and lead a 24/7 team of process-oriented engineers focused on reliability and observability. Facilitate the development and documentation of clear, consistent processes for provisioning, validating, and troubleshooting nodes in our server fleet. Critically assess and champion process and automation improvements, prioritizing event-driven automated remediation. Provide a 24/7 engineering support function for critical, time-sensitive node delivery and maintenance. Enhance our onboarding, documentation, enablement, and performance management programs to elevate team members' growth and capabilities. Foster a culture of accountability and performance measurement within your team.

Apr 3, 2026

Stripe, Inc.
Full-time|On-site|Dublin

Join Stripe as a Staff Engineer in our Production Engineering team, where you will play a critical role in building and maintaining scalable systems. As a leader in this innovative environment, you will collaborate with cross-functional teams to optimize our infrastructure and enhance the reliability of our services. You will leverage your expertise to influence product design and architecture decisions, ensuring the highest performance and availability.

Mar 16, 2026

Seapoint
Full-time|Hybrid|Dublin, County Dublin, Ireland

About Seapoint: Seapoint is revolutionizing the financial landscape for European startups and scale-ups. Our innovative, AI-driven business account streamlines everything from payroll and expenses to invoice payments and reporting, all centralized in one platform. Founded by Sean Mullaney, the former European CIO at Stripe, alongside a talented team of alumni from renowned companies such as Stripe, Wise, Wayflyer, Nubank, and Tide, we have successfully secured $3M in pre-seed funding led by Frontline Ventures. After nine months of dedicated development, we are now in private beta, collaborating with numerous VC-backed startups that are finally gaining the comprehensive financial insights they've been seeking. We are tackling a genuine challenge: European startups often find themselves caught between the inadequacies of neobanks and the overwhelming complexity of traditional corporate banking. Many are forced to manage 4-6 different financial products manually while earning no interest on their cash deposits. Our solution employs AI automation to alleviate this burden, allowing founders to devote more time to growing their businesses instead of managing spreadsheets. About the Role: Help define the technical framework of startup finance. As a Staff Engineer, you will spearhead architectural decisions across our platform, mentor engineering teams, and tackle the toughest technical challenges we encounter. Whether it’s scaling our multi-tenant financial infrastructure or developing integrations with banks and other fintech partners, you will play a crucial role in ensuring reliability at scale. We are seeking leaders from diverse backgrounds who are enthusiastic about creating innovative solutions. You will collaborate with founders and senior engineers experienced in building financial infrastructure at scale, addressing the unique challenges that startups face. While we primarily use a modern stack featuring TypeScript, Bun, PostgreSQL, and various AWS services, we value curiosity and are always open to exploring new software development methodologies and AI tools. Requirements: Key Qualities: Technical expertise, system design proficiency, and the capability to convert complex problems into elegant solutions that facilitate growth for thousands of startups. If you're eager to create the infrastructure that empowers the next generation of companies, we want to hear from you.

Sep 23, 2025
