1 - 20 of 1,376 Jobs

Search for Site Reliability Engineer II - Platform Security

Elastic N.V.
Full-time|Remote|Spain

Join Elastic as a Site Reliability Engineer II specializing in Platform Security, where you will play a crucial role in ensuring the reliability and security of our innovative platform. Your expertise will be instrumental in maintaining the performance and resilience of our systems, working alongside a talented team of engineers. You will be responsible for …

Mar 27, 2026

N26
Full-time|On-site|Barcelona

N26 is looking for a Site Reliability Engineer to join the Platform Engineering team in Barcelona. This role centers on supporting the reliability and performance of the company’s banking platform.

Role overview
As a Site Reliability Engineer, the main focus is on developing, improving, and maintaining platform systems. Daily work involves collaborating with colleagues from various teams to keep systems stable and efficient.

Key responsibilities
- Work closely with cross-functional teams to support platform operations
- Enhance and maintain system reliability and performance
- Implement monitoring solutions and automation tools
- Participate in incident response and help drive improvements

Location
This position is based in Barcelona.

Apr 29, 2026

Landbot
Full-time|Remote|Remote job

We are excited to announce an opportunity for a Senior Site Reliability Engineer to join our dynamic team at Landbot. This is a remote position, and we are specifically looking for candidates located between UTC-1 and UTC+2.

About Landbot
Operating in over 150 countries, Landbot provides an innovative platform that enables businesses to craft outstanding chatbot and AI agent interactions across various channels, including Web, WhatsApp, and Messenger. We are passionate about delivering exceptional customer experiences and are committed to engineering excellence.

At Landbot, we foster a high-performance culture that merges engineering expertise with a product-focused mindset and a dedication to customer satisfaction. We believe that quality and speed are essential for success, and we are looking for a Senior Reliability Engineer to help us enhance our platform and drive meaningful results.

About the Team
As part of our Platform Engineering team, you will collaborate with a small, dedicated group responsible for the development and maintenance of the Landbot Engineering Platform, Data Platform, and Security frameworks. Our mission is to empower Landbot teams to deliver value efficiently, reliably, and at scale.

We value:
- A product-oriented approach to platform development
- Autonomy and accountability
- Collaborative efforts over bureaucratic barriers

About the Position
Your Role
As a Senior Reliability Engineer, you will embody the principles of Systems Engineering within a Platform team, treating infrastructure as a product. Your focus will be on addressing developer needs, minimizing operational challenges, and creating self-service capabilities that simplify processes, allowing teams to concentrate on feature development.

Key Responsibilities:

Develop and Maintain the Internal Developer Platform
- Design and implement essential platform services, including CI/CD pipelines, infrastructure provisioning, and observability systems.
- Create developer-facing tools, APIs, and automation that empower application teams to independently deploy, scale, and manage services.

Manage and Enhance Platform Operations
- Optimize cloud resources, Kubernetes clusters, databases, and networking for enhanced reliability, scalability, and cost-effectiveness.
- Establish SLIs, SLOs, and error budgets to ensure a balance between reliability and feature velocity.
- Design and implement observability solutions for real-time monitoring and proactive issue resolution.
- Develop alerting strategies that minimize noise and highlight actionable insights.
- Lead incident responses, conduct blameless postmortems, and drive continuous enhancements.

Improve Developer Experience and Influence Platform Strategy
- Collaborate with application teams to understand their workflows and challenges, gather feedback, and prioritize enhancements that align with business goals.
- Create and maintain comprehensive documentation, runbooks, and knowledge bases to support teams effectively.

Dec 2, 2025

Hopper
Full-time|Remote|Spain - Remote

About the Opportunity
Join Hopper's Cloud FinOps team as a Senior Site Reliability Engineer, where you will play a key role in managing our extensive infrastructure in Google Cloud. This infrastructure supports hundreds of engineers and delivers an exceptional experience to millions of users globally.

Your passion for automation and system optimization will shine as you ensure that our infrastructure is scalable, reliable, secure, and efficient.

You will approach problem-solving practically, creating solutions that are straightforward, dependable, cost-effective, and user-friendly.

Daily Responsibilities
- Engage in projects aimed at enhancing cost efficiency, such as:
  - Minimizing network egress costs by eliminating unnecessary headers.
  - Optimizing warehouse data usage and selecting the most efficient storage solutions, including cold storage for infrequently accessed buckets.
  - Ensuring effective autoscaling for both databases and compute resources.
- Enhance current cost attribution methods to provide all teams with clear visibility into their expenditures.
- Participate in incident support and be part of the on-call rotation for platform incidents, collaborating with teams across America and Europe to ensure smooth operations.
- Contribute to solving engineer inquiries regarding our infrastructure and approving pull requests that require platform oversight.
- Be an integral part of a small, high-performing team of Site Reliability Engineers.

Ideal Candidate Profile
- Extensive experience in SRE, DevOps, Software Engineering, or Systems Engineering.
- Exceptional troubleshooting skills.
- Strong system design capabilities with analytical prowess.
- Excellent communication skills.
- Familiarity with major cloud platforms, particularly Google Cloud.
- Proficiency in SQL.
- Experience with containers, Kubernetes, and related tools such as Kustomize and Helm.
- Knowledge of Service Mesh, preferably with Istio.
- Understanding of networking concepts including DNS, TLS, certificates, and ingress configurations.

Mar 5, 2026

Cabify
Full-time|Remote|Remote or Madrid (HQ)

We are seeking a talented Site Reliability Engineer to join our dynamic team at Cabify. In this role, you will be pivotal in ensuring the reliability, availability, and performance of our services. You will work closely with development and operations teams to implement best practices and innovative solutions that enhance our infrastructure.

As a Site Reliability Engineer, your responsibilities will include monitoring system performance, automating processes, and troubleshooting issues to maintain optimal service levels. Your expertise will contribute to improving our platform's resilience and scalability.

Mar 16, 2026

Betsson
Full-time|On-site|Malaga

Role overview
Betsson is hiring a Site Reliability Engineer to support the stability, reliability, and performance of its flagship online casino platform. This position plays a key part in ensuring gaming services remain available and responsive for players worldwide. The role sits within Betsson’s Product Development group, a team of nearly 600 professionals working across six tech hubs: Malta, Budapest, Stockholm, Tallinn, Kyiv, and Athens. The team values close collaboration and operates under the guidance of the CTO and CPO.

What you will do
- Incident and problem management: Investigate system incidents, lead root cause analysis, and implement long-term solutions. Work to minimize incidents related to system changes.
- Observability and metrics: Define and maintain SLAs, SLOs, and success metrics for new projects. Build and update dashboards to enhance system observability.
- Performance and capacity: Identify and help resolve performance bottlenecks. Tune infrastructure and code for speed, and plan for future hardware or cloud resource needs.
- Availability and change management: Maintain high availability and functionality of platform components. Oversee deployments to ensure new releases do not disrupt existing systems.

Requirements
The ideal candidate brings strong support and troubleshooting skills, with experience in most of these areas:
- Observability and monitoring: Building dashboards and tracking SLAs/SLOs with tools such as Prometheus, Grafana, Coralogix, Splunk, or Loki.
- Programming and automation: Scripting and coding to automate tasks and develop reliability tools. Skills in .NET, Python, PowerShell, or Bash are considered a plus.
- Infrastructure as code and cloud: Managing infrastructure using Terraform or Ansible, and understanding cloud platforms like AWS, GCP, or Azure.
- Containerization and orchestration: Experience scaling and managing distributed systems with Kubernetes and Docker.
- CI/CD and change management: Familiarity with continuous integration and delivery to support smooth deployments.

Location: Malaga

Apr 27, 2026

Okta, Inc.
On-site|On-site|Barcelona, Spain

Discover Okta
At Okta, we are redefining identity management as The World’s Identity Company. Our mission is to empower individuals to securely access any technology, from anywhere, on any device or application. Our innovative solutions, including the Okta and Auth0 platforms, provide robust access management, authentication, and automation, placing identity at the forefront of business security and growth.

We embrace diverse perspectives and experiences, and we’re not just looking for candidates who fit a specific mold; we seek lifelong learners who can enrich our team with their unique backgrounds and insights.

Join us in our vision of a world where identity is truly yours.

As a Senior Site Reliability Engineer (SRE) at Auth0, you will play a pivotal role in delivering an unmatched authentication experience to millions of users worldwide. Our commitment to reliability is fundamental to our product, and your expertise will be crucial in enhancing the availability and resilience of our systems. You will be part of our European SRE team, ensuring that our production environments are not only operational but also scalable and prepared for rapid growth. This role goes beyond maintaining systems; it’s about designing solutions that fundamentally improve our platform's resilience and performance.

Jan 28, 2026

Fever
Full-time|On-site|Spain

Greetings! We’re Fever, the foremost technology platform dedicated to transforming the culture and live entertainment landscape.

Our vision? To make culture and entertainment universally accessible. With our proprietary innovative technology and analytical approach, we’re reshaping how individuals connect with live events. Every month, we inspire over 300 million users across more than 40 countries (and still growing) to unearth unforgettable experiences while simultaneously equipping event creators with our technology and insights, enabling them to innovate, grow, and connect with new audiences.

Our achievements? We’ve partnered with industry giants like Netflix, F.C. Barcelona, and Primavera Sound, presented internationally acclaimed experiences, and are supported by leading global investors! Quite impressive, right? To fulfill our mission, we seek high achievers with a proactive mindset, eager to help redefine the future of entertainment!

Are you ready to join the experience?

Let’s explore this role and how you can contribute to Fever’s mission.

Behind the seamless iOS and Android applications and our global website lies our engineering team. We are responsible for creating, developing, enhancing, and maintaining all Fever services to ensure more people enjoy incredible experiences.

About the Role
As a Site Reliability Engineer / DevOps Engineer at Fever, you will design, implement, and maintain scalable and reliable infrastructures, with a strong emphasis on automation.

If you have experience in developing infrastructures using AWS services, possess solid knowledge of building CI/CD pipelines, and approach system architecture with a security-first mindset, you are exactly who we need!

Join us if you thrive in a dynamic environment and are passionate about pushing the limits of what’s achievable. This is a chance to make a significant impact in a rapidly growing global leader.

Feb 24, 2026

Elastic NV
Full-time|Remote|Spain

Join Elastic as a Principal Site Reliability Engineer - Observability and play a pivotal role in enhancing the resilience and scalability of our platforms. In this leadership position, you'll be responsible for developing and implementing strategies to optimize system performance, reliability, and observability. You will collaborate with cross-functional teams to ensure our infrastructure supports the growing demands of our users while maintaining high standards of security and compliance.

Apr 1, 2026

dLocal
Full-time|On-site|Spain

Why choose dLocal?
At dLocal, we empower the world’s leading companies to seamlessly collect payments across 40 countries in emerging markets. Renowned brands trust us to enhance conversion rates and facilitate effortless payment expansion. As both a payment processor and a merchant of record in our operational regions, we enable our clients to penetrate the globe's fastest-growing markets.

Join us and become part of an extraordinary global team that drives success. With over 1000 teammates from more than 30 nationalities, a career at dLocal means making a significant impact on millions of lives. We are innovators, unafraid of challenges, and dedicated to our customers. If this resonates with you, we believe you'll excel with us.

What’s the opportunity?
We are on the lookout for a dynamic Site Reliability Engineer (SRE) to enhance our team! In this role, you will focus on designing, implementing, and continuously maintaining our centralized observability platform, utilizing OpenTelemetry (OTEL) as the backbone. You'll collaborate with a talented team on mission-critical applications for major clients like Netflix, Amazon, Nike, Facebook, and more!

As a Site Reliability Engineer, you will be tasked with exploring crucial questions:
- What data is necessary to gauge our system performance?
- How can we effectively gather this data?
- What patterns should we identify in the data, and what insights do they offer?
- Who needs to be alerted when a system falters?
- Are there systems requiring additional data?

As an SRE, you'll design systems and processes that address these inquiries, providing automated support and responses wherever feasible.

Dec 11, 2023

Okta, Inc.
On-site|On-site|Barcelona, Spain

Discover Okta
At Okta, we are redefining identity management for the world. Our goal is to enable seamless and secure technology usage across any platform, device, or application. With our versatile solutions, the Okta Platform and Auth0 Platform, we prioritize secure access, robust authentication, and automation, ensuring that identity remains central to business security and advancement.

We value diverse perspectives and experiences, seeking lifelong learners who can enhance our team with their unique insights. Join us in creating a world where identity is truly yours.

As part of our mission, Auth0 delivers an unmatched authentication experience to hundreds of millions of users globally. Our commitment to reliability is foundational to our product, and we consistently strive to exceed customer availability expectations. As a mid-level Site Reliability Engineer (SRE) within our European team, your role will be pivotal in ensuring our production systems are operational, resilient, scalable, and prepared for exponential growth. This position is not just about maintaining systems; it's about enhancing the platform's core resiliency and robustness. You will actively contribute by crafting solutions that are designed to improve system reliability.

Jan 28, 2026

Tinybird
Full-time|Remote|Spain

Join Tinybird as a Site Reliability Engineer!

At Tinybird, we empower developers and data teams to harness the full potential of real-time data. Our platform enables you to seamlessly build data pipelines and create innovative data products with unmatched speed. With our user-friendly interface, you can easily ingest multiple data sources, manipulate them using the SQL you already know, and publish low-latency, high-concurrency APIs that enhance your applications. Experience a transformative approach to API development where tasks that used to take hours can now be completed in mere minutes! Tinybird is the go-to tool for data engineers and software developers aiming to drive innovation effortlessly.

Jun 10, 2025

Nexthink
Full-time|On-site|Madrid

Role overview
Nexthink is looking for a Senior Site Reliability Engineer in Madrid to help strengthen the reliability and performance of our systems. This role works closely with engineering teams to build and support scalable infrastructure. Automation and process improvement are key parts of the work, all aimed at delivering a smooth user experience.

Apr 14, 2026

Affirm
Full-time|Remote|Remote Spain

At Affirm, we're redefining credit to create a more transparent and user-friendly experience for consumers, allowing them to buy now and pay later without unexpected fees or interest.

The Site Reliability Engineering (SRE) team at Affirm is a pivotal group dedicated to empowering our engineering partners to excel in the ownership of their applications and services, ensuring a seamless experience for our customers. We achieve this by establishing best practices for application operations, developing essential tools, and offering training and consultation services.

Key responsibilities of the SRE team include:
- Delivering insights and visibility to teams and leadership regarding application performance.
- Leading the development and implementation of Service Level Objectives (SLOs).
- Overseeing the incident management and analysis processes.
- Guiding the adoption of change management and deployment practices.
- Participating in architectural and service discussions.
- Advising on observability and alerting configurations.

Our SRE team boasts expertise across various domains, including:
- Infrastructure, platforms, and distributed systems.
- Capacity management, load testing, and chaos engineering.
- Automation, observability, and configuration management.
- Development practices and product lifecycle experience.

We are looking for passionate software and systems engineers ready to enhance incident lifecycle, reliability, and resilience practices across Affirm's engineering organization and beyond.

Your Responsibilities:
- Take charge of quarterly goals for your team, guiding engineers through uncertainties to tackle complex challenges while ensuring support throughout the delivery process.
- Collaborate with peers and stakeholders in the product development cycle, working closely with infrastructure, product management, developer experience, and analytics to ideate, clarify technical constraints, and make decisions that appropriately balance risks and trade-offs.
- Proactively identify and implement technical solutions and operational processes that enhance incident preparedness, response, and analysis.
- Support the operations and reliability of your team's artifacts by creating and tracking metrics, escalating issues as necessary, and contributing to on-call and maintenance efforts.
- Cultivate a culture of quality and ownership within your team by establishing or refining code review and design standards, and advocating for best practices.

Jan 19, 2026

Perk
Full-time|On-site|Barcelona

About Us
Perk, formerly known as TravelPerk, stands at the forefront of travel and expense management innovation. Our platform is meticulously designed to alleviate the burdens of tedious manual tasks, facilitating seamless automation across travel bookings, expense management, invoice processing, and beyond. By addressing this shadow work that detracts from productivity and stifles creativity, we aim to empower professionals to focus on what truly matters.

We take pride in serving over 10,000 companies globally, including industry leaders like Wise, On Running, Breitling, and Fabletics. Our mission is to tackle the staggering problem of productivity loss, which can average up to 7 hours per employee each week, equating to a substantial $1.7 trillion challenge.

Established in 2015, Perk has rapidly expanded into a dynamic global workforce of more than 1,800 professionals spread across 12 offices, with our primary locations in London and Boston. We blend innovation, governance, and simplicity to reshape workplace dynamics and enhance employee satisfaction.

At Perk, our values inspire us: we act like owners, strive to provide a 7-star experience, and collaborate as one cohesive team. We cherish curiosity, purpose, and a growth mindset as we unlock individual potential. Our talent team comprises top-tier professionals from the travel and SaaS sectors, representing over 70 countries worldwide. If you are passionate about making a meaningful difference and transforming the work experience for millions, we invite you to join our team.

Discover more about us at www.perk.com.

The Platform function at Perk lays the groundwork for the entire organization. We manage the shared infrastructure, core services, and developer tools that empower product teams to operate efficiently without compromising system integrity. Although our contributions are often behind the scenes, their effects are ubiquitous: ensuring reliability, scalability, security, and improving the developer experience.

We tackle complex, systemic issues, make intentional trade-offs, and take responsibility for the platform's long-term viability. If you value leverage, clean abstractions, and are eager to contribute to building a robust infrastructure, we would love to hear from you.

Apr 24, 2023

Affirm
Full-time|Remote|Remote Spain

Affirm is on a mission to transform the credit landscape, making it more transparent and user-friendly. We empower consumers with the flexibility to buy now and pay later, free from hidden fees and compounding interest.

We are seeking a Senior Site Reliability Engineer to join our Cloud Compute team, playing a crucial role in maintaining the robust and scalable infrastructure that underpins our entire platform. As a fully remote team based in Spain, we manage all of Affirm's Kubernetes clusters. Our goal is to ensure a highly reliable and available cloud environment that enables our engineering teams to build and deploy innovative solutions with ease.

You will be part of a close-knit, collaborative group that is passionate about automation and the implementation of best practices. Your contributions will be vital in enhancing our observability capabilities, strengthening the reliability of our critical infrastructure, and automating key operational workflows. This position offers a unique chance to influence the future of Affirm's cloud architecture, work with cutting-edge technologies, and significantly impact the stability and efficiency of our engineering organization. If you are a proactive and skilled cloud engineer who excels in a remote environment and is enthusiastic about Kubernetes, automation, and operational excellence, we want to hear from you!

Jan 27, 2026

WatchGuard Technologies, Inc.
Full-time|On-site|Madrid, Spain

Who You Are:
You are a dedicated and customer-oriented developer who thrives on enhancing user experiences through data-driven solutions. Your enthusiasm for diagnosing and resolving production challenges drives you to proactively identify and eliminate issues.

Your expertise encompasses cloud technologies, automation, infrastructure as code, networking, and microservices. You possess strong programming skills in Python, Java, or Go, with a keen desire to expand your knowledge further.

You demonstrate a solid understanding of software engineering principles throughout the software development lifecycle, including coding standards, code reviews, security protocols, source control, build processes, automated testing, deployment, monitoring, chaos engineering, and self-healing operations. You are proficient with tools and technologies such as CloudFormation, Terraform, New Relic, AWS Lambda, Serverless architecture, Elasticsearch, Docker, Kubernetes, Spark, Flink, Jenkins, GitHub, Artifactory, and Jira.

Your strong analytical and problem-solving skills, combined with effective verbal and written communication, enable you to lead production incident responses and postmortems successfully.

What to Expect as a Member of the SRE Team at WatchGuard:
The SRE team at WatchGuard is responsible for ensuring the reliability and security of our production cloud environments, collaborating closely with application development teams to deliver exceptional customer experiences. As you familiarize yourself with our systems, your responsibilities will include:
- Collaborating with development teams to ensure seamless production operations and managing large-scale event responses.
- Establishing operational and security policies, standards, and processes for development teams.
- Assisting development teams in defining, monitoring, and achieving their service level agreements through well-defined service level indicators and objectives.

A Typical Day in the Life of a Site Reliability Developer on the SRE Team at WatchGuard:
As a Site Reliability Developer at WatchGuard, your daily activities might involve:
- Working collaboratively with application teams in production environments across AWS, Azure, and hybrid cloud infrastructures to ensure effective monitoring, security, reliability, automation, and support.
- Promoting a culture of operational excellence through simplification, automation, analysis, and process evolution.
- Advocating for security and operational best practices to establish your reputation as a cloud expert among our diverse global development teams.
- Striving for the best possible customer experience, even during challenges, by actively participating in incident management and resolution efforts.

Jun 25, 2025

Air Apps
Full-time|On-site|Madrid

About Air Apps
Air Apps builds tools to help people plan and manage their resources more effectively. Founded in 2018 in Lisbon, Portugal, the company is developing an AI-powered Personal & Entrepreneurial Resource Planner (PRP). With a strong family-oriented culture and a focus on innovation, Air Apps has reached over 100 million downloads worldwide. The team is committed to advancing AI solutions that make a real difference in daily life.

Site Reliability Engineer (SRE) – Madrid
This onsite role is based in Madrid. Air Apps offers relocation support for candidates moving to join the team.

What You Will Do
- Design and build systems that are scalable, reliable, and fault-tolerant across cloud platforms.
- Develop and manage observability tools such as Prometheus, Grafana, Datadog, or ELK for monitoring, logging, and alerting.
- Automate infrastructure provisioning, deployments, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Improve system performance, scalability, and incident response processes to maximize uptime.
- Work closely with development and DevOps teams to strengthen system reliability.
- Lead root cause analysis (RCA) and put preventive measures in place to reduce failures.
- Maintain high availability by designing and supporting load balancing, failover, and disaster recovery strategies.

Apr 17, 2026

Docplanner
Full-time|Remote|Barcelona

Join the Positive Side of Technology!
At Docplanner, known previously as Doctoralia, we’ve been revolutionizing healthcare for over a decade. Our journey began with a simple question: are patients being prioritized in healthcare? We answered that call by empowering patients to share their experiences and reviews, while equipping healthcare professionals with efficient technology for managing bookings and maximizing their time for patient care. Today, we invite you to contribute to this mission of making healthcare more human.

Docplanner’s Global Impact
With a presence in 13 countries and trust from over 90 million patients each month, Docplanner has become a leading choice among 300,000+ specialists and renowned investors like Point Nine Capital, Goldman Sachs Asset Management, and One Peak Partners. Despite employing over 2,500 individuals globally, we maintain the vibrant startup mentality that launched us over a decade ago.

The Role of Technology in Docplanner
At Docplanner Tech, our diverse team of more than 400 professionals in Engineering, Data, and Product Development is dedicated to creating innovative solutions for our global users. Many team members have been with us for over five years, and we take pride in welcoming new talent with enthusiasm.

Don’t just take our word for it: check out our Glassdoor reviews to see what it’s like to work with us. If you're curious about being your authentic self at work, watch this video.

Why You Should Join Us
You’ll take pride in sharing with your friends and family how your work is making a meaningful difference in the world. Each day, you’ll go home knowing that your contributions truly matter and align with your personal values. Our goal is to enhance the human experience in healthcare, starting with the encouragement for you to bring your full self to work. We embrace diversity and foster an inclusive environment, allowing you the flexibility to work remotely if that’s your preference.

Apr 10, 2026

dLocal
Full-time|Remote|Madrid

Why Join dLocal?
At dLocal, we empower leading global brands to facilitate payments across 40 countries in emerging markets. Our expertise helps businesses enhance conversion rates and simplifies their payment expansion journeys. Acting as both a payment processor and a merchant of record, we enable our clients to tap into the fastest-growing emerging markets.

Joining our team means being part of a vibrant global family. With over 1500 colleagues from more than 30 nationalities, you’ll have the chance to develop an international career that impacts millions of lives daily. We are innovators, unafraid of challenges, and prioritize our customers. If this resonates with you, we believe you’ll excel in our collaborative environment.

Mar 12, 2026
