Qualifications
About You
Proven experience as an SRE or DevOps engineer with hands-on involvement in managing high-traffic production systems. Strong proficiency in Linux and in databases including MySQL, PostgreSQL, MongoDB, and Redis, along with a solid understanding of networking fundamentals. Familiarity with Kubernetes, CI/CD pipelines, and observability tools like Datadog. A self-motivated individual who excels in dynamic, scaling environments and can work independently without direct supervision. Pragmatic in approach, capable of balancing proactive maintenance with reactive troubleshooting.
About the job
Join Almedia, a pioneering company on a mission to revolutionize marketing by rewarding a community of over 60 million users for their engagement with global brands. Here, you can accelerate your career in an exciting environment aiming to become Germany's next bootstrapped unicorn, recognized as Europe's #3 fastest-growing company in 2025 (FT1000).
We are seeking a passionate and skilled Site Reliability Engineer / DevOps to help us maintain the performance and reliability of our high-traffic platform.
About Almedia
Almedia is redefining the future of marketing. By creating a platform that rewards user engagement, we are setting new standards for how businesses acquire customers. With a commitment to innovation and growth, we are on a path to significant achievements, including becoming a unicorn.
Join TechBiz Global as we empower our prestigious clients by providing exceptional recruitment services. We are currently on the lookout for a Founding DevOps Engineer (SRE) to become an integral part of our client's team. If you are eager to advance your career in a cutting-edge environment, this opportunity could be perfect for you.
Berlin • Cybersecurity &…
Full-time|Hybrid|Berlin, Berlin, Germany; Remote (Europe); Stuttgart, Baden-Württemberg, Germany
Flip develops an AI-powered employee experience platform designed for frontline workers. The company’s mission is to make internal information easily accessible for every employee, wherever they work. Flip is expanding quickly and aims to change how millions of frontline employees stay connected with their organizations.
Role overview
The Site Reliability Engineer (m/w/d) joins the Platform Squad to keep Flip’s infrastructure fast, resilient, and ready for growth. This role focuses on shaping reliability practices, building internal tools, and fostering a culture where engineering teams can deploy confidently at scale while maintaining high uptime. The position is well-suited for those who enjoy designing high-throughput, highly available systems and want to influence the production operations of a growing SaaS platform.
Key responsibilities
Enable scaling: Expand and optimize Azure cloud infrastructure and Kubernetes clusters to support Flip’s global growth, prioritizing high throughput and availability.
Ensure resilience & security: Design and implement zero-downtime deployments, effective rollback mechanisms, and disaster recovery strategies to keep the platform available at all times.
Create observability: Improve the LGTM stack (Loki, Grafana, Tempo, Mimir) so teams have clear insight into system health and performance.
Location
This position can be based in Berlin or Stuttgart, Germany, or performed remotely from anywhere in Europe.
Why Join Scout24?
Scout24 is the proud home of ImmoScout24, Germany's premier platform for real estate. For over 25 years, we have been at the forefront of transforming the real estate market in Germany and Austria. Our mission is to create a digital ecosystem that unites homeowners, seekers, and agents, making the journey to find the perfect home a seamless experience. Your career is as vital as finding the right property; hence, #WorkingatScout24 means you will be part of a vibrant, diverse team of around 1,100 colleagues from 58 nationalities. We celebrate individuality and foster a culture of open-mindedness and authenticity, enabling true learning and personal growth. Mistakes are viewed as opportunities for growth and innovation. Together, we proactively strive for improvement and take responsibility, discussing both successes and challenges with mutual respect because we are #oneteam.
If this resonates with you, we would love to welcome you on board! Even if you don't meet every requirement, we encourage you to share how you can contribute to our team. Grow with us! Welcome home!
Beyond our outstanding company culture, we offer exceptional benefits that make Scout24 a fantastic workplace!
N26 is looking for a Site Reliability Engineer to join the Platform Engineering team in Berlin. This role centers on maintaining and improving the reliability, performance, and scalability of core systems.
Role overview
Work closely with cross-functional teams to support and enhance the platform. The focus is on building solutions that keep systems stable and responsive as the company grows.
What you will do
Monitor and improve system reliability and uptime
Collaborate with other teams to address performance and scalability challenges
Contribute to solutions that strengthen the platform’s technical foundation
Location
This position is based in Berlin.
Join 1global as a Senior Site Reliability Engineer (SRE) and be part of a dynamic team dedicated to enhancing the reliability and performance of our systems. In this role, you will leverage your expertise in cloud infrastructure, automation, and monitoring to ensure our services run smoothly and efficiently.
Role Overview
scalablegmbh is looking for a Senior Cloud Site Reliability Engineer with a focus on network systems. This position is based in Berlin.
What You Will Do
Maintain and improve the reliability, performance, and scalability of cloud infrastructure.
Work closely with engineering teams to optimize network services and resolve technical challenges.
Contribute to developing solutions that strengthen network systems.
Support a culture of ongoing improvement across the organization.
About You
Bring expertise in cloud technologies and network systems.
Enjoy solving complex problems and collaborating with others.
Ready to make an impact in a growing company.
Site Reliability Engineer
Company Overview
At Orcrist Technologies, we are pioneering a next-generation data intelligence platform designed to manage petabyte-scale data with lightning-fast query responses. Our innovative solution is based on Kubernetes and is offered as both a B2B SaaS and an on-premise self-hosted option, including air-gapped deployments. We empower clients in defense, law enforcement, and enterprise sectors to translate mission-critical data into actionable insights.
Your Role
As a Site Reliability Engineer, you will be integral in deploying and managing our data intelligence platform within agency-controlled environments. You will construct and operate secure, highly available Kubernetes clusters, both on-premises and in hybrid architectures. In this role, you will also respond as a forward-deployed SRE during incidents and upgrades, ensuring our systems adhere to strict privacy, audit, and legal evidence standards tailored for law enforcement applications.
Key Responsibilities
Deploy, install, and manage Kubernetes clusters for our platform in on-prem and hybrid settings.
Configure and maintain GitOps workflows, Helm/Kustomize, and artifact registries within restricted networks.
Design and lead incident response initiatives for the observability stack (Prometheus, Grafana) and enforce disaster recovery protocols.
Enhance system security through network segmentation, mTLS, IAM, and vulnerability remediation.
Create compliance documentation, operational runbooks, and train both agency and Orcrist teams on best practices.
About You
5+ years of experience in SRE/DevOps, with a focus on on-call ownership and managing production systems.
Extensive hands-on experience with Kubernetes (on-prem/hybrid), GitOps (Argo CD/Flux), and infrastructure automation tools (Ansible, Terraform).
Strong expertise in observability tools (Prometheus, Grafana, Loki) and complex incident response methodologies.
Fluency in both German and English (C1+), authorized to work in Germany, with a willingness to travel (20–30%).
Preferred Qualifications
In-depth understanding of IT and governance frameworks within law enforcement or the public sector.
Relevant certifications such as CKA/CKAD, ISO 27001 Lead Implementer, CISSP, or GDPR Practitioner.
Demonstrated experience integrating with essential enterprise systems, including Identity and Access Management (SAML, LDAP), and Security Information and Event Management (SIEM) platforms.
Familiarity with digital evidence workflows and contributions to judicial processes.
Previous exposure to managing sensitive environments, including air-gapped systems and investigative tools for public safety.
Join redcare-pharmacy as a Senior Site Reliability Engineer in Berlin. We are seeking a talented and experienced individual who can enhance our infrastructure and ensure the reliability and performance of our systems. This role will involve collaboration with development teams to build scalable systems and improve our operational practices.
Who We Are
Helsing is a pioneering defense AI company dedicated to safeguarding democracies. Our mission is to attain technological leadership, enabling open societies to make sovereign decisions and uphold their ethical standards. As a company, we recognize the profound responsibility that comes with developing and deploying powerful technologies like AI, and we are committed to addressing this responsibility with integrity.
Our team consists of driven engineers, AI specialists, and customer-facing program managers who are passionate about solving the most complex and impactful challenges. We embrace a culture of openness and transparency, encouraging healthy debates about the role of technology in defense, its benefits, and its ethical implications.
The Role
We operate primarily in high-security, on-premise environments, and we are seeking a Site Reliability Engineer to support these critical infrastructures. In this role, you will be responsible for the design, implementation, and management of our on-premise Kubernetes infrastructure.
We value engineers who exhibit a strong work ethic, prioritize effectively, and excel in teamwork. Clear communication, knowledge sharing, and collaboration are essential to advancing both our team and our mission.
The Day-to-Day
As a Site Reliability Engineer, you will design and build cloud-native infrastructure platforms on-premises, focusing on Kubernetes-based solutions that empower our development teams to operate services at scale.
You will create robust observability frameworks using tools like Grafana, Prometheus, and distributed tracing to ensure system reliability and performance.
You will architect and implement secure, multi-tenant Kubernetes clusters to support our high-security environments.
About PlayStation and Sony Interactive Entertainment
PlayStation, part of Sony Interactive Entertainment and a subsidiary of Sony Group Corporation, is known worldwide for delivering leading entertainment experiences. Our portfolio includes PlayStation®5, PlayStation®4, PlayStation®VR, PlayStation®Plus, and acclaimed titles from PlayStation Studios. We value diversity and inclusion, working to create an environment where employees feel empowered and supported. Our teams bring together people who are curious about technology and eager to shape the future of gaming.
Role Overview: Site Reliability Engineer
Based in Berlin, this Site Reliability Engineer role sits within the Gaming Developer & Future Technology Group (GDFT). The group drives cloud gaming innovation, delivering console-quality experiences to players across TVs, mobile devices, and more. The SRE team plays a central part in maintaining and improving the stability of our cloud gaming services. This position involves shaping both design and operational strategies, owning production systems, ensuring code quality, and managing deployments. SREs here contribute to decisions at multiple levels and work closely with teams throughout the software development lifecycle to support operational readiness and service stability.
Main Responsibilities
Lead and participate in technical discussions to improve reliability and scalability within the team.
Contribute to High-Level Design (HLD) documents for new products and platforms.
Mentor junior SREs, providing guidance and support for their growth.
Take charge of incident response and post-mortem analysis within the assigned service area.
Work with cross-functional groups to drive operational efficiency.
Are you ready to reshape the marketing landscape? At Almedia, we provide an extraordinary opportunity for those looking to accelerate their careers in a groundbreaking environment. As we set our sights on becoming Germany's next bootstrapped unicorn, we have already achieved recognition as Europe’s #3 fastest-growing company in 2025 (FT1000).
Join us in our mission to redefine marketing by rewarding a vibrant community of over 60 million users for their engagement with leading advertisers. We are pioneering a novel approach to user acquisition for some of the world’s largest brands.
We are seeking a passionate Founding Engineer to partner with our leadership team on transformative, zero-to-one projects. If you thrive on taking ownership, moving swiftly, and building impactful products from scratch, we want to connect with you.
Join Clera as a Founding Engineer in our innovative SF Hackerhouse! We are looking for passionate engineers who thrive in a startup environment and are eager to be a part of a groundbreaking journey. As a founding member, you will have the opportunity to shape the direction of our technology and contribute to building something truly impactful.
About Air Apps
Air Apps began as a family-founded company in Lisbon, Portugal in 2018. The team focuses on building AI-powered tools for personal and entrepreneurial planning, including the Personal & Entrepreneurial Resource Planner (PRP). Over 100 million downloads worldwide mark a significant milestone for the self-funded company, which now has offices in Lisbon and San Francisco. Air Apps pursues long-term goals, working to challenge standard approaches and develop AI-driven solutions that make a real difference. The company values innovation and aims to empower people globally through its products.
Site Reliability Engineer Role
The Site Reliability Engineer (SRE) will help maintain and improve the reliability, availability, and scalability of Air Apps’ systems. This role bridges software development and operations, focusing on automation, monitoring, and performance tuning to reduce downtime and strengthen system resilience.
Work Location
This position is fully onsite at the Lisbon office. Collaboration with cross-functional teams is central to the role. Relocation support is available for the right candidate.
Join Upvest, where we aim to revolutionize investment accessibility, making it as seamless as everyday spending. Our innovative Investment API allows businesses to offer a diverse array of investment products while enhancing capital market investment and retirement planning experiences.
As one of Europe's leading fintechs, Upvest provides a comprehensive suite of investment opportunities for our B2B clients, spanning principal broking, proprietary trading, and secure custody for traditional securities. Founded in 2017 by Martin Kassing, we have expanded to over 240 employees across Europe, supported by a recent €100 million Series C funding round led by Hedosophia and Sapphire Ventures, along with esteemed existing investors such as Bessemer Venture Partners and BlackRock.
With our headquarters in Berlin and additional hubs in Tallinn and London, we embrace a hybrid work model, allowing flexibility with regular travel to Berlin.
The Opportunity
At Upvest, reliability is not just a metric; it's the cornerstone of our growth. As we rapidly scale, we are committed to establishing a dedicated Site Reliability Engineering (SRE) function aimed at continuously enhancing our reliability standards. This is your opportunity to redefine what exceptional reliability entails for a high-growth fintech leader.
You will have the autonomy to create a reliability culture, establish standards, and implement practices that will guide us through our next phase of expansion. If you've ever envisioned building an SRE practice from the ground up, now is your moment.
The Role
Your mission as the SRE Lead will focus on prevention rather than reaction. You will be a blend of technical visionary and organizational innovator, integrating reliability into our development processes. Collaborating closely with engineering teams, you will enhance observability and resilience while creating frameworks that enable us to operate swiftly without sacrificing stability. Rather than owning services, your role will be to elevate those who do.
Your influence will extend to shaping engineering leaders' perspectives on reliability, guiding product managers in balancing features with stability, and defining what it means to be 'production-ready' across the organization. You will lead and mentor a talented team of 2 to 4 SREs, fostering a culture of excellence that amplifies our impact.
GetYourGuide connects travelers with memorable experiences in over 12,000 cities. Since 2009, the company has helped millions discover new destinations. The Berlin headquarters leads a global team, with offices in cities such as New York and Bangkok. More than 850 employees collaborate to reshape how people find and book travel adventures.
The Staff Site Reliability Engineer joins the Operational Excellence team, which works to minimize disruptions, boost productivity, and build user trust. As GetYourGuide expands its AI-powered travel solutions, this role ensures engineering speed and reliability remain strong so customers enjoy seamless experiences.
What you will do
Collaborate with product teams to improve system reliability, performance, and trust across the platform.
Incident management and reliability
Reduce the number of incidents, as well as Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR).
Lead post-incident reviews and turn findings into lasting improvements.
Create tools and runbooks that speed up diagnosis and resolution of production issues.
Foster a culture that treats incidents as learning opportunities, not blame assignments.
Take part in the infrastructure on-call rotation.
Observability and production confidence
Advance the Datadog-based observability stack, including metrics, logs, traces, dashboards, and alerts.
Help teams define meaningful Service Level Objectives (SLOs) and prevent alert fatigue.
Strengthen production debugging tools so engineers can solve issues independently.
Change confidence and release quality
Lower change failure rates by guiding teams on effective testing and deployment practices.
Learn more about GetYourGuide’s team and mission at getyourguide.careers.
Full-time|On-site|Berlin, Berlin, Germany; Paris, Paris, France
At Doctolib, we pride ourselves on fostering a dynamic engineering environment where innovation thrives. Our mission is to enhance the lives of healthcare professionals and patients alike. We are seeking a Senior Site Reliability Engineer to ensure our production systems operate seamlessly, playing a crucial role in supporting the rapid expansion of Doctolib's services.
Your Responsibilities
As a Senior Site Reliability Engineer within the Core Reliability & Observability team, you will be instrumental in defining the company's observability strategy and maintaining the reliability, debuggability, and scalability of our platform. This position bridges infrastructure, developer experience, and product engineering, focusing on developing and enhancing the core elements of logging, metrics, tracing, and alerting across our organization.
Lead the implementation of an observability strategy across the platform, emphasizing scalable, developer-friendly logging and tracing solutions.
Identify and spearhead cross-functional reliability initiatives to enhance incident detection, response, and postmortem analysis capabilities.
Participate in the on-call rotation and actively work on improving our on-call experience by optimizing alerting, minimizing noise, and providing actionable telemetry.
Who You Are
You could be our next teammate if you possess:
A minimum of 3 years of hands-on experience with large-scale production platforms.
Demonstrated proficiency with cloud platforms such as AWS, Azure, or Google Cloud.
A strong understanding of containerization and orchestration technologies (Docker and Kubernetes).
A deep knowledge of Helm for managing Kubernetes manifests and ArgoCD for GitOps workflows.
Extensive expertise in observability tooling and architecture, including:
Logging: Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Logstash, Vector.
Tracing: OpenTelemetry or proprietary APMs.
Metrics: Prometheus, Thanos, Datadog, or equivalent.
Proficiency in at least one programming language (e.g., Ruby, Python, Go, Java) and a strong grasp of infrastructure as code principles.
Experience with monitoring and observability tools.
Why Join Nebius
Nebius is at the forefront of a transformative era in cloud computing, tailored to meet the demands of the global AI economy. We equip our clients with the innovative tools and resources necessary to tackle real-world challenges and revolutionize their industries, all while avoiding significant infrastructure investments and the need for expansive in-house AI/ML teams. Our team works with the latest advancements in AI cloud infrastructure, collaborating with some of the most knowledgeable and creative leaders and engineers in the industry.
Our Work Environment
With our headquarters in Amsterdam and a presence on Nasdaq, Nebius has established a global footprint with R&D hubs across Europe, North America, and Israel. Our diverse workforce of over 1400 includes more than 400 highly skilled engineers specializing in both hardware and software engineering, in addition to a dedicated in-house AI R&D team.
The Role
Your Responsibilities Will Include:
Ensuring fault tolerance, scalability, and seamless operation of our services.
Utilizing cutting-edge cloud technology to address various infrastructure challenges.
Implementing and enhancing CI/CD processes.
Qualifications We Expect:
Proven experience with programming languages such as Go, Python, or C++.
Strong understanding of classic algorithms and data structures.
Commercial experience and in-depth knowledge of Unix systems and network technologies.
Experience with containerization and configuration management systems (e.g., Ansible, Salt, Terraform, Docker, Kubernetes, Helm).
Bonus Qualifications:
A passion for backend development.
Experience in designing, developing, and operating high-load distributed systems.
Commercial experience with various cloud platforms.
We conduct coding interviews as part of our hiring process.
As a Principal Product Manager in Site Reliability Engineering at Delivery Hero, you will take the lead in enhancing our site reliability practices to ensure optimal performance and availability of our platforms. You will collaborate with cross-functional teams to define product strategies, drive initiatives, and implement solutions that enhance user experience and operational efficiency. Your expertise will guide our engineering teams in adopting best practices and innovative technologies to maintain our position as a leader in the online food delivery market.
About Ivy
At Ivy, we are pioneering the world's first programmable bank, positioning ourselves at the intersection of traditional finance and the emerging world of cryptocurrencies. Our vision is to establish a fully regulated, stablecoin-native banking solution that addresses the gaps between conventional banking and the crypto space. We see an immense opportunity, valued in trillions, to create regulated banks leveraging blockchain technology.
Currently, we serve some of the leading platforms in the crypto industry, including Kraken. Our goal is to expand our platform to facilitate global correspondent banking and support the next generation of AI agents. If successful, Ivy has the potential to become Europe's first trillion-dollar company.
Our team consists of former founders, early employees from unicorn startups, and experts from renowned fintech and banking backgrounds, all supported by premier fintech investors.
Your Mission
As a DevOps Engineer, you will focus on enhancing our development productivity and security. You will be responsible for managing the release pipeline, AWS infrastructure, and developer tooling, while also contributing to crucial technical decisions within a close-knit team.
Your Profile
You have a background in engineering with a strong emphasis on infrastructure.
You're adept at diving deep into complex issues and developing effective solutions.
You strive for optimal performance and proactively address inefficiencies.
You thrive in small teams, delivering results quickly and efficiently.
You are motivated to achieve outstanding outcomes in the shortest time possible.
How We Work
We are committed to producing high-quality software.
We take full ownership of our work, from identifying customer problems to ensuring smooth production deployment.