Experience Level
Senior
Qualifications
Proven experience in database management and reliability engineering. Strong expertise in SQL and NoSQL databases. Familiarity with cloud environments and database services. Problem-solving skills and a proactive approach to improving system reliability. Excellent communication and teamwork abilities.
About the job
Join our dynamic team at dev2 as a Senior/Lead Software Engineer specializing in Database Reliability. We are seeking an experienced professional who will ensure the performance, reliability, and availability of our database infrastructure. You will play a crucial role in designing and implementing solutions that enhance our database systems while collaborating closely with cross-functional teams.
About dev2
dev2 is a leading technology firm based in Berlin, focused on delivering innovative software solutions. Our team is passionate about technology and dedicated to pushing the boundaries of what is possible. We foster a collaborative environment where creativity and initiative are encouraged.
Similar jobs
Why Join Scout24?
Scout24 is the proud home of ImmoScout24, Germany's premier platform for real estate. For over 25 years, we have been at the forefront of transforming the real estate market in Germany and Austria. Our mission is to create a digital ecosystem that unites homeowners, seekers, and agents, making the journey to find the perfect home a seamless experience. Your career is as vital as finding the right property; hence, #WorkingatScout24 means you will be part of a vibrant, diverse team of around 1,100 colleagues from 58 nationalities. We celebrate individuality and foster a culture of open-mindedness and authenticity, enabling true learning and personal growth. Mistakes are viewed as opportunities for growth and innovation. Together, we proactively strive for improvement and take responsibility, discussing both successes and challenges with mutual respect because we are #oneteam.
If this resonates with you, we would love to welcome you on board! Even if you don't meet every requirement, we encourage you to share how you can contribute to our team. Grow with us! Welcome home!
Beyond our outstanding company culture, we offer exceptional benefits that make Scout24 a fantastic workplace!
N26 is looking for a Site Reliability Engineer to join the Platform Engineering team in Berlin. This role centers on maintaining and improving the reliability, performance, and scalability of core systems.
Role overview
Work closely with cross-functional teams to support and enhance the platform. The focus is on building solutions that keep systems stable and responsive as the company grows.
What you will do
Monitor and improve system reliability and uptime.
Collaborate with other teams to address performance and scalability challenges.
Contribute to solutions that strengthen the platform’s technical foundation.
Location
This position is based in Berlin.
Site Reliability Engineer
Company Overview
At Orcrist Technologies, we are pioneering a next-generation data intelligence platform designed to manage petabyte-scale data with lightning-fast query responses. Our innovative solution is based on Kubernetes and is offered as both a B2B SaaS and an on-premise self-hosted option, including air-gapped deployments. We empower clients in defense, law enforcement, and enterprise sectors to translate mission-critical data into actionable insights.
Your Role
As a Site Reliability Engineer, you will be integral in deploying and managing our data intelligence platform within agency-controlled environments. You will construct and operate secure, highly available Kubernetes clusters, both on-premises and in hybrid architectures. In this role, you will also respond as a forward-deployed SRE during incidents and upgrades, ensuring our systems adhere to strict privacy, audit, and legal evidence standards tailored for law enforcement applications.
Key Responsibilities
Deploy, install, and manage Kubernetes clusters for our platform in on-prem and hybrid settings.
Configure and maintain GitOps workflows, Helm/Kustomize, and artifact registries within restricted networks.
Design and lead incident response initiatives for the observability stack (Prometheus, Grafana) and enforce disaster recovery protocols.
Enhance system security through network segmentation, mTLS, IAM, and vulnerability remediation.
Create compliance documentation and operational runbooks, and train both agency and Orcrist teams on best practices.
About You
5+ years of experience in SRE/DevOps, with a focus on on-call ownership and managing production systems.
Extensive hands-on experience with Kubernetes (on-prem/hybrid), GitOps (Argo CD/Flux), and infrastructure automation tools (Ansible, Terraform).
Strong expertise in observability tools (Prometheus, Grafana, Loki) and complex incident response methodologies.
Fluency in both German and English (C1+), authorization to work in Germany, and a willingness to travel (20–30%).
Preferred Qualifications
In-depth understanding of IT and governance frameworks within law enforcement or the public sector.
Relevant certifications such as CKA/CKAD, ISO 27001 Lead Implementer, CISSP, or GDPR Practitioner.
Demonstrated experience integrating with essential enterprise systems, including Identity and Access Management (SAML, LDAP) and Security Information and Event Management (SIEM) platforms.
Familiarity with digital evidence workflows and contributions to judicial processes.
Previous exposure to managing sensitive environments, including air-gapped systems and investigative tools for public safety.
Who We Are
Helsing is a pioneering defense AI company dedicated to safeguarding democracies. Our mission is to attain technological leadership, enabling open societies to make sovereign decisions and uphold their ethical standards. As a company, we recognize the profound responsibility that comes with developing and deploying powerful technologies like AI, and we are committed to addressing this responsibility with integrity.
Our team consists of driven engineers, AI specialists, and customer-facing program managers who are passionate about solving the most complex and impactful challenges. We embrace a culture of openness and transparency, encouraging healthy debates about the role of technology in defense, its benefits, and its ethical implications.
The Role
We operate primarily in high-security, on-premise environments, and we are seeking a Site Reliability Engineer to support these critical infrastructures. In this role, you will be responsible for the design, implementation, and management of our on-premise Kubernetes infrastructure.
We value engineers who exhibit a strong work ethic, prioritize effectively, and excel in teamwork. Clear communication, knowledge sharing, and collaboration are essential to advancing both our team and our mission.
The Day-to-Day
As a Site Reliability Engineer, you will design and build cloud-native infrastructure platforms on-premises, focusing on Kubernetes-based solutions that empower our development teams to operate services at scale.
You will create robust observability frameworks using tools like Grafana, Prometheus, and distributed tracing to ensure system reliability and performance.
You will architect and implement secure, multi-tenant Kubernetes clusters to support our high-security environments.
GetYourGuide connects travelers with memorable experiences in over 12,000 cities. Since 2009, the company has helped millions discover new destinations. The Berlin headquarters leads a global team, with offices in cities such as New York and Bangkok. More than 850 employees collaborate to reshape how people find and book travel adventures.
The Staff Site Reliability Engineer joins the Operational Excellence team, which works to minimize disruptions, boost productivity, and build user trust. As GetYourGuide expands its AI-powered travel solutions, this role ensures engineering speed and reliability remain strong so customers enjoy seamless experiences.
What you will do
Collaborate with product teams to improve system reliability, performance, and trust across the platform.
Incident management and reliability
Reduce the number of incidents, as well as Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR).
Lead post-incident reviews and turn findings into lasting improvements.
Create tools and runbooks that speed up diagnosis and resolution of production issues.
Foster a culture that treats incidents as learning opportunities, not blame assignments.
Take part in the infrastructure on-call rotation.
Observability and production confidence
Advance the Datadog-based observability stack, including metrics, logs, traces, dashboards, and alerts.
Help teams define meaningful Service Level Objectives (SLOs) and prevent alert fatigue.
Strengthen production debugging tools so engineers can solve issues independently.
Change confidence and release quality
Lower change failure rates by guiding teams on effective testing and deployment practices.
Learn more about GetYourGuide’s team and mission at getyourguide.careers.
Role Overview
scalablegmbh is looking for a Senior Cloud Site Reliability Engineer with a focus on network systems. This position is based in Berlin.
What You Will Do
Maintain and improve the reliability, performance, and scalability of cloud infrastructure.
Work closely with engineering teams to optimize network services and resolve technical challenges.
Contribute to developing solutions that strengthen network systems.
Support a culture of ongoing improvement across the organization.
About You
You bring expertise in cloud technologies and network systems.
You enjoy solving complex problems and collaborating with others.
You are ready to make an impact in a growing company.
Veeva Systems is an innovative leader in the industry cloud space, dedicated to accelerating the delivery of therapies to patients. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue last fiscal year and are poised for significant continued growth.
Our core values (Do the Right Thing, Customer Success, Employee Success, and Speed) define our unique culture. In 2021, we made history by becoming a public benefit corporation (PBC), which legally binds us to consider the interests of our customers, employees, society, and investors.
As a Work Anywhere company, we offer the flexibility to work from home or in the office, allowing you to thrive in your preferred environment.
Join us in transforming the life sciences industry by making a meaningful impact for our customers, employees, and communities.
The Role
We invite you to be a part of our talented Vault Platform team as a Senior Site Reliability Engineer, where your expertise will ensure the scalability and reliability of our enterprise applications. You will address complex global challenges, leveraging your deep knowledge of Java and modern open-source technologies to enhance our production systems.
Ideal candidates will possess extensive experience with Java applications and contemporary open-source technologies, preferably gained from enterprise software development or high-growth tech environments. A natural curiosity and problem-solving ability are essential, along with a sophisticated engineering perspective to understand system integrations that support hundreds of customers across North America, Europe, and Asia.
Join redcare-pharmacy as a Senior Site Reliability Engineer in Berlin. We are seeking a talented and experienced individual who can enhance our infrastructure and ensure the reliability and performance of our systems. This role will involve collaboration with development teams to build scalable systems and improve our operational practices.
Full-time | Hybrid | Berlin, Berlin, Germany; Remote (Europe); Stuttgart, Baden-Württemberg, Germany
Flip develops an AI-powered employee experience platform designed for frontline workers. The company’s mission is to make internal information easily accessible for every employee, wherever they work. Flip is expanding quickly and aims to change how millions of frontline employees stay connected with their organizations.
Role overview
The Site Reliability Engineer (m/w/d) joins the Platform Squad to keep Flip’s infrastructure fast, resilient, and ready for growth. This role focuses on shaping reliability practices, building internal tools, and fostering a culture where engineering teams can deploy confidently at scale while maintaining high uptime. The position is well-suited for those who enjoy designing high-throughput, highly available systems and want to influence the production operations of a growing SaaS platform.
Key responsibilities
Enable scaling: Expand and optimize Azure cloud infrastructure and Kubernetes clusters to support Flip’s global growth, prioritizing high throughput and availability.
Ensure resilience & security: Design and implement zero-downtime deployments, effective rollback mechanisms, and disaster recovery strategies to keep the platform available at all times.
Create observability: Improve the LGTM stack (Loki, Grafana, Tempo, Mimir) so teams have clear insight into system health and performance.
Location
This position can be based in Berlin or Stuttgart, Germany, or performed remotely from anywhere in Europe.
Join TechBiz Global as we empower our prestigious clients by providing exceptional recruitment services. We are currently on the lookout for a Founding DevOps Engineer (SRE) to become an integral part of our client's team. If you are eager to advance your career in a cutting-edge environment, this opportunity could be perfect for you.
Berlin • Cybersecurity & AI Startup • Recently Funded
Our client, an innovative cybersecurity startup based in Berlin, is seeking a DevOps Engineer to join as a founding member and contribute to the development of the core security, identity, and enforcement frameworks of a pioneering AI-driven risk management platform.
Founded by seasoned cybersecurity professionals with experience in Israeli intelligence, our client is looking for a proactive Founding DevOps Engineer for a hybrid role located in central Berlin. If you have a passion for cybersecurity and AI, excel in dynamic startup settings, and relish the challenge of building sophisticated platforms from the ground up, this is a chance to make a significant impact.
This startup is creating a state-of-the-art cyber risk platform designed to help enterprises effectively comprehend, measure, and mitigate identity risks on a large scale. Their mission is to transform intricate identity and security data into clear, actionable insights that Chief Information Security Officers (CISOs) and Chief Technology Officers (CTOs) can rely on.
From day one, you will be instrumental in shaping core platform components, influencing how modern enterprises manage risk using cloud-native technologies, AI-driven analytics, and automated enforcement through AI agents.
Key Responsibilities
Design, build, and operate the foundational cloud infrastructure for a secure, scalable, production-ready SaaS platform from the outset.
Manage AWS environments comprehensively, encompassing networking, IAM, compute, storage, and security parameters.
Develop and sustain Infrastructure as Code practices to ensure efficient deployment and management.
Full-time | On-site | Berlin, Berlin, Germany; Paris, Paris, France
At Doctolib, we pride ourselves on fostering a dynamic engineering environment where innovation thrives. Our mission is to enhance the lives of healthcare professionals and patients alike. We are seeking a Senior Site Reliability Engineer to ensure our production systems operate seamlessly, playing a crucial role in supporting the rapid expansion of Doctolib's services.
Your Responsibilities
As a Senior Site Reliability Engineer within the Core Reliability & Observability team, you will be instrumental in defining the company's observability strategy and maintaining the reliability, debuggability, and scalability of our platform. This position bridges infrastructure, developer experience, and product engineering, focusing on developing and enhancing the core elements of logging, metrics, tracing, and alerting across our organization.
Lead the implementation of an observability strategy across the platform, emphasizing scalable, developer-friendly logging and tracing solutions.
Identify and spearhead cross-functional reliability initiatives to enhance incident detection, response, and postmortem analysis capabilities.
Participate in the on-call rotation and actively work on improving our on-call experience by optimizing alerting, minimizing noise, and providing actionable telemetry.
Who You Are
You could be our next teammate if you possess:
A minimum of 3 years of hands-on experience with large-scale production platforms.
Demonstrated proficiency with cloud platforms such as AWS, Azure, or Google Cloud.
A strong understanding of containerization and orchestration technologies (Docker and Kubernetes).
A deep knowledge of Helm for managing Kubernetes manifests and ArgoCD for GitOps workflows.
Extensive expertise in observability tooling and architecture, including:
Logging: Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Logstash, Vector.
Tracing: OpenTelemetry or proprietary APMs.
Metrics: Prometheus, Thanos, Datadog, or equivalent.
Proficiency in at least one programming language (e.g., Ruby, Python, Go, Java) and a strong grasp of infrastructure as code principles.
Experience with monitoring and observability tools.
Join Almedia, a pioneering company on a mission to revolutionize marketing by rewarding a community of over 60 million users for their engagement with global brands. Here, you can accelerate your career at a company that aims to become Germany's next bootstrapped unicorn and was recognized as Europe's #3 fastest-growing company in 2025 (FT1000).
We are seeking a passionate and skilled Site Reliability Engineer / DevOps to help us maintain the performance and reliability of our high-traffic platform.
Superhuman embraces a dynamic hybrid working model for this position, offering team members the ideal balance of focused work and in-person collaboration that nurtures trust, innovation, and a vibrant team culture.
About Superhuman
Superhuman is at the forefront of AI productivity, empowering individuals to reach their superhuman potential. As the proud home of Grammarly, our suite of applications integrates seamlessly with over 1 million platforms, enhancing productivity through intelligent features. Our offerings include Grammarly's writing assistance, Coda's collaborative spaces, and Go, an AI assistant that proactively provides contextual support. Since our inception in 2009, we have transformed the workflows of more than 40 million users, 50,000 organizations, and 3,000 educational institutions globally. Discover more at superhuman.com.
The Opportunity
In pursuit of our ambitious goals, we seek a Site Reliability Engineer (SRE) to strengthen our infrastructure team. This pivotal role involves developing software to enhance the reliability of our backend systems, collaborating closely with engineers, and strategizing for future scalability. You will engage with our existing production engineering teams in the EU as we transition away from the “you build it, you own it” approach.
The engineers and researchers at Superhuman are given the freedom to innovate and drive breakthroughs, subsequently influencing our product roadmap. As we expand our interfaces, algorithms, and infrastructure, the complexity of our technical challenges continues to grow. Learn more about our technical endeavors on our technical blog.
As an SRE, your responsibilities will include:
Scaling our Kubernetes-based control plane that processes billions of events daily.
Enhancing our automation systems that respond to workload demands.
Deploying machine learning systems company-wide.
Join Upvest, where we aim to revolutionize investment accessibility, making it as seamless as everyday spending. Our innovative Investment API allows businesses to offer a diverse array of investment products while enhancing capital market investment and retirement planning experiences.
As one of Europe's leading fintechs, Upvest provides a comprehensive suite of investment opportunities for our B2B clients, spanning principal broking, proprietary trading, and secure custody for traditional securities. Founded in 2017 by Martin Kassing, we have expanded to over 240 employees across Europe, supported by a recent €100 million Series C funding round led by Hedosophia and Sapphire Ventures, along with esteemed existing investors such as Bessemer Venture Partners and BlackRock.
With our headquarters in Berlin and additional hubs in Tallinn and London, we embrace a hybrid work model, allowing flexibility with regular travel to Berlin.
The Opportunity
At Upvest, reliability is not just a metric; it's the cornerstone of our growth. As we rapidly scale, we are committed to establishing a dedicated Site Reliability Engineering (SRE) function aimed at continuously enhancing our reliability standards. This is your opportunity to redefine what exceptional reliability entails for a high-growth fintech leader.
You will have the autonomy to create a reliability culture, establish standards, and implement practices that will guide us through our next phase of expansion. If you've ever envisioned building an SRE practice from the ground up, now is your moment.
The Role
Your mission as the SRE Lead will focus on prevention rather than reaction. You will be a blend of technical visionary and organizational innovator, integrating reliability into our development processes. Collaborating closely with engineering teams, you will enhance observability and resilience while creating frameworks that enable us to operate swiftly without sacrificing stability. Rather than owning services, your role will be to elevate those who do.
Your influence will extend to shaping engineering leaders' perspectives on reliability, guiding product managers in balancing features with stability, and defining what it means to be 'production-ready' across the organization. You will lead and mentor a talented team of 2 to 4 SREs, fostering a culture of excellence that amplifies our impact.
As a Principal Product Manager in Site Reliability Engineering at Delivery Hero, you will take the lead in enhancing our site reliability practices to ensure optimal performance and availability of our platforms. You will collaborate with cross-functional teams to define product strategies, drive initiatives, and implement solutions that enhance user experience and operational efficiency. Your expertise will guide our engineering teams in adopting best practices and innovative technologies to maintain our position as a leader in the online food delivery market.
About PlayStation and Sony Interactive Entertainment
PlayStation, part of Sony Interactive Entertainment and a subsidiary of Sony Group Corporation, is known worldwide for delivering leading entertainment experiences. Our portfolio includes PlayStation®5, PlayStation®4, PlayStation®VR, PlayStation®Plus, and acclaimed titles from PlayStation Studios. We value diversity and inclusion, working to create an environment where employees feel empowered and supported. Our teams bring together people who are curious about technology and eager to shape the future of gaming.
Role Overview: Site Reliability Engineer
Based in Berlin, this Site Reliability Engineer role sits within the Gaming Developer & Future Technology Group (GDFT). The group drives cloud gaming innovation, delivering console-quality experiences to players across TVs, mobile devices, and more. The SRE team plays a central part in maintaining and improving the stability of our cloud gaming services. This position involves shaping both design and operational strategies, owning production systems, ensuring code quality, and managing deployments. SREs here contribute to decisions at multiple levels and work closely with teams throughout the software development lifecycle to support operational readiness and service stability.
Main Responsibilities
Lead and participate in technical discussions to improve reliability and scalability within the team.
Contribute to High-Level Design (HLD) documents for new products and platforms.
Mentor junior SREs, providing guidance and support for their growth.
Take charge of incident response and post-mortem analysis within the assigned service area.
Work with cross-functional groups to drive operational efficiency.
Full-time | Remote | Berlin, Berlin, Germany; Remote (Europe); Stuttgart, Baden-Württemberg, Germany
Flip is building an AI-powered employee experience platform designed for frontline workers. The mission centers on giving every employee, no matter their location, access to essential company information. Flip’s goal is to become the most widely used platform for frontline teams, changing how these teams connect and collaborate.
Role overview
As a Site Reliability Engineer in the Platform Squad, you will focus on keeping Flip’s infrastructure reliable, fast, and scalable. The role involves shaping reliability practices, developing internal tools, and supporting engineering teams as they deploy at scale. This position is well suited for someone who enjoys building high-throughput, highly available systems and wants to have a direct impact on the operations of a SaaS platform in production.
What you will do
Scale infrastructure: Improve and optimize Azure cloud environments and Kubernetes clusters to support global growth and high availability.
Ensure resilience and safety: Build and maintain zero-downtime deployments, rollback strategies, and disaster recovery plans to keep the platform running around the clock.
Advance observability: Enhance the LGTM stack (Loki, Grafana, Tempo, Mimir) to provide teams with visibility, and use these tools to define and refine Service Level Objectives (SLOs).
Automate infrastructure: Create and improve infrastructure as code using Pulumi in Go, reducing manual work and making the platform more self-service for engineers.
Promote reliability practices: Support CI/CD best practices, incident response, post-mortems, and improvements to developer experience across engineering.
Shape the platform’s future: Collaborate with your squad and engineering leadership to influence the roadmap, including scaling, cost management, security, and compliance.
Location
This role is open to candidates based in Berlin or Stuttgart, Germany, as well as remote applicants located within Europe.
About Air Apps
Air Apps began as a family-founded company in Lisbon, Portugal, in 2018. The team focuses on building AI-powered tools for personal and entrepreneurial planning, including the Personal & Entrepreneurial Resource Planner (PRP). Over 100 million downloads worldwide mark a significant milestone for the self-funded company, which now has offices in Lisbon and San Francisco. Air Apps pursues long-term goals, working to challenge standard approaches and develop AI-driven solutions that make a real difference. The company values innovation and aims to empower people globally through its products.
Site Reliability Engineer Role
The Site Reliability Engineer (SRE) will help maintain and improve the reliability, availability, and scalability of Air Apps’ systems. This role bridges software development and operations, focusing on automation, monitoring, and performance tuning to reduce downtime and strengthen system resilience.
Work Location
This position is fully onsite at the Lisbon office. Collaboration with cross-functional teams is central to the role. Relocation support is available for the right candidate.
About PlayStation and Sony Interactive Entertainment
PlayStation, part of Sony Interactive Entertainment and a subsidiary of Sony Group Corporation, is known worldwide for its gaming products and services, including PlayStation®5, PlayStation®4, PlayStation®VR, PlayStation®Plus, and acclaimed titles from PlayStation Studios. The company values diversity and aims to create an inclusive environment where every team member can thrive. The Berlin office is home to a growing team focused on innovation and technology in entertainment.
Role Overview: Senior Service Reliability Engineer
The Senior Service Reliability Engineer joins the Gaming Developer & Future Technology Group (GDFT) in Berlin. This team leads efforts in cloud gaming, bringing console-quality experiences to a range of devices, including TVs, consoles, and mobile platforms. The Site Reliability Engineering (SRE) group ensures that cloud gaming services remain stable and reliable for players everywhere. SREs at Sony Interactive Entertainment contribute to design and operational decisions that support service stability and performance.
What You Will Do
Take full ownership of production environments for cloud gaming services.
Maintain and improve the quality of production code.
Oversee deployments, ensuring smooth releases and minimal disruption.
Engage in decision-making across technical and operational areas.
Work proactively and independently within a collaborative team.
Who We’re Looking For
Experience in service reliability, site reliability engineering, or related fields.
Proactive approach and ability to work with minimal direction.
Strong communication skills and willingness to contribute to team decisions.
Interest in cloud gaming, technology, and delivering high-quality services.
Location
Berlin, Germany
Apr 20, 2026