Experience Level
Manager
Qualifications
Proven experience in a managerial role within engineering, with a focus on site reliability. Strong understanding of cloud technologies and infrastructure management. Excellent problem-solving skills and the ability to work under pressure. Experience with monitoring and performance tuning tools. Exceptional communication and leadership abilities.
About the job
Jobgether seeks an Engineering Manager specializing in Site Reliability Engineering to lead a team dedicated to the development and maintenance of essential systems. This role is based in Germany and centers on ensuring the reliability and performance of the company's core infrastructure.
What you will do
Guide and mentor a team of site reliability engineers.
Supervise the stability, scalability, and efficiency of production environments.
Champion reliability and performance best practices within the team.
Collaborate with other departments to align infrastructure with business objectives and sustain high system availability.
Requirements
Demonstrated experience managing engineering teams, particularly in site reliability or similar areas.
Solid background in designing and supporting reliable, scalable systems.
Strong ability to work with both technical and non-technical groups to advance business goals.
About Jobgether
At Jobgether, we are committed to creating a collaborative and innovative work environment. Our mission is to connect talented individuals with exceptional career opportunities. We value diversity and strive to foster an inclusive culture where every team member can thrive.
Similar jobs
N26 is looking for a Site Reliability Engineer to join the Platform Engineering team in Berlin. This role centers on maintaining and improving the reliability, performance, and scalability of core systems. Role overview Work closely with cross-functional teams to support and enhance the platform. The focus is on building solutions that keep systems stable and responsive as the company grows. What you will do Monitor and improve system reliability and uptime Collaborate with other teams to address performance and scalability challenges Contribute to solutions that strengthen the platform’s technical foundation Location This position is based in Berlin.
Site Reliability Engineer Company Overview At Orcrist Technologies, we are pioneering a next-generation data intelligence platform designed to manage petabyte-scale data with lightning-fast query responses. Our innovative solution is based on Kubernetes and is offered as both a B2B SaaS and an on-premise self-hosted option, including air-gapped deployments. We empower clients in defense, law enforcement, and enterprise sectors to translate mission-critical data into actionable insights. Your Role As a Site Reliability Engineer, you will be integral in deploying and managing our data intelligence platform within agency-controlled environments. You will construct and operate secure, highly available Kubernetes clusters, both on-premises and in hybrid architectures. In this role, you will also respond as a forward-deployed SRE during incidents and upgrades, ensuring our systems adhere to strict privacy, audit, and legal evidence standards tailored for law enforcement applications. Key Responsibilities Deploy, install, and manage Kubernetes clusters for our platform in on-prem and hybrid settings. Configure and maintain GitOps workflows, Helm/Kustomize, and artifact registries within restricted networks. Design and lead incident response initiatives for the observability stack (Prometheus, Grafana) and enforce disaster recovery protocols. Enhance system security through network segmentation, mTLS, IAM, and vulnerability remediation. Create compliance documentation, operational runbooks, and train both agency and Orcrist teams on best practices. About You 5+ years of experience in SRE/DevOps, with a focus on on-call ownership and managing production systems. Extensive hands-on experience with Kubernetes (on-prem/hybrid), GitOps (Argo CD/Flux), and infrastructure automation tools (Ansible, Terraform). Strong expertise in observability tools (Prometheus, Grafana, Loki) and complex incident response methodologies. 
Fluency in both German and English (C1+), authorized to work in Germany, with a willingness to travel (20–30%). Preferred Qualifications In-depth understanding of IT and governance frameworks within law enforcement or the public sector. Relevant certifications such as CKA/CKAD, ISO 27001 Lead Implementer, CISSP, or GDPR Practitioner. Demonstrated experience integrating with essential enterprise systems, including Identity and Access Management (SAML, LDAP), and Security Information and Event Management (SIEM) platforms. Familiarity with digital evidence workflows and contributions to judicial processes. Previous exposure to managing sensitive environments, including air-gapped systems and investigative tools for public safety.
Who We Are Helsing is a pioneering defense AI company dedicated to safeguarding democracies. Our mission is to attain technological leadership, enabling open societies to make sovereign decisions and uphold their ethical standards. As a company, we recognize the profound responsibility that comes with developing and deploying powerful technologies like AI, and we are committed to addressing this responsibility with integrity. Our team consists of driven engineers, AI specialists, and customer-facing program managers who are passionate about solving the most complex and impactful challenges. We embrace a culture of openness and transparency, encouraging healthy debates about the role of technology in defense, its benefits, and its ethical implications. The Role We operate primarily in high-security, on-premise environments, and we are seeking a Site Reliability Engineer to support these critical infrastructures. In this role, you will be responsible for the design, implementation, and management of our on-premise Kubernetes infrastructure. We value engineers who exhibit a strong work ethic, prioritize effectively, and excel in teamwork. Clear communication, knowledge sharing, and collaboration are essential to advancing both our team and our mission. The Day-to-Day As a Site Reliability Engineer, you will design and build cloud-native infrastructure platforms on-premises, focusing on Kubernetes-based solutions that empower our development teams to operate services at scale. You will create robust observability frameworks using tools like Grafana, Prometheus, and distributed tracing to ensure system reliability and performance. You will architect and implement secure, multi-tenant Kubernetes clusters to support our high-security environments.
GetYourGuide connects travelers with memorable experiences in over 12,000 cities. Since 2009, the company has helped millions discover new destinations. The Berlin headquarters leads a global team, with offices in cities such as New York and Bangkok. More than 850 employees collaborate to reshape how people find and book travel adventures. The Staff Site Reliability Engineer joins the Operational Excellence team, which works to minimize disruptions, boost productivity, and build user trust. As GetYourGuide expands its AI-powered travel solutions, this role ensures engineering speed and reliability remain strong so customers enjoy seamless experiences. What you will do Collaborate with product teams to improve system reliability, performance, and trust across the platform. Incident management and reliability Reduce the number of incidents, as well as Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR). Lead post-incident reviews and turn findings into lasting improvements. Create tools and runbooks that speed up diagnosis and resolution of production issues. Foster a culture that treats incidents as learning opportunities, not blame assignments. Take part in the infrastructure on-call rotation. Observability and production confidence Advance the Datadog-based observability stack, including metrics, logs, traces, dashboards, and alerts. Help teams define meaningful Service Level Objectives (SLOs) and prevent alert fatigue. Strengthen production debugging tools so engineers can solve issues independently. Change confidence and release quality Lower change failure rates by guiding teams on effective testing and deployment practices. Learn more about GetYourGuide’s team and mission at getyourguide.careers.
Site Reliability Engineer Company Overview At Orcrist Technologies, we are pioneers in developing the Orcrist Intelligence Platform (OIP), a robust and secure Kubernetes-native system designed for flexibility across cloud, on-prem, and air-gapped environments. Our dedicated Innovation team spearheads new initiatives, working independently from delivery teams to prototype and validate comprehensive solutions before they become fully-fledged products. Role Overview As a Site Reliability Engineer, you will play a crucial role in accelerating Innovation's rapid prototyping cycles. Your responsibilities will include automating prototype environments, overseeing demo deployments, and validating platform constraints early in the development process. You will swiftly provision sandboxes, ensure prototypes are deployable, and create infrastructure handoff packages for seamless productization. Key Responsibilities Automate the provisioning and teardown of prototype environments using Kubernetes, GitOps, and Pulumi/Terraform to facilitate rapid discovery cycles. Manage reliable demo deployments that showcase new initiatives while maintaining a quick iteration pace. Early validation of air-gapped, on-premise, and cloud constraints to ensure prototypes are ready for adoption by productization teams. Implement observability for prototypes using OpenTelemetry, Prometheus, and Grafana, along with lightweight Service Level Objectives (SLOs) to support validation efforts. Create comprehensive infrastructure handoff packages, including deployment specifications, Helm charts, and runbooks to ensure a smooth transition to productization. Collaborate with the Platform team to ensure prototypes align with standards while meeting the rapid iteration demands of the Innovation team. Qualifications 4+ years of experience in Site Reliability Engineering or Platform engineering with practical knowledge of Kubernetes and GitOps. 
Proven track record in rapid environment provisioning, ephemeral deployments, and infrastructure automation. Comfortable validating deployability constraints (air-gapped, on-prem) early in the development process. Exceptional technical writing skills, capable of producing clear deployment specifications and handoff artifacts that minimize adoption friction. Eligibility to work in Germany; EU or NATO citizenship is preferred. Preferred Qualifications Proficiency in the German language (B1+), experience with BSI/ISO compliance, and familiarity with supply chain security tools (e.g., Cosign, Kyverno). Experience working in prototype or R&D environments, demo automation, or infrastructure handoff processes. What We Offer Access to a modern tech stack including Kubernetes, Argo CD, Terraform, Prometheus, Vault, Kyverno, GitOps, and OpenTelemetry. A remote-first work environment in Germany, complemented by regular meetups in Berlin and 30 days of vacation. Engagement in mission-driven projects that have a meaningful impact on public safety and defense. Provision of equipment and a home office budget, along with support for professional development.
Robert Bosch Semiconductor Manufacturing Dresden GmbH
Full-time|On-site|Dresden
Take charge of commissioning and operating, as well as evolving, an application landscape for the semiconductor factory of the future. Define and implement operational processes and deployment strategies independently, adhering to modern principles such as DevOps and Site Reliability Engineering (SRE). Oversee change management, reliably implement requirements, assess risks, and produce comprehensive documentation. Proactively work towards achieving SLA targets for availability while managing IT incident management and disaster recovery. Work in an agile environment, participate in retrospectives, and continuously enhance systems and processes. Support the assurance of cost-effectiveness, quality, reliability, and innovation in the IT operations field.
As a Principal Product Manager in Site Reliability Engineering at Delivery Hero, you will take the lead in enhancing our site reliability practices to ensure optimal performance and availability of our platforms. You will collaborate with cross-functional teams to define product strategies, drive initiatives, and implement solutions that enhance user experience and operational efficiency. Your expertise will guide our engineering teams in adopting best practices and innovative technologies to maintain our position as a leader in the online food delivery market.
Why Join Scout24? Scout24 is the proud home of ImmoScout24, Germany's premier platform for real estate. For over 25 years, we have been at the forefront of transforming the real estate market in Germany and Austria. Our mission is to create a digital ecosystem that unites homeowners, seekers, and agents, making the journey to find the perfect home a seamless experience. Your career is as vital as finding the right property; hence, #WorkingatScout24 means you will be part of a vibrant, diverse team of around 1,100 colleagues from 58 nationalities. We celebrate individuality and foster a culture of open-mindedness and authenticity, enabling true learning and personal growth. Mistakes are viewed as opportunities for growth and innovation. Together, we proactively strive for improvement and take responsibility, discussing both successes and challenges with mutual respect because we are #oneteam. If this resonates with you, we would love to welcome you on board! Even if you don't meet every requirement, we encourage you to share how you can contribute to our team. Grow with us! Welcome home! Beyond our outstanding company culture, we offer exceptional benefits that make Scout24 a fantastic workplace!
Join our dynamic team at mlabs as a Senior Site Reliability Engineer. In this pivotal role, you will leverage your expertise to enhance the reliability, performance, and scalability of our systems. Your contributions will play a crucial role in ensuring we deliver exceptional service to our clients. As a Senior Site Reliability Engineer, you will collaborate with various teams to design and implement robust infrastructure solutions. Your ability to troubleshoot and solve complex problems will be vital in maintaining our high availability standards.
Exaring AG operates waipu.tv, a streaming platform serving over a million customers in Germany. The service combines Free TV, Pay TV, NewTV, and Video-on-Demand, with features like recording, restart, and timeshift. Users can watch waipu.tv on smartphones, tablets, smart TVs, FireTV, Apple TV, and the waipu.tv stick. Exaring AG handles the entire platform, from video encoding to delivering smooth streaming experiences. Role overview The Senior Site Reliability Engineer ensures waipu.tv remains stable and reliable as its audience grows. This role focuses on strengthening the infrastructure, refining existing systems, and supporting new technical solutions for the streaming service. What you will do Design, build, and deploy software to improve the stability, scalability, availability, and performance of waipu.tv. Collaborate with the team to resolve production issues and develop automated solutions to prevent future incidents. Lead the architecture and ongoing management of the central Kubernetes platform. Monitor system performance and respond to outages when necessary. Support teams developing microservices for production infrastructure, using CNCF projects such as Kubernetes, Prometheus, and OpenTelemetry. Location This is a remote role (Homeoffice).
Full-time|Hybrid|Berlin, Berlin, Germany; Remote (Europe); Stuttgart, Baden-Württemberg, Germany
Flip develops an AI-powered employee experience platform designed for frontline workers. The company’s mission is to make internal information easily accessible for every employee, wherever they work. Flip is expanding quickly and aims to change how millions of frontline employees stay connected with their organizations. Role overview The Site Reliability Engineer (m/w/d) joins the Platform Squad to keep Flip’s infrastructure fast, resilient, and ready for growth. This role focuses on shaping reliability practices, building internal tools, and fostering a culture where engineering teams can deploy confidently at scale while maintaining high uptime. The position is well-suited for those who enjoy designing high-throughput, highly available systems and want to influence the production operations of a growing SaaS platform. Key responsibilities Enable scaling: Expand and optimize Azure cloud infrastructure and Kubernetes clusters to support Flip’s global growth, prioritizing high throughput and availability. Ensure resilience & security: Design and implement zero-downtime deployments, effective rollback mechanisms, and disaster recovery strategies to keep the platform available at all times. Create observability: Improve the LGTM stack (Loki, Grafana, Tempo, Mimir) so teams have clear insight into system health and performance. Location This position can be based in Berlin or Stuttgart, Germany, or performed remotely from anywhere in Europe.
Join TechBiz Global as we empower our prestigious clients by providing exceptional recruitment services. We are currently on the lookout for a Founding DevOps Engineer (SRE) to become an integral part of our client's team. If you are eager to advance your career in a cutting-edge environment, this opportunity could be perfect for you. Berlin • Cybersecurity & AI Startup • Recently Funded. Our client, an innovative cybersecurity startup based in Berlin, is seeking a DevOps Engineer to join as a founding member and contribute to the development of the core security, identity, and enforcement frameworks of a pioneering AI-driven risk management platform. Founded by seasoned cybersecurity professionals with experience in Israeli intelligence, our client is looking for a proactive Founding DevOps Engineer for a hybrid role located in central Berlin. If you have a passion for cybersecurity and AI, excel in dynamic startup settings, and relish the challenge of building sophisticated platforms from the ground up, this is a chance to make a significant impact. This startup is creating a state-of-the-art cyber risk platform designed to help enterprises effectively comprehend, measure, and mitigate identity risks on a large scale. Their mission is to transform intricate identity and security data into clear, actionable insights that Chief Information Security Officers (CISOs) and Chief Technology Officers (CTOs) can rely on. 
From day one, you will be instrumental in shaping core platform components, influencing how modern enterprises manage risk using cloud-native technologies, AI-driven analytics, and automated enforcement through AI agents. Key Responsibilities Design, build, and operate the foundational cloud infrastructure for a secure, scalable, production-ready SaaS platform from the outset. Manage AWS environments comprehensively, encompassing networking, IAM, compute, storage, and security parameters. Develop and sustain Infrastructure as Code practices to ensure efficient deployment and management.
Role Overview scalablegmbh is looking for a Senior Cloud Site Reliability Engineer with a focus on network systems. This position is based in Berlin. What You Will Do Maintain and improve the reliability, performance, and scalability of cloud infrastructure. Work closely with engineering teams to optimize network services and resolve technical challenges. Contribute to developing solutions that strengthen network systems. Support a culture of ongoing improvement across the organization. About You Bring expertise in cloud technologies and network systems. Enjoy solving complex problems and collaborating with others. Ready to make an impact in a growing company.
Join Almedia, a pioneering company on a mission to revolutionize marketing by rewarding a community of over 60 million users for their engagement with global brands. Here, you can accelerate your career in an exciting environment aiming to become Germany's next bootstrapped unicorn, recognized as Europe's #3 fastest-growing company in 2025 (FT1000). We are seeking a passionate and skilled Site Reliability Engineer / DevOps to help us maintain the performance and reliability of our high-traffic platform.
Join Tipico as a Site Reliability Engineer and become a key player in enhancing the excitement of sports betting for our customers. You will be part of a dynamic and agile team that thrives on collaboration and innovation. Each day will present new challenges as you develop technical solutions and products that elevate our offerings. Your Responsibilities: Manage production environments by monitoring system availability and overall health. Develop software and systems to optimize platform infrastructure and applications. Enhance the reliability, quality, and speed of our software solutions. Measure and optimize system performance to stay ahead of customer needs. Provide operational support for large-scale distributed applications. Analyze metrics from operating systems and applications for performance tuning. Collaborate with development teams to enhance service delivery. Engage in system design, platform management, and capacity planning. Create sustainable systems through automation. Balance rapid feature development with reliability, adhering to service-level objectives.
Superhuman embraces a dynamic hybrid working model for this position, offering team members the ideal balance of focused work and in-person collaboration that nurtures trust, innovation, and a vibrant team culture. About Superhuman Superhuman is at the forefront of AI productivity, empowering individuals to reach their superhuman potential. As the proud home of Grammarly, our suite of applications integrates seamlessly with over 1 million platforms, enhancing productivity through intelligent features. Our offerings include Grammarly's writing assistance, Coda's collaborative spaces, and Go, an AI assistant that proactively provides contextual support. Since our inception in 2009, we have transformed the workflows of more than 40 million users, 50,000 organizations, and 3,000 educational institutions globally. Discover more at superhuman.com. The Opportunity In pursuit of our ambitious goals, we seek a Site Reliability Engineer (SRE) to strengthen our infrastructure team. This pivotal role involves developing software to enhance the reliability of our backend systems, collaborating closely with engineers, and strategizing for future scalability. You will engage with our existing production engineering teams in the EU as we transition away from the “you build it, you own it” approach. The engineers and researchers at Superhuman are given the freedom to innovate and drive breakthroughs, subsequently influencing our product roadmap. As we expand our interfaces, algorithms, and infrastructure, the complexity of our technical challenges continues to grow. Learn more about our technical endeavors on our technical blog. As an SRE, your responsibilities will include: Scaling our Kubernetes-based control plane that processes billions of events daily. Enhancing our automation systems that respond to workload demands. Deploying machine learning systems company-wide.
Veeva Systems is a pioneering organization focused on transforming the life sciences industry through innovative cloud solutions, enabling companies to accelerate the delivery of therapies to patients. As one of the fastest-growing SaaS enterprises, we achieved over $2 billion in revenue last fiscal year, with significant growth opportunities on the horizon. Our core values—Do the Right Thing, Customer Success, Employee Success, and Speed—define our culture. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors. As a Work Anywhere company, we provide you with the flexibility to work from home or in the office, ensuring you thrive in your ideal work environment. Join us in transforming the life sciences landscape, making a meaningful impact on our customers, employees, and the broader community.
Role Overview scalablegmbh is looking for a Senior Cloud Site Reliability Engineer (Network) in München. This position focuses on maintaining the reliability, availability, and performance of cloud-based network systems. The role works closely with teams across the company to design, implement, and refine infrastructure that supports a growing client base. What You Will Do Ensure cloud network systems run reliably and meet performance targets Collaborate with cross-functional teams to design and optimize infrastructure solutions Guide cloud strategy decisions with technical expertise Troubleshoot complex network issues Apply best practices to improve network reliability and operational efficiency Location This role is based in München.
Join redcare-pharmacy as a Senior Site Reliability Engineer in Berlin. We are seeking a talented and experienced individual who can enhance our infrastructure and ensure the reliability and performance of our systems. This role will involve collaboration with development teams to build scalable systems and improve our operational practices.
Jan 29, 2026