Experience Level
Manager
Qualifications
We are looking for candidates with a strong background in data engineering and team leadership. Ideal qualifications include:
Proven experience managing engineering teams and delivering complex data solutions.
Expertise in data architecture, cloud services, and data processing technologies.
Strong analytical and problem-solving skills.
Excellent communication and interpersonal abilities.
About the job
yuno is looking for an Engineering Manager to join the Data Platform team in London. This position centers on strengthening the company’s data infrastructure, which is essential for collecting, managing, and using data across multiple products and teams.
Role overview
The Engineering Manager will oversee a group of engineers dedicated to the development and performance of yuno’s data platform. The work involves guiding the team through the design and implementation of data systems that are both scalable and reliable.
What you will do
Lead and support a team working on the data platform’s development and ongoing performance.
Direct the design and rollout of data systems built for scale and reliability.
Encourage a collaborative and creative team culture.
Ensure delivery of data solutions that align with business goals.
Location
This role is based in London.
About yuno
yuno is a forward-thinking technology company based in London, dedicated to revolutionizing the way organizations manage and utilize data. We pride ourselves on our innovative approach and commitment to excellence, fostering an environment where creativity thrives.
Similar jobs
At Databricks, we are dedicated to empowering data teams to tackle the world's most pressing challenges — from transforming transportation to accelerating medical innovations. Our mission is to create and maintain the premier data and AI infrastructure platform, enabling our clients to leverage profound data insights to enhance their operations. Established by engineers with a customer-centric approach, we enthusiastically seize every opportunity to address technical hurdles, whether that involves designing next-generation UI/UX for data interaction or scaling our services across millions of virtual machines. We're just getting started. The Lakebase Platform Reliability team operates across a diverse array of stacks, systems, and stakeholders. This includes AI-driven tools and workflows for customer management, real-time incident observability, and systems that support compliance through monitoring and auditing, as well as customer-facing operational APIs and maintenance workflows. As part of this team, you'll play a vital role in our overarching platform mission: to develop robust resource management infrastructure, dependable distributed services, and internal tools that empower Databricks engineers to operate seamlessly across various cloud environments.
Full-time|$120K/yr - $120K/yr|On-site|London, Montreal, New York, Singapore
Position Overview: Join Squarepoint Capital as an Ultra Low Latency Platform Engineer, where you will play a pivotal role in enhancing our global colocation (COLO) infrastructure, encompassing over 400 servers across 30 locations worldwide. In this dynamic position, you will lead project delivery, manage support escalations, and oversee monitoring, automation, security, documentation, and capacity management for our low latency systems. Collaborate with various stakeholders, including business partners, application owners, clients, vendors, and internal teams such as SRE, Network, Application Support, Application Development, and Quantitative Analysts, to deliver comprehensive end-to-end solutions promptly.
About YouLend YouLend is an innovative and swiftly expanding FinTech firm, recognized as the leading embedded financing platform for top-tier e-commerce platforms, technology companies, and Payment Service Providers globally. Our advanced software platform empowers partners to enhance their value propositions by offering customizable financing solutions under their own brand, allowing them to serve their merchants without any capital risks. Backed by EQT, a prominent Private Equity firm, our company has experienced remarkable growth, boasting a +100% year-over-year increase since 2020. Our headquarters are in London, UK, with a presence in various European countries and the United States, supporting esteemed partners such as eBay, Amazon, Just Eat, Shopify, and Stripe. Role Overview As we establish a premier Observability function, we seek a passionate individual dedicated to uptime, insightful alerts, and sophisticated dashboards. If you have experience with on-call duties, managing alert noise, or debugging elusive issues across microservices during off-hours, we want you on our team! This position transcends a typical “Platform Engineer” role; you will be intensely focused on observability, system reliability, and empowering developers. You will collaborate closely with teams to understand not just when failures occur, but also why. Key Responsibilities: Designing and scaling on-call systems that engineers will appreciate being a part of. Enhancing Datadog monitoring, alerting, dashboards, and log pipelines for Kubernetes environments. Defining and managing SLOs, SLIs, and error budgets, ensuring teams adhere to them. Developing scorecards and software catalogs so engineers can easily track system health and ownership. Mentoring and enabling development teams to take charge of their own observability, alerts, and incident responses. Implementing chaos engineering practices to intentionally identify weaknesses. Fostering a culture of reliability through incident reviews, shared learnings, and transparency. Ideal Candidate Qualifications: Proven production experience with observability tools, particularly Datadog, in cloud-native settings. Experience establishing monitoring and alerting across Kubernetes services. Demonstrated ability in building or scaling on-call systems within startup or large-scale environments. Expertise in minimizing alert fatigue and a passion for effective monitoring.
Blacklane is hiring an Engineering Manager - Platform to join the London office. This leadership role sits at the intersection of infrastructure and developer experience, with a focus on turning a suite of internal tools into a unified platform product. The team is growing, and this position plays a key part in shaping both processes and culture to help product teams deliver more effectively. What you will do Lead the team in integrating platform tools into a single, cohesive internal product. Collaborate closely with the Staff Engineer to set direction, define standards, and manage project timelines. Make decisions that prioritize meaningful business outcomes over simply closing tickets. Clarify ambiguous requirements and turn them into actionable plans. Mentor engineers, support hiring efforts, and foster a collaborative, high-performing team environment. Encourage continuous learning and improvement within the Platform unit. Champion high standards in developer experience, infrastructure, and site reliability engineering. Maintain open communication with other Engineering Managers to coordinate expectations and dependencies. Requirements Ideal candidates bring experience in Platform, Backend, or Data Engineering, and enjoy solving complex problems through to completion. A commitment to learning and team development is important. Initiative: Proactively addresses important issues and builds strong business cases for solutions. Execution: Delivers results and navigates challenges with a strategic mindset. Location This role is based in London.
What You'll Accomplish: As a Senior Data Reliability Engineer, you will spearhead the integration of Site Reliability Engineering (SRE) across all engineering practices. Your leadership will ensure that every engineer and team is dedicated to crafting software that is not only resilient but also exceptionally reliable. You will collaborate with a diverse, cross-functional team of subject matter experts and on-call engineers, focused on maintaining high performance of our platform around the clock. Overseeing a comprehensive suite of products, you will be responsible for the reliability of enterprise-grade applications that process thousands of queries per second. Elliptic is acclaimed for its extensive and dependable datasets, and your role will be pivotal in establishing a market-leading infrastructure for data quality and governance. This involves creating the processes, culture, and frameworks that will enhance observability, data quality, lineage, and remediation, forming a crucial backbone of our data and intelligence platform. Your Responsibilities: This role spans multiple teams, and you will receive full support from leadership and engineering while showcasing exemplary standards. Your main tasks will include: Promote the principles of SRE and DRE throughout the engineering teams. Lead the development of a data quality framework that assures our clients of the accuracy of our data and supports marketing and revenue initiatives. Define and manage the on-call process within the SRE function: Quickly gain an in-depth understanding of our systems. Lead incident management. Conduct post-incident reviews. Ensure timely completion of follow-up actions. Assess and enhance our existing end-to-end on-call processes. Participate in the on-call rotation, approximately every 4 to 5 weeks, ensuring 24/7 coverage. Evaluate, manage, and improve our current monitoring, alerting, paging, and documentation solutions. Provide reports on system uptime, availability, and performance across our product range. Draft post-mortem reports for both internal and external stakeholders. Represent the SRE and DRE functions during discussions with top-tier enterprise financial institutions.
Join Storio Group as an Engineering Manager Role Overview: We are on the lookout for a skilled Engineering Manager specializing in Data to lead our dynamic London-based data team. Your primary focus will be to develop and enhance a platform that fosters a decentralized data adoption model throughout our organization. At Storio, we prioritize delivering reliable and actionable data that fulfills business requirements. We’re eager for you to build and manage a high-performing team dedicated to this objective. If you have a background in software engineering management with a focus on data, we would love to hear from you! Our data and AI team is expanding across the UK and the Netherlands, serving as the backbone for various data-driven consumers and platforms within Marketing, Finance, Operations, and Product teams. Our AI photo services are pivotal to enhancing customer experience, and we are broadening our efforts towards decentralized machine learning adoption within the business. Following a series of mergers, our team has unified and is transitioning from a phase of simplifying legacy systems to a stage of consolidation and growth. Our Technology Stack:
• Cloud Data Warehouse - Snowflake
• AWS Data Solutions - Kinesis, SNS, SQS, S3, ECS, Lambda
• Data Governance & Quality - Collate & Monte Carlo
• Infrastructure as Code - Terraform
• Data Integration & Transformation - Python, DBT, Fivetran, Airflow
• CI/CD - Github Actions / Jenkins
About Neo4j Neo4j builds a graph intelligence platform used by 84 of the Fortune 100 and supported by the world’s largest graph community. The platform powers knowledge graphs for AI, delivers reliable graph capabilities across cloud environments, and integrates with a wide range of systems. Neo4j’s technology is designed for precision, accountability, and governance, helping organizations turn data into actionable insights for intelligent applications and AI systems. Engineered for seamless operation in any cloud, Neo4j supports dynamic, personalized, and autonomous AI solutions. The focus is on delivering swift results, contextual knowledge, and solutions that improve both customer and employee experiences. Our Vision Neo4j’s mission is to help the world understand data. As business and society become more interconnected, Neo4j’s technology enables organizations to find and understand relationships within their data. The company pioneered the graph database category and continues to lead in helping teams innovate and stay competitive. About the Site Reliability Engineering Team The Site Reliability Engineering (SRE) team supports Neo4j’s Database as a Service (DBaaS) product, Neo4j Aura. Aura operates globally across all major cloud providers, running hundreds of Kubernetes clusters and managing thousands of Neo4j instances in production. This team is redefining SRE within Neo4j Aura. Rather than simply reacting to incidents, the SRE group empowers teams to design for reliability from the start. The work centers on building tools, practices, and a culture that embed SRE principles into the foundation of Aura’s operations. Collaboration with product teams and a commitment to resilience and engineering excellence are central to the team’s approach. What You Will Do Automate for insight and scale: Build systems that enable fast, safe, and scalable troubleshooting across thousands of Neo4j instances. 
This includes developing internal tools that provide actionable insights. Location London
At Orgvue, we are at the forefront of organizational design and planning software, harnessing the transformative power of data visualization and modeling to help organizations become more adaptable and high-performing. Our platform empowers HR, finance, and business leaders to make swift, informed workforce decisions in an ever-evolving landscape. Trusted by some of the world's largest enterprises and renowned management consulting firms, Orgvue enables organizations to visualize and proactively shape their futures. Headquartered in London, we also have offices in Philadelphia, The Hague, Toronto, and Sydney. We are currently on the lookout for a Principal Site Reliability Engineer to join our team as a senior technical leader specializing in scaling and fortifying our AWS and Kubernetes-based infrastructure. Role Overview In this pivotal role, you will collaborate with product, platform, and operations teams to ensure our systems are reliable, observable, and resilient, even at scale. This position marries hands-on technical proficiency with strategic foresight, enabling us to cultivate a world-class reliability culture and a strong engineering framework for growth. We seek an individual with robust technical skills, exceptional communication abilities, and a passion for cross-team collaboration. Key Responsibilities Establish and uphold SLOs, SLIs, and error budgets across vital services. Design and execute a comprehensive cloud infrastructure and tooling strategy. Elevate SRE practices organization-wide. Implement effective observability metrics, logs, and traces using our observability tools. Lead the team in creating automated, self-healing systems. Manage and refine our incident response protocols, including on-call practices and a post-mortem culture. Mentor engineers throughout the organization on reliability best practices, operational readiness, and scalable infrastructure. Drive Infrastructure as Code (IaC) initiatives using Terraform, Kubernetes, CloudFormation, and GitOps methodologies. Work closely with security, DevOps, and software teams to guarantee compliance, scalability, and operational excellence. Assess and introduce tools, patterns, and practices that enhance the performance and reliability of our SaaS platform. Qualifications Proven experience leading SRE transformations. Extensive hands-on expertise with Kubernetes (EKS preferred) in production settings. Strong proficiency with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.). Expertise in Infrastructure as Code utilizing tools such as Terraform, with familiarity in GitOps workflows. Solid background in observability: metrics, visualization, logging, and tracing. Underst...
Spotify is looking for an Engineering Manager to guide the Content Platform team in London. This position leads a group of engineers focused on building and improving content management systems that support Spotify’s user experience at scale. Role overview The Engineering Manager sets technical direction for the team, making sure projects move forward smoothly and efficiently. This role involves balancing hands-on technical leadership with people management, ensuring the team delivers reliable systems that meet Spotify’s standards. Team leadership Inspire and mentor engineers, encouraging growth and collaboration Drive a culture of continuous improvement within the team Oversee project execution and technical decisions What you will work on Lead the development of scalable content management solutions Enhance systems that impact user experiences across Spotify’s platform
Role overview The Forward Deployed Reliability Engineer at Palantir Technologies in London plays a key role in supporting the reliability and performance of Palantir's software as it becomes part of client operations. This position centers on ensuring that solutions remain stable and effective after deployment. What you will do Partner with clients to help integrate Palantir's technology into their daily workflows. Troubleshoot and resolve complex technical challenges to keep systems stable. Work to optimize performance and apply established reliability engineering practices. Collaborate with teams across disciplines to enhance system functionality and deliver results for clients.
Full-time|£40K/yr - £60K/yr|Hybrid|Bristol, England, United Kingdom; Edinburgh, Scotland, United Kingdom; London, England, United Kingdom
Join our dynamic Release Engineering team at Kaluza as a Site Reliability Engineer. In this pivotal role, you will play a crucial part in enhancing our software development lifecycle by developing innovative engineering solutions that empower our software teams to deploy high-quality code efficiently. Your efforts will significantly boost engineering productivity through the optimization of testing, deployment, and release processes across all Kaluza engineering teams.
Checkout.com supports online payments for major brands including eBay, ASOS, Klarna, Uber Eats, and Sony, processing billions of transactions every year. With a presence in 19 offices across six continents and headquarters in London, the company emphasizes high performance, continuous improvement, and innovation. Employees contribute directly to shaping the future of fintech. Role overview The Senior Engineering Manager - Data Platform leads the Data and AI Platform Team. This group provides the systems and tools that enable products, merchants, and internal teams to make effective use of data and AI. The goal is to maximize time spent on innovation and solving business problems, while reducing the complexity of technical implementation, deployment, and monitoring. As Checkout.com grows, the Data Platform must scale to support hundreds of teams and manage petabyte-scale data volumes. What you will do Guide the team in designing technical architecture and building user-focused systems and tools. Oversee technical delivery and operational reliability, driving process automation and maintaining high engineering standards. Ensure service levels (SLAs/SLOs) are met with minimal manual intervention. Manage a team of Junior and Senior Data Engineers, providing hands-on support when needed. Foster an inclusive culture and mentor team members in both technical and interpersonal development. Lead quarterly execution of roadmap deliverables, ensuring alignment with business objectives. Location This position is based in London.
Join our dynamic team as a Senior Site Reliability Engineer at Bumble Inc., where your expertise in Linux and system-level operations will be pivotal in managing complex production environments. We seek a proactive engineer capable of independently troubleshooting incidents, leading post-incident recovery efforts, and implementing enhancements to boost overall system stability, performance, and observability. This role is ideal for hands-on SREs with a solid foundation in Linux infrastructure and third-party system operations, focusing on optimizing large-scale environments of over 5,000 hosts utilizing technologies such as Kafka, Redis, and Kubernetes. Please note, this position centers on operational excellence rather than application development, requiring deep technical acumen and advanced troubleshooting capabilities.
About Gigs Gigs builds a platform that lets tech companies add global mobile connectivity to their products. The goal: make integrating telecom as simple as integrating payments. By automating provisioning and removing telecom hurdles, Gigs helps businesses from fintechs to HR platforms offer mobile services without the usual complexity. The team numbers around 100 people, spread across the US and Europe. Gigs has raised close to $100 million from investors including Ribbit Capital, Google, and Y Combinator. The company brings together engineers and product leaders from places like Stripe, Airbnb, and Shopify, all focused on solving technical and regulatory challenges in telecommunications. Curiosity, creativity, and a drive to shape the future of telecom matter here. Gigs welcomes people who want to tackle tough problems and make an impact. Core Values Speed: Move quickly and deliver. Ambitious deadlines are the norm, and each week counts. Ownership: Spot a problem? Step up and fix it without waiting for approval. Customer Obsession: The success of customers is central to every decision. Ambiguity: Work often involves making decisions with incomplete information. Judgment and instinct matter. First Principles: Question assumptions and dig into the reasons behind how things are done. Role Overview: Senior Platform Engineer (London) This role sits at the heart of Gigs’ technical operations. Senior Platform Engineers design, build, and maintain the infrastructure and developer tools that power the company’s connectivity and payments platform. The work enables product teams to ship faster and more reliably, supporting growth at scale. The Foundation team owns the core systems every Gigs engineer depends on: CI/CD pipelines, cloud infrastructure, observability, deployment tools, and shared platform services. Taking on complex infrastructure challenges is a key part of the job, with a focus on keeping systems efficient and dependable.
Full-time|£92.8K/yr - £115K/yr|Hybrid|Bristol, England, United Kingdom; Edinburgh, Scotland, United Kingdom; London, England, United Kingdom
Job Title: Engineering Manager - Data Platform & Analytics Engineering Location: London, Bristol or Edinburgh (Hybrid options available) Salary: £92,800 - £115,000 Reporting to: Head of Data Science & Products Eligibility: Must have the right to work in the UK; visa sponsorship is currently not available. Kaluza is at the forefront of reshaping the energy landscape through our Energy Intelligence Platform. We empower energy companies to navigate the complexities of modern energy demands while driving the transition to a sustainable, electrified future. Utilizing cutting-edge Data, AI technologies, and real-time decision-making, we convert energy challenges into growth opportunities for our partners. Our innovative approach combines predictive algorithms with user-centric design, ensuring that clean energy remains reliable and affordable. With a diverse team spanning Europe, North America, Asia, and Australia, and a strategic partnership with Mitsubishi Corporation in Japan, we are proud to support industry leaders such as OVO, AGL, and ENGIE, along with pioneering companies like Volvo and Volkswagen. Join the Kaluza Data Community: Data is pivotal to our mission, and we seek curious minds who can transform data into actionable insights and strategies. You will be instrumental in enhancing the Kaluza platform, a pioneering technology designed to revolutionize energy retail operations globally, with a focus on regional adaptability and decarbonization efforts.
About Wheely Wheely is revolutionizing premium transportation in major cities across Europe, the United States, and the Middle East. We seamlessly integrate cutting-edge technology with the artistry of five-star chauffeuring to provide an unparalleled experience that has earned the trust of over 100,000 active riders and 1,200 corporate clients. As a profitable and rapidly growing scale-up, we have raised $43M and surpassed $100M in annual revenue. Following our recent launch in New York City, we are swiftly expanding across the US and EMEA. If you take pride in your craft and are eager to contribute to our next phase of growth, we invite you to connect with us. Our infrastructure has been rebuilt almost from the ground up over the past few years, and we are now seeking to further expand our infrastructure team. As a valued member of our team, you will focus on minimizing incidents related to availability, performance, and security. You will accelerate the delivery of new features to customers by building flexible, highly available, and secure infrastructure, ensuring a smooth journey for every customer.
Location: London, Waterloo (Hybrid, 4 days in-office - Wednesday is our designated work from home day, though you are welcome to join us in the office on Wednesdays if you prefer) At getground, we are revolutionizing one of the world's most significant asset classes: property. With over £2 billion in assets on our platform and a community of more than 30,000 users across 70 countries, we are shaping the future of asset ownership and tackling wealth inequality. Our innovative product streamlines property investing from start to finish, making real estate investment accessible to everyone. Your Key Responsibilities: Collaborating within cross-functional product teams to transition infrastructure and reliability initiatives from concept to live deployment. Thriving in a dynamic environment where autonomy and ownership are fundamental to our operations. Developing and sustaining a robust, scalable infrastructure within our GCP cloud ecosystem. Utilizing Kubernetes, Terraform, Cloudflare, and cutting-edge observability tools to ensure seamless platform functionality. Working closely with engineering teams to formulate CI/CD pipelines, enhance deployment methodologies, and advocate for reliability as a core engineering principle. Contributing to the establishment of SRE practices for a rapidly growing fintech platform. Mentoring fellow engineers as we expand our teams and influence. Your Day-to-Day Activities: Designing, implementing, and maintaining cloud infrastructure on Google Cloud Platform (GCP), ensuring it meets scalability, reliability, and security standards. Taking ownership of our Kubernetes clusters and containerization strategy, including Docker image optimization, cluster management, and deployment orchestration. Creating and optimizing Infrastructure as Code using Terraform, producing modular, testable, and well-documented configurations that adapt to our rapid growth. Managing and enhancing our Cloudflare infrastructure, including Workers for edge computing, DNS, CDN, security policies, and performance optimization. Implementing AI-powered product features in isolated and secure serverless environments. Establishing comprehensive monitoring and observability with Prometheus and Grafana, defining SLIs/SLOs, and proactively identifying potential issues before they affect users. Designing and maintaining CI/CD pipelines with appropriate quality gates, testing strategies, and deployment methodologies (blue-green, canary) to facilitate rapid deployments.
About the Role Saturn is looking for a Platform Engineer in London. This role focuses on building and improving the platforms that power our products. The work centers on designing systems that scale, run efficiently, and support a growing user base. What You Will Do Develop and optimize core platform components Work closely with other engineers to deliver reliable systems Contribute to the design of scalable solutions that handle increasing demand Team and Collaboration Platform Engineers at Saturn collaborate with a skilled engineering team. The group values practical solutions and a shared commitment to quality.
Join our dynamic Systems Engineering team as a pivotal and trusted DevOps Engineer / Site Reliability Engineer. Collaborating closely with software engineers, you will design and implement mission-critical services and systems. Your role will involve managing infrastructure and services at scale, employing a diverse array of cutting-edge technologies that support our high-traffic, real-time Freelancer.com marketplace as well as various other business products deployed on Amazon Web Services. Our technology stack includes Nginx, MySQL, Redis, ElasticSearch, RabbitMQ, Consul, Docker, and Kubernetes. We aim to build highly resilient, dynamically scaling, self-healing systems by automating and monitoring all processes using tools such as Terraform, Puppet, Prometheus, Grafana, Kibana, and Jenkins.
Dec 3, 2025