1 - 20 of 45,562 Jobs

Search for Senior Site Reliability Engineer - APAC


Ditto
Full-time|On-site|APAC

About Ditto: Ditto is revolutionizing data movement at the edge, empowering developers to create resilient, real-time applications irrespective of varying network conditions. Whether in a stadium, on an airplane, or at a remote military base, Ditto’s peer-to-peer synchronization engine guarantees continuous device connectivity and consistent data integrity, e…

Apr 9, 2026
Shippo
Full-time|$100K/yr - $156K/yr|Remote|Remote (United States)

Responsibilities in Shipping & Handling: Architect, scale, and secure infrastructure to meet evolving business demands, employing fault-tolerant designs, performance testing, profiling, and strategic capacity planning. Develop, implement, and sustain automation, monitoring, and alerting systems, alongside disaster recovery protocols. Promote scalability and maintainability through microservices architecture, decoupling concerns, effective data modeling, job queuing, and application layering. Enhance and oversee our CI/CD pipeline to ensure seamless and secure production deployments via automated testing and verification. Evaluate and confirm system performance and accuracy concerning response times and throughput. Engage in peer reviews and testing, contributing to automated testing suites and participating in design reviews for new features, products, and systems. Partake in an on-call rotation for system support.

Mar 15, 2026
Cognitiv
Full-time|$160K/yr - $210K/yr|Hybrid|Bellevue, WA

Are you prepared to transform the advertising landscape? At Cognitiv, we are not merely another AdTech firm; we are pioneers reshaping media buying with our advanced Deep Learning Advertising Platform. Since our inception in 2015, we have been leveraging state-of-the-art deep learning technologies and data science to redefine how brands engage with their audiences. Our mission is clear: to infuse intelligence into advertising, delivering unmatched precision, relevance, and impact at scale. Our innovative platform provides advertisers with unparalleled flexibility, whether activating Dynamic Deals through their preferred DSP, utilizing our managed service DSP, or tapping into our groundbreaking ContextGPT product. Joining Cognitiv means being at the forefront of AI-driven advertising solutions, leading change, and achieving remarkable growth in a fast-paced industry. We are currently expanding! The Role: We are seeking a Senior Site Reliability Engineer to enhance our global network of datacenters and elevate service management across Cognitiv. Your primary focus will be on rapidly expanding our hybrid cloud infrastructure. As a growing organization, we strive to adhere to industry best practices.
This position requires an experienced engineer who is eager to learn our environment quickly and help shape our long-term service management strategy. This role will be based in our Bellevue, WA office with a hybrid work schedule of 3 days in-office (Monday/Tuesday/Wednesday) and 2 days remote (Thursday/Friday). Responsibilities: Design, implement, and maintain infrastructure across a widening footprint of co-located deployments. Assess existing physical and network architectures to ensure long-term scalability and growth. Collaborate with engineering and product teams to accurately scope projects based on core business requirements. Lead company-wide initiatives to enhance service management surrounding deployments, monitoring, and disaster recovery. Oversee and maintain shared infrastructure within our AWS environment. Requirements: Understanding of contemporary datacenter practices with experience in configuring multi-datacenter deployments. Extensive knowledge of AWS infrastructure, networking, and management practices. Demonstrated experience with infrastructure as code and related tools.

Mar 19, 2026
Unifonic
Full-time|Remote|Remote job

Unifonic operates as a remote-first company in the CPaaS sector, providing communication solutions to over 5,000 businesses. With a team of 500, Unifonic supports clients in building stronger customer connections. The Engineering team at Unifonic is responsible for designing, building, and maintaining the systems that power the company’s products. Team members collaborate closely with other departments to ensure technology aligns with customer needs. Creativity and new ideas are encouraged across the group. Role overview The Senior Site Reliability Engineer joins the Production Operations (Live) team. This role centers on ensuring the reliability, scalability, and resilience of Unifonic’s cloud infrastructure and distributed messaging platforms. The SRE team works to keep systems running smoothly at all times and continually seeks ways to improve performance and stability. What you will do Maintain the reliability, uptime, and scalability of key production services around the clock. Participate in the on-call rotation, respond to incidents, troubleshoot live production issues, and lead post-incident reviews. Create and update operational playbooks and escalation paths to help reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR). Monitor service level objectives (SLOs), conduct chaos testing, plan for capacity, and address reliability risks as they arise.

Apr 22, 2026
Juul Labs
Full-time|$158K/yr - $227K/yr|Remote|Remote - United States; United States of America

ABOUT JUUL LABS: At Juul Labs, we are dedicated to revolutionizing the experience of adult smokers by transitioning them away from traditional combustible cigarettes. Our mission is to eliminate their use and prevent underage access to our products. We tackle this global health challenge with a focus on quality, innovation, and research. Supported by prominent technology investors, we aim for excellence not only in our products but also in our talent acquisition. We embrace diversity and are united by our mission. We are seeking the world's best engineers, scientists, designers, product managers, operations experts, and customer service professionals. If you are ready to advance your career with us, we encourage you to explore this opportunity. ROLE OVERVIEW: As a Senior Site Reliability Engineer (SRE), you will take ownership of the operational stability and performance of Juul's hybrid cloud infrastructure (Nutanix, AWS/GCP). Your responsibilities will include leading automation initiatives, ensuring reliability in architecture, and serving as the go-to expert for critical incident escalation to guarantee a scalable and efficient platform. Nutanix Platform Management Responsibilities: Design, deploy, and maintain enterprise-scale Nutanix AHV clusters and manage Prism Central for multi-cluster operations. Exhibit expert-level proficiency with Nutanix CLI (nCLI and acli) for advanced operations and automation. Create automation scripts using Nutanix REST APIs, Python SDK, PowerShell, and Terraform. Manage VM templates, golden images, and standardized deployment catalogs. Design disaster recovery solutions utilizing Leap, Protection Domains, and metro clustering. Implement network micro-segmentation with Nutanix Flow, including RBAC and encryption tactics. Lead Level 3 troubleshooting through advanced diagnostics and log analysis. Configure high availability and optimize performance for critical workloads. 
Oversee AHV networking with OVS bridges, VLANs, and implement resource reservations. Architect and maintain hybrid cloud solutions across Nutanix HCI, AWS, and GCP environments. Cloud Platform Engineering Responsibilities: Further responsibilities in cloud platform engineering will be communicated during the interview process to ensure alignment with your expertise.

Apr 30, 2026
Wikimedia Foundation
Full-time|Remote|Remote

Summary The Wikimedia Foundation is on the lookout for a talented Senior Site Reliability Engineer to enhance and maintain the infrastructure that powers the world’s most beloved encyclopedia, Wikipedia, serving millions globally. Our Site Reliability Engineering (SRE) team is dedicated to ensuring that our globally recognized top-10 website operates smoothly while innovating to further our mission: to empower everyone to share in the sum of all knowledge. As a member of the SRE team, you will join a diverse and globally distributed group of engineers passionate about exploring, experimenting, and adopting new technologies. We believe in transparency, sharing our documentation, code, and configuration as open source. Our production systems are powered entirely by open-source software, and we encourage you to review our work without any login requirements. If you are intrigued by the challenge of improving the reliability and delivery of one of the Internet’s top websites and thrive in a remote-first environment, we invite you to consider joining us.

Mar 21, 2026
Comtech LLC
Contract|On-site|Seattle

Position: Senior Site Reliability Engineer. Location: Seattle, WA. Duration: 12 months. Interview: In-person for local candidates or via Phone + Skype. As a Senior Site Reliability Engineer, you will play a pivotal role in the ongoing maintenance and administration of enterprise-level internet systems. Your primary responsibility will be to diagnose and resolve operational issues, ensuring the seamless functioning of our infrastructure. You will also be tasked with developing tools and scripts to enhance these processes. Collaboration with various teams will be essential to document our enterprise infrastructure and monitoring systems effectively. Additionally, you'll oversee the planning and execution of projects ranging from small to large scale within our Technology teams, reporting directly to your manager. This role demands a high level of technical expertise in both traditional enterprise systems and cutting-edge cloud-native applications. If you share our belief that a simple cup of coffee can transform lives and enhance experiences, we invite you to join us in delivering exceptional services to customers worldwide.

Sep 1, 2017
Runlayer
Full-time|Remote|Remote

AI is revolutionizing the operational landscape for businesses, yet many enterprises find themselves hindered in their efforts to effectively implement AI tools, agents, and workflows. At Runlayer, we are dedicated to dismantling these barriers. Our innovative team has developed AI Actions for OpenAI, delivered Zapier Agents to millions, and launched the first remote MCP server in partnership with Anthropic. With the co-creator of MCP on our cap table, we are establishing the essential platform that enterprises need to leverage AI securely and effectively. Runlayer serves as a unified platform for MCPs, Skills, and Agents. We provide purpose-built security, fine-grained governance, and complete observability, enabling organizations to advance their AI initiatives with confidence. With $11M raised from Khosla Ventures and Felicis, we proudly support clients such as Gusto, Instacart, and Opendoor. As a compact team of 25, primarily engineers, we thrive on rapid deployment and innovation. If you aspire to be at the forefront of AI implementation, now is the time to join us. In the role of Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and scalability of Runlayer's infrastructure as we expand to meet the needs of our enterprise customers across both cloud and on-prem environments. Why You'll Thrive Here: Impact: Construct the foundational infrastructure for the enterprise MCP platform, directly facilitating large-scale AI adoption. Excellence: Collaborate closely with founders and a small, experienced engineering team, delivering swiftly in a high-growth setting. Ownership: Take full responsibility for reliability from database performance to incident response and CI/CD pipelines. What You'll Do: Oversee the reliability and performance of our cloud infrastructure across AWS (ECS, Aurora, CloudWatch) and GCP. Manage and optimize Kubernetes clusters and container orchestration. Lead database reliability engineering efforts, including
performance tuning and scaling. Develop and maintain CI/CD pipelines for efficient and secure deployments. Conduct incident response and participate in on-call rotations. Collaborate with product engineers to design scalable and resilient systems. What We're Looking For: Proven experience with AWS services including ECS, Aurora, and CloudWatch. Expertise in Kubernetes management and container orchestration. Strong background in database reliability engineering. Solid understanding of CI/CD methodologies and tools. Effective incident response skills and a proactive approach to system reliability. Ability to work collaboratively in a fast-paced environment with a focus on innovation.

Apr 3, 2026
onebrief
Full-time|On-site|Colorado Springs, CO

Join our dynamic team at onebrief as a Senior Site Reliability Engineer in Colorado Springs, where you will play a critical role in enhancing our systems' reliability, performance, and scalability. You'll collaborate with cross-functional teams to implement best practices and ensure the integrity of our services.

Apr 8, 2026
akuity
Full-time|Remote|Remote - US Timezones

Join our dynamic team at akuity as a Senior Site Reliability Engineer, where you'll play a pivotal role in enhancing the reliability and performance of our systems. In this exciting remote position, you will collaborate with cross-functional teams to implement innovative solutions that ensure seamless service delivery. Your expertise will be vital in monitoring system health, optimizing performance, and troubleshooting issues to provide exceptional user experiences. If you are passionate about building scalable and robust infrastructures, we want to hear from you!

Mar 18, 2026
Drata
Full-time|$166.9K/yr - $225.9K/yr|Hybrid|Hybrid - San Francisco

Drata helps organizations demonstrate their commitment to security and integrity. The platform supports companies as they build and maintain trust with users, customers, partners, and prospects. Values Built on Trust: Consistency shapes decisions and actions. Integrity: Choosing to do what is right, every time. Customer-Obsessed: Prioritizing customer needs above all else. Competitive Fire: Striving for higher standards and greater achievements. Diversity: Welcoming different perspectives to encourage creative solutions. Automation First: Pursuing efficiency by saving time and resources wherever possible. How the Team Works Drata blends high standards with a supportive environment focused on growth. Team members are encouraged to own their work, improve continuously, and deliver meaningful results. The company values quick, informed decisions that drive immediate impact, while always keeping the mission and customer needs at the center. The San Francisco-based team uses a hybrid work model. Colleagues collaborate in the office Tuesday through Thursday, focusing on alignment and innovation. Mondays and Fridays offer flexibility for deep work or personal needs. Growth and Culture Drata has expanded to over 600 professionals worldwide, recognized for a culture that values trust, speed, and continuous learning. The environment supports both personal and professional development. See the Speed: CEO Adam Markowitz discusses Drata’s rapid journey to $100M ARR in four years. Hear the Voice of the Team: Employee stories highlight collaboration and growth at Drata.

Apr 27, 2026
Hashgraph
Full-time|Remote|Remote within US time zones

About Hashgraph: Hashgraph is an innovative and rapidly growing software company dedicated to supporting, developing, and maintaining Hedera, an open-source proof-of-stake platform. Hedera is EVM-compatible and designed to cater to the demands of enterprise and web3 applications, focusing on speed, security, stability, and sustainability. The public network of Hedera is governed by leading organizations across 11 sectors and 14 regions, ensuring robust oversight of the decentralized platform's development and direction. About the Role: We are seeking a Senior Site Reliability Engineer to join the HashSphere engineering team. In this pivotal role, you will assist in designing, building, and integrating essential product features for enterprises utilizing Hiero, our private distributed ledger technology. This greenfield project is at the forefront of decentralized systems and cloud technologies, with a strong emphasis on security, privacy, and scalability. Your expertise in distributed systems engineering, coupled with your software development skills and knowledge of industry-standard SRE and DevOps practices, will be crucial in delivering core platform services. You will contribute to a highly scalable, mission-critical infrastructure product utilized by some of the largest organizations in finance, supply chain, and healthcare sectors. If you possess experience in designing scalable, reliable, and secure distributed system architectures within AWS, GCP, or Azure, and are eager to collaborate with a passionate team to build pioneering technology, this could be the perfect opportunity for you.

Jan 23, 2026
Crexi
Full-time|On-site|Los Angeles, CA

Join Crexi as a Senior Site Reliability Engineer, where you will play a crucial role in maintaining and enhancing our infrastructure. You will be responsible for ensuring our systems are reliable, scalable, and secure. Collaborate with cross-functional teams to implement best practices in site reliability engineering, contribute to incident response, and drive automation initiatives. If you are passionate about optimizing system performance and enhancing user experience, we want to hear from you!

Mar 30, 2026
Cloudbeds
Full-time|Remote|Latin America

What Makes Us Unique At Cloudbeds, we’re not just developing software; we’re revolutionizing the hospitality industry. Our advanced platform empowers properties in over 150 countries, handling billions in bookings each year. From independent hotels to large chains, we assist hoteliers in enhancing operations and elevating their commercial strategies through a unified platform that seamlessly integrates with numerous partners. Moreover, our team operates entirely remotely. Imagine collaborating with innovative minds globally to create AI-driven solutions that address the most pressing challenges faced by hoteliers. Since our inception in 2012, we have distinguished ourselves as the World's Best Hotel PMS Solutions Provider and secured a spot on Deloitte's Technology Fast 500 for 2024, but we are just getting started. How You’ll Make an Impact: As a Senior Site Reliability Engineer, you will be the steward of our platform's reliability and performance, ensuring that millions of hospitality transactions occur smoothly worldwide. You will design and implement scalable AWS cloud solutions that enable ambitious hotels to operate 24/7, all while nurturing a culture of automation, resilience, and continuous improvement across our engineering teams. Our SRE Team: We are a collaborative team that values open discussions and shared responsibility for our infrastructure. You will have abundant opportunities to influence architectural decisions while working with cutting-edge cloud technologies at scale. We believe that the most effective solutions stem from engineers who are empowered to innovate, experiment, and challenge conventional practices.

Mar 12, 2026
Axon Enterprise, Inc.
Full-time|$134.3K/yr - $214.8K/yr|Hybrid|Seattle, Washington, United States

Join Axon and Make a Difference. At Axon, our mission is to protect life. We tackle society's most pressing safety and justice challenges with our innovative ecosystem of devices and cloud software. Collaboration is at the heart of our success; we engage with transparency and empathy, welcoming diverse perspectives from our customers and each other. Life at Axon is dynamic, challenging, and impactful. Here, you’ll take charge and instigate genuine change while evolving in a mission-driven environment that values your contributions. Your Impact: As a Senior Site Reliability Engineer (SRE) on the APX SRE CloudOps team, you will craft and maintain the cloud infrastructure and automation platforms that are vital for Axon's product engineering teams. You will design solutions for multi-cloud architectures (Azure, AWS), ensure compliance with FedRAMP regulations, and oversee large-scale Kubernetes platforms that support production workloads across various regions. A significant part of your role will involve writing code: developing services, APIs, and internal tools using languages such as Go and Python. Additionally, you will be part of on-call rotations and incident response teams, leveraging your operational expertise to enhance reliability and guide platform investments. This position merges deep software engineering expertise with large-scale cloud architecture and production ownership. Location: This position is based in our Seattle, Atlanta, or Boston offices and follows a hybrid work model. We emphasize in-person collaboration, requiring team members to work on-site from Tuesday to Friday, with the flexibility to work remotely on Mondays unless a workplace accommodation is arranged. We believe that connection fuels innovation, and our office culture is designed to encourage meaningful teamwork, mentorship, and collective success.

Apr 10, 2026
onebrief
Full-time|On-site|Northern Virginia (DC Metro)

We are seeking an experienced Senior Site Reliability Engineer to join our dynamic team at onebrief. This role will require you to leverage your expertise in system reliability, performance optimization, and incident response to enhance our services. You will be instrumental in shaping the reliability and efficiency of our infrastructure, ensuring seamless operations and high availability of our systems.

Apr 8, 2026
Okta, Inc.
Full-time|$147K/yr - $202K/yr|On-site|Bellevue, Washington

About Okta: Okta stands as the leader in identity solutions, empowering individuals to securely engage with any technology, on any device, and through any application. Our versatile products, including the Okta Platform and Auth0 Platform, ensure safe access and authentication, placing identity at the forefront of security and business growth. At Okta, we embrace diverse perspectives and experiences. We are not searching for someone who checks all the boxes; rather, we value lifelong learners who can enrich our team with their unique backgrounds. Join us in crafting a future where identity is truly yours. Position Overview: We are looking for a highly skilled Senior Observability Site Reliability Engineer with a focus on Splunk to take ownership and enhance our Splunk ecosystem. In this role, you will go beyond traditional monitoring, creating a comprehensive and scalable Observability Platform that empowers our SRE teams and business stakeholders. You will treat infrastructure as code, leveraging Terraform alongside proficient coding skills in Go, Python, or Ruby to automate deployment across complex distributed systems. Key Responsibilities: Automated Infrastructure: Design, build, and maintain scalable observability infrastructure utilizing tools like Terraform. Splunk Engineering: Enhance the collection, processing, and storage of log data to ensure our Splunk services are highly reliable and low-latency. Incident Response: Engage in on-call rotations and lead post-incident reviews to drive systemic improvements and promote 'observability-driven development.' Automation: Minimize 'toil' by automating the deployment and scaling of observability agents and collectors.

Mar 5, 2026
Iru
Full-time|On-site|Miami

About Iru: Iru is an innovative AI-driven security and IT platform empowering the world's fastest-growing companies to safeguard their users, applications, and devices. Designed for the AI era, Iru integrates identity and access management, endpoint security, and compliance automation, streamlining operations and restoring control to IT and security teams. Supported by top-tier investors in the technology sector (General Catalyst, Tiger Global, Felicis, Greycroft, and First Round Capital), Iru successfully raised $100 million in July 2024 from General Catalyst, achieving a valuation of $850 million. Our clientele includes notable companies such as Notion, Cursor, Lovable, Replit, and Mercor, and we collaborate with industry giants like ServiceNow and AWS. Iru is proud to be recognized in Forbes’ America’s Best Startup Employers 2025 for exceptional employee engagement and satisfaction. Join Our Team: We are in search of a Senior Site Reliability Engineer who will take charge of incident detection, response, and learning, while fostering robust observability across our services and teams. This pivotal role blends reliability engineering with cross-team facilitation, working closely with our Infrastructure team to enhance their platform-building efforts while maintaining a strong emphasis on operational excellence and quantifiable reliability. In partnership with engineering and platform teams, you will strive to minimize Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR), ensuring that reliability is measurable, repeatable, and ultimately a shared responsibility.

Apr 10, 2026
Life.Church
Full-time|On-site|Edmond, OK

The Senior Site Reliability Engineer at YouVersion plays a pivotal role in maintaining the integrity, performance, reliability, and cost-effectiveness of our cloud-based infrastructure and the systems that support the applications and platforms operated by Life.Church. This position encompasses overseeing software development, performing regular maintenance, and addressing escalated site reliability incidents. Additionally, the Senior Site Reliability Engineer will engage in researching industry best practices and managing resources effectively for the team. Founded in 2007 by the local church, YouVersion continues to thrive as a ministry of Life.Church. Our mission at Life.Church is to guide individuals towards becoming fully devoted followers of Christ. Our team is dedicated to reaching people globally through innovative technology, with YouVersion being a significant part of that mission. Life.Church operates as a multi-site Christian church both in the United States and through Life.Church Online. We strongly believe that cultivating a daily rhythm of seeking intimacy with God can transform lives. That’s why YouVersion is committed to creating biblically-based experiences that inspire and challenge individuals to deepen their relationship with God. We aspire for everyone in our community to actively pursue their journey of becoming who God intended them to be, fostering a closer connection every day.

Mar 2, 2026
alembic
Full-time|On-site|San Francisco HQ

About the Role: Join alembic as a Senior Site Reliability Engineer (SRE) and become an integral part of our mission to enhance platform reliability, observability, and operational excellence. In this pivotal role, you will collaborate with engineers and data scientists to architect, automate, and maintain the robust infrastructure that drives our platform, including data pipelines, machine learning workloads, and real-time analytics systems. This hands-on position offers significant visibility across the technology stack and provides you with the opportunity to shape the future of our infrastructure and operations.

Dec 22, 2025
