1 - 20 of 64,128 Jobs

Search for Senior Site Reliability Engineer at Hashgraph | Remote

Hashgraph
Full-time|Remote|Remote within US time zones

About Hashgraph: Hashgraph is an innovative and rapidly growing software company dedicated to supporting, developing, and maintaining Hedera, an open-source proof-of-stake platform. Hedera is EVM-compatible and designed to cater to the demands of enterprise and web3 applications, focusing on speed, security, stability, and sustainability. The public network o…

Jan 23, 2026

Wikimedia Foundation
Full-time|Remote|Remote

Summary The Wikimedia Foundation is on the lookout for a talented Senior Site Reliability Engineer to enhance and maintain the infrastructure that powers the world’s most beloved encyclopedia, Wikipedia, serving millions globally. Our Site Reliability Engineering (SRE) team is dedicated to ensuring that our globally recognized top-10 website operates smoothly while innovating to further our mission: to empower everyone to share in the sum of all knowledge. As a member of the SRE team, you will join a diverse and globally distributed group of engineers passionate about exploring, experimenting, and adopting new technologies. We believe in transparency, sharing our documentation, code, and configuration as open source. Our production systems are powered entirely by open-source software, and we encourage you to review our work without any login requirements. If you are intrigued by the challenge of improving the reliability and delivery of one of the Internet’s top websites and thrive in a remote-first environment, we invite you to consider joining us.

Mar 21, 2026

Shippo
Full-time|$100K/yr - $156K/yr|Remote|Remote (United States)

Responsibilities in Shipping & Handling: Architect, scale, and secure infrastructure to meet evolving business demands, employing fault-tolerant designs, performance testing, profiling, and strategic capacity planning. Develop, implement, and sustain automation, monitoring, and alerting systems, alongside disaster recovery protocols. Promote scalability and maintainability through microservices architecture, decoupling concerns, effective data modeling, job queuing, and application layering. Enhance and oversee our CI/CD pipeline to ensure seamless and secure production deployments via automated testing and verification. Evaluate and confirm system performance and accuracy concerning response times and throughput. Engage in peer reviews and testing, contributing to automated testing suites and participating in design reviews for new features, products, and systems. Partake in an on-call rotation for system support.

Mar 15, 2026

Runlayer
Full-time|Remote|Remote

AI is revolutionizing the operational landscape for businesses, yet many enterprises find themselves hindered in their efforts to effectively implement AI tools, agents, and workflows. At Runlayer, we are dedicated to dismantling these barriers. Our innovative team has developed AI Actions for OpenAI, delivered Zapier Agents to millions, and launched the first remote MCP server in partnership with Anthropic. With the co-creator of MCP on our cap table, we are establishing the essential platform that enterprises need to leverage AI securely and effectively. Runlayer serves as a unified platform for MCPs, Skills, and Agents. We provide purpose-built security, fine-grained governance, and complete observability, enabling organizations to advance their AI initiatives with confidence. With $11M raised from Khosla Ventures and Felicis, we proudly support clients such as Gusto, Instacart, and Opendoor. As a compact team of 25, primarily engineers, we thrive on rapid deployment and innovation. If you aspire to be at the forefront of AI implementation, now is the time to join us. In the role of Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and scalability of Runlayer's infrastructure as we expand to meet the needs of our enterprise customers across both cloud and on-prem environments.
Why You'll Thrive Here
Impact: Construct the foundational infrastructure for the enterprise MCP platform, directly facilitating large-scale AI adoption.
Excellence: Collaborate closely with founders and a small, experienced engineering team, delivering swiftly in a high-growth setting.
Ownership: Take full responsibility for reliability from database performance to incident response and CI/CD pipelines.
What You'll Do
Oversee the reliability and performance of our cloud infrastructure across AWS (ECS, Aurora, CloudWatch) and GCP. Manage and optimize Kubernetes clusters and container orchestration. Lead database reliability engineering efforts, including performance tuning and scaling. Develop and maintain CI/CD pipelines for efficient and secure deployments. Conduct incident response and participate in on-call rotations. Collaborate with product engineers to design scalable and resilient systems.
What We're Looking For
Proven experience with AWS services including ECS, Aurora, and CloudWatch. Expertise in Kubernetes management and container orchestration. Strong background in database reliability engineering. Solid understanding of CI/CD methodologies and tools. Effective incident response skills and a proactive approach to system reliability. Ability to work collaboratively in a fast-paced environment with a focus on innovation.

Apr 3, 2026

Unifonic
Full-time|Remote|Remote job

Unifonic operates as a remote-first company in the CPaaS sector, providing communication solutions to over 5,000 businesses. With a team of 500, Unifonic supports clients in building stronger customer connections. The Engineering team at Unifonic is responsible for designing, building, and maintaining the systems that power the company’s products. Team members collaborate closely with other departments to ensure technology aligns with customer needs. Creativity and new ideas are encouraged across the group. Role overview The Senior Site Reliability Engineer joins the Production Operations (Live) team. This role centers on ensuring the reliability, scalability, and resilience of Unifonic’s cloud infrastructure and distributed messaging platforms. The SRE team works to keep systems running smoothly at all times and continually seeks ways to improve performance and stability. What you will do Maintain the reliability, uptime, and scalability of key production services around the clock. Participate in the on-call rotation, respond to incidents, troubleshoot live production issues, and lead post-incident reviews. Create and update operational playbooks and escalation paths to help reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR). Monitor service level objectives (SLOs), conduct chaos testing, plan for capacity, and address reliability risks as they arise.

Apr 22, 2026

akuity
Full-time|Remote|Remote - US Timezones

Join our dynamic team at akuity as a Senior Site Reliability Engineer, where you'll play a pivotal role in enhancing the reliability and performance of our systems. In this exciting remote position, you will collaborate with cross-functional teams to implement innovative solutions that ensure seamless service delivery. Your expertise will be vital in monitoring system health, optimizing performance, and troubleshooting issues to provide exceptional user experiences. If you are passionate about building scalable and robust infrastructures, we want to hear from you!

Mar 18, 2026

Juul Labs
Full-time|$158K/yr - $227K/yr|Remote|Remote - United States; United States of America

ABOUT JUUL LABS: At Juul Labs, we are dedicated to revolutionizing the experience of adult smokers by transitioning them away from traditional combustible cigarettes. Our mission is to eliminate their use and prevent underage access to our products. We tackle this global health challenge with a focus on quality, innovation, and research. Supported by prominent technology investors, we aim for excellence not only in our products but also in our talent acquisition. We embrace diversity and are united by our mission. We are seeking the world's best engineers, scientists, designers, product managers, operations experts, and customer service professionals. If you are ready to advance your career with us, we encourage you to explore this opportunity. ROLE OVERVIEW: As a Senior Site Reliability Engineer (SRE), you will take ownership of the operational stability and performance of Juul's hybrid cloud infrastructure (Nutanix, AWS/GCP). Your responsibilities will include leading automation initiatives, ensuring reliability in architecture, and serving as the go-to expert for critical incident escalation to guarantee a scalable and efficient platform. Nutanix Platform Management Responsibilities: Design, deploy, and maintain enterprise-scale Nutanix AHV clusters and manage Prism Central for multi-cluster operations. Exhibit expert-level proficiency with Nutanix CLI (nCLI and acli) for advanced operations and automation. Create automation scripts using Nutanix REST APIs, Python SDK, PowerShell, and Terraform. Manage VM templates, golden images, and standardized deployment catalogs. Design disaster recovery solutions utilizing Leap, Protection Domains, and metro clustering. Implement network micro-segmentation with Nutanix Flow, including RBAC and encryption tactics. Lead Level 3 troubleshooting through advanced diagnostics and log analysis. Configure high availability and optimize performance for critical workloads. 
Oversee AHV networking with OVS bridges, VLANs, and implement resource reservations. Architect and maintain hybrid cloud solutions across Nutanix HCI, AWS, and GCP environments. Cloud Platform Engineering Responsibilities: Further responsibilities in cloud platform engineering will be communicated during the interview process to ensure alignment with your expertise.

Apr 30, 2026

Hashgraph
Full-time|Remote|Remote within US

Role overview Hashgraph is hiring a Senior DevOps Engineer to strengthen infrastructure and deployment practices. This fully remote position (US-based) focuses on building and refining CI/CD pipelines, automating workflows, and supporting system reliability. What you will do Work closely with teams across the company to design and maintain CI/CD pipelines. Automate operational workflows to improve efficiency. Help ensure the reliability and stability of systems in production. Who we’re looking for Experience with cloud technologies and automation tools. Strong interest in optimizing operational processes. Comfortable collaborating with cross-functional teams.

Apr 15, 2026

Cognitiv
Full-time|$160K/yr - $210K/yr|Hybrid|Bellevue, WA

Are you prepared to transform the advertising landscape? At Cognitiv, we are not merely another AdTech firm; we are pioneers reshaping media buying with our advanced Deep Learning Advertising Platform. Since our inception in 2015, we have been leveraging state-of-the-art deep learning technologies and data science to redefine how brands engage with their audiences. Our mission is clear: to infuse intelligence into advertising, delivering unmatched precision, relevance, and impact at scale. Our innovative platform provides advertisers with unparalleled flexibility, whether activating Dynamic Deals through their preferred DSP, utilizing our managed service DSP, or tapping into our groundbreaking ContextGPT product. Joining Cognitiv means being at the forefront of AI-driven advertising solutions, leading change, and achieving remarkable growth in a fast-paced industry. We are currently expanding!
The Role
We are seeking a Senior Site Reliability Engineer to enhance our global network of datacenters and elevate service management across Cognitiv. Your primary focus will be on rapidly expanding our hybrid cloud infrastructure. As a growing organization, we strive to adhere to industry best practices. This position requires an experienced engineer who is eager to learn our environment quickly and help shape our long-term service management strategy. This role will be based in our Bellevue, WA office with a hybrid work schedule of 3 days in-office (Monday/Tuesday/Wednesday) and 2 days remote (Thursday/Friday).
Responsibilities
Design, implement, and maintain infrastructure across a widening footprint of co-located deployments. Assess existing physical and network architectures to ensure long-term scalability and growth. Collaborate with engineering and product teams to accurately scope projects based on core business requirements. Lead company-wide initiatives to enhance service management surrounding deployments, monitoring, and disaster recovery. Oversee and maintain shared infrastructure within our AWS environment.
Requirements
Understanding of contemporary datacenter practices with experience in configuring multi-datacenter deployments. Extensive knowledge of AWS infrastructure, networking, and management practices. Demonstrated experience with infrastructure as code and related tools.

Mar 19, 2026

Hashgraph
Full-time|Remote|Remote within New York City

Hashgraph is looking for a Sales Director with a focus on financial services. This remote role is based in New York City and centers on leading sales efforts within the financial sector. The position emphasizes building strong relationships with key stakeholders, increasing revenue, and presenting Hashgraph's solutions to meet client needs. Key Responsibilities Lead sales initiatives aimed at the financial services industry Develop and maintain connections with decision-makers and stakeholders Collaborate with cross-functional teams to refine and improve service offerings Drive strategies that expand Hashgraph’s presence in the financial market Ensure client satisfaction and align solutions with business objectives Collaboration and Influence This position works closely with teams throughout the company to strengthen offerings and deliver value to clients. The Sales Director will help shape sales strategy and support Hashgraph’s continued growth in the financial sector.

Apr 20, 2026

AbbVie Inc.
Full-time|Remote|Mettawa

Join AbbVie, a global leader in biopharmaceutical innovation, as a Senior Site Reliability Engineer. In this role, you will be instrumental in enhancing our cloud infrastructure, ensuring optimal performance and reliability of our applications. Collaborate with cross-functional teams to design, develop, and implement solutions that support our mission to improve lives.

Apr 30, 2026

circleso
Full-time|Remote|Remote

Join circleso as a Senior Site Reliability Engineer and be at the forefront of ensuring the reliability, availability, and performance of our cloud-based services. You will work closely with development teams to design, implement, and maintain scalable systems while proactively identifying and resolving issues.

Apr 1, 2026

Cloudbeds
Full-time|Remote|Latin America

What Makes Us Unique At Cloudbeds, we’re not just developing software; we’re revolutionizing the hospitality industry. Our advanced platform empowers properties in over 150 countries, handling billions in bookings each year. From independent hotels to large chains, we assist hoteliers in enhancing operations and elevating their commercial strategies through a unified platform that seamlessly integrates with numerous partners. Moreover, our team operates entirely remotely. Imagine collaborating with innovative minds globally to create AI-driven solutions that address the most pressing challenges faced by hoteliers. Since our inception in 2012, we have distinguished ourselves as the World's Best Hotel PMS Solutions Provider and secured a spot on Deloitte's Technology Fast 500 for 2024, but we are just getting started. How You’ll Make an Impact: As a Senior Site Reliability Engineer, you will be the steward of our platform's reliability and performance, ensuring that millions of hospitality transactions occur smoothly worldwide. You will design and implement scalable AWS cloud solutions that enable ambitious hotels to operate 24/7, all while nurturing a culture of automation, resilience, and continuous improvement across our engineering teams. Our SRE Team: We are a collaborative team that values open discussions and shared responsibility for our infrastructure. You will have abundant opportunities to influence architectural decisions while working with cutting-edge cloud technologies at scale. We believe that the most effective solutions stem from engineers who are empowered to innovate, experiment, and challenge conventional practices.

Mar 12, 2026

Comtech LLC
Contract|On-site|Seattle

Position: Senior Site Reliability Engineer
Location: Seattle, WA
Duration: 12 months
Interview: In-person for local candidates or via Phone + Skype
As a Senior Site Reliability Engineer, you will play a pivotal role in the ongoing maintenance and administration of enterprise-level internet systems. Your primary responsibility will be to diagnose and resolve operational issues, ensuring the seamless functioning of our infrastructure. You will also be tasked with developing tools and scripts to enhance these processes. Collaboration with various teams will be essential to document our enterprise infrastructure and monitoring systems effectively. Additionally, you'll oversee the planning and execution of projects ranging from small to large scale within our Technology teams, reporting directly to your manager. This role demands a high level of technical expertise in both traditional enterprise systems and cutting-edge cloud-native applications. If you share our belief that a simple cup of coffee can transform lives and enhance experiences, we invite you to join us in delivering exceptional services to customers worldwide.

Sep 1, 2017

onebrief
Full-time|On-site|Colorado Springs, CO

Join our dynamic team at onebrief as a Senior Site Reliability Engineer in Colorado Springs, where you will play a critical role in enhancing our systems' reliability, performance, and scalability. You'll collaborate with cross-functional teams to implement best practices and ensure the integrity of our services.

Apr 8, 2026

Arcadia
Full-time|Remote|Remote (USA)

As a Principal Site Reliability Engineer at Arcadia, you will play a pivotal role in ensuring the reliability, scalability, and performance of our systems. You will lead initiatives to design and implement robust solutions while collaborating with cross-functional teams to drive operational excellence.

Mar 23, 2026

Customer.io
Full-time|$140K/yr - $180K/yr|Remote|Americas Remote

Join Our Team at Customer.io At Customer.io, we empower over 8,000 companies—ranging from innovative startups to established global brands—to send billions of tailored emails, push notifications, in-app messages, and SMS daily. Our platform drives automated communication that resonates with users. Utilizing real-time behavioral data, we enable teams to craft smarter, more relevant messages. Our tech stack includes Go, React, Ember, and cutting-edge AI, allowing us to deliver quickly and scale confidently. We are seeking a Senior Site Reliability Engineer to enhance our infrastructure, minimize operational challenges, and boost reliability as we continue to grow. If you possess experience with high-scale systems and have a passion for optimizing platforms for both developers and customers, we want to connect with you!

Mar 6, 2026

HavocAI
Full-time|Remote|Remote

Join Our Team: At HavocAI, we are at the forefront of collaborative autonomy, leading the way in the development of autonomous surface vessels for a variety of defense and commercial maritime operations. Our mission is to rapidly expand and innovate solutions that address complex human challenges, while prioritizing life-saving technologies. We are in search of passionate individuals committed to pushing boundaries and making a meaningful impact.
Role Overview
We are looking for a Senior Site Reliability Engineer (SRE) with a minimum of 7 years of experience in designing, operating, and scaling robust distributed systems. In this pivotal role, you will serve as a technical leader in our Cloud Platform team, ensuring the reliability, performance, and resilience of critical services that support autonomy, simulation, and data-heavy workloads. You will collaborate with various teams, including Cloud Platform, DevOps, Data Engineering, and Autonomy, to define reliability standards, enhance operational maturity, and create systems that effectively scale under real-world conditions. The ideal candidate will possess deep technical expertise, demonstrate composure under pressure, and be experienced in managing end-to-end reliability outcomes.

Mar 9, 2026

Censys
Full-time|$120K/yr - $190K/yr|Remote|Remote (US)

Company Background
Censys is dedicated to building the most comprehensive and reliable map of the Internet. Our mission is to empower users with real-time Internet intelligence and actionable threat insights, catering to global governments, over 50% of the Fortune 500, and leading threat intelligence providers worldwide.
Location
This is a fully remote position within the United States.
Role Summary
As a Senior Site Reliability Engineer (SRE) on the Infrastructure and Operations team, you will play a crucial role in designing, building, and deploying tools that enhance the efficiency of our development teams and production applications. We are seeking skilled engineers who are passionate about cloud-native technologies and committed to improving our microservice architecture's reliability and operational maturity. Focusing on Developer Efficiency and Experience, you will help streamline engineering workflows, support our Software Development Life Cycle (SDLC), and empower developers to confidently build, deploy, and manage their services within the platform.
What You'll Do
Develop and maintain tools to support applications running on Kubernetes and Google Cloud Platform. Collaborate with development teams to facilitate the building, shipping, and deploying of services and applications, ensuring resilience and reliability. Monitor and ensure the smooth operation of our production environments, assisting developers in debugging complex issues and capturing the four golden signals of performance. Contribute to the creation of a self-service platform that accelerates developer velocity, including service catalogs, repository tooling, and comprehensive documentation. Participate in a shared on-call rotation, embracing end-to-end service ownership alongside development teams.

Apr 7, 2026

Calendly
Full-time|Remote|Remote - US

Role overview Calendly is hiring a Senior Site Reliability Engineer to help strengthen the reliability and performance of its platform. This is a fully remote role open to candidates based in the US. What you will do Work closely with teams across engineering, product, and operations to design and maintain scalable systems. Troubleshoot complex technical issues as they arise. Lead root cause analysis to resolve incidents and prevent recurrence. Drive projects that improve system reliability, availability, and performance for a growing user base. Who we’re looking for Experienced engineers with a background in site reliability or related fields. Strong problem-solving skills and a track record of improving system uptime and performance. Comfort working in a collaborative, remote environment.

Apr 13, 2026