
Search for Site Reliability Engineer at ditto | Remote

52,094 results

ditto
Full-time|Remote|Remote

Join our dynamic team at ditto as a Site Reliability Engineer, where you'll play a pivotal role in enhancing our platform's performance and reliability. You'll collaborate with cross-functional teams to ensure the seamless operation of our services while implementing best practices in automation, monitoring, and incident response.

Dec 17, 2025
Ditto
Full-time|On-site|APAC

About Ditto:
Ditto is revolutionizing data movement at the edge, enabling developers to build resilient, real-time applications regardless of network conditions. Whether in a stadium, on an airplane, or at a remote military base, Ditto’s peer-to-peer synchronization engine keeps devices connected and data consistent, even without internet access. Backed by over $145 million in funding and trusted by organizations such as Chick-fil-A, Delta Airlines, and the U.S. military, Ditto supports mission-critical operations in sectors including aviation, retail, travel, hospitality, and defense. As a rapidly expanding, globally distributed startup, we are committed to building a diverse and inclusive team that brings together the range of perspectives needed to solve the world’s most complex connectivity challenges.

About the Position:
Ditto is scaling to meet the needs of its enterprise customers and needs skilled Site Reliability Engineers to deliver enterprise-grade reliability across our infrastructure. This role is an opportunity to join a specialized team focused on observability, system reliability, and operational excellence for our edge-to-cloud database technology.

As a Site Reliability Engineer, you will be responsible for the reliability, performance, and scalability of Ditto's cloud infrastructure. You will work with product engineering teams to improve system resilience, lead incident management processes, and build observability solutions tailored to our distributed architecture.

Key Responsibilities:
Develop and maintain observability solutions using platforms such as Datadog, Prometheus, and Grafana.
Lead incident management: coordinate the response, troubleshoot issues, and define follow-up actions.
Collaborate with product engineering teams to design reliable systems, recover from incidents, and learn from failures.
Work with teams to define and maintain SLOs, monitoring, and alerting that ensure reliability at scale.
Design and implement automation and support tooling that improves system resilience, keeps operations safe, and reduces operational overhead.
Lead the creation and maintenance of runbooks, alert definitions, and incident response procedures.
Participate in on-call rotations providing 24/7 support for critical production systems.

Apr 9, 2026
ditto
Full-time|Remote|Remote

About Ditto:
At Ditto, we are transforming the way data moves at the edge. Our goal is to empower developers to build robust, real-time applications that work under any network conditions. From crowded stadiums to remote military bases, our peer-to-peer synchronization engine keeps devices connected and data flowing consistently, even without internet access. Backed by over $145 million in funding and trusted by organizations including Chick-fil-A, Delta Airlines, and the U.S. military, Ditto delivers mission-critical solutions across sectors such as aviation, retail, travel, hospitality, and defense. As a rapidly growing global startup, we are dedicated to building a diverse and inclusive team that brings the range of perspectives needed to tackle the toughest connectivity challenges.

Are you ready to shape the evolution of mesh networking? Join Ditto's next-generation networking team as we enhance our core networking stack and build high-performance solutions for routing protocols, end-to-end connectivity, transport mechanisms, and edge platforms for mesh systems. Enjoy the flexibility of remote work, set your own hours, and take on complex, meaningful challenges. Ditto is proud to be an equal opportunity employer, committed to a workplace that embraces diverse backgrounds, perspectives, and talents.

Jan 14, 2026
ditto
Full-time|Remote|Remote

About Ditto:
At Ditto, we are revolutionizing data mobility at the edge. Our mission is to enable developers to build robust, real-time applications that work regardless of network constraints. Whether in a stadium, on an airplane, or at a remote military installation, Ditto's peer-to-peer synchronization engine keeps devices connected and data consistent, even without internet connectivity. With over $145 million in funding, we are trusted by industry leaders including Chick-fil-A, Delta Airlines, and the U.S. military. Ditto powers mission-critical experiences across sectors including aviation, retail, travel, hospitality, and defense. As a rapidly growing, globally distributed startup, we are dedicated to building a diverse and inclusive team that represents the wide range of perspectives needed to tackle the world's most challenging connectivity problems.

About Us
We help edge devices reach their full potential by simplifying the complexities of application development. Our global team values trust, communication, and continuous improvement. We celebrate diversity and strive to build a team that reflects a broad spectrum of backgrounds, skills, and viewpoints.

Ditto is committed to extending the internet beyond its conventional boundaries. Our software lets devices synchronize data in real time using peer-to-peer technology that works across mobile, web, IoT, and server platforms.

Role Description
We are looking for an experienced Senior Software Engineer specializing in Bluetooth to lead and improve the Bluetooth transport layer of Ditto's distributed data platform. You will design, develop, and optimize software that enables reliable, low-latency peer-to-peer communication between devices, even offline. Your work will directly affect millions of data sync operations in our customers' distributed systems.

In this position, you will tackle the challenges of Bluetooth connectivity, such as unstable connections, inconsistent device behavior, and interference, and develop resilient, elegant solutions that give users an intuitive peer-to-peer syncing experience.

Your Responsibilities Include:
Developing and refining core connectivity features
Design and implement robust Bluetooth Low Energy (BLE) solutions using C++/C/Kotlin for iOS and Android platforms. You will navigate complex connection states, manage central and peripheral roles, and ensure our Bluetooth stack delivers the low latency and high throughput our distributed database requires.
Enhancing our protocol capabilities
Introduce new Bluetooth profiles and features that expand...

Oct 7, 2025
ditto
Full-time|Remote|Remote

About Ditto:
Ditto is revolutionizing data connectivity at the edge, enabling developers to build robust, real-time applications that work even in challenging network environments. Whether you're in a crowded stadium, on an airplane, or stationed at a remote military base, Ditto's peer-to-peer sync engine keeps devices connected and data consistent, even without internet access. With over $145 million in funding and trusted by industry leaders such as Chick-fil-A, Delta Airlines, and the U.S. military, Ditto supports critical operations in sectors including aviation, retail, travel, hospitality, and defense. As a rapidly expanding, globally distributed startup, we prioritize building a diverse and inclusive team to tackle the world's toughest connectivity challenges.

As a Senior SDK Engineer on our SDK team, you will be responsible for improving the developer experience of our widely adopted React Native SDK. This SDK lets developers integrate Ditto's real-time, offline-first synchronization into their mobile applications. You will design and implement APIs that feel natural to React Native developers while managing the intricate interface between JavaScript and native code.

Work on the React Native SDK is inherently cross-platform: issues often originate in native Android or iOS code rather than in JavaScript. We are therefore looking for a candidate with strong Android development skills in addition to React Native proficiency. Comfort with Kotlin, the ability to read JNI stack traces, and the ability to debug everything from TypeScript hooks down to our Rust core will be crucial.

Our React Native SDK bridges to native Android and iOS implementations, which in turn connect to Ditto's Rust core through FFI layers. Your responsibilities will span this entire stack: designing user-friendly JavaScript APIs, developing native modules, and working with platform SDK owners to keep functionality consistent. When issues arise, you will trace them through the React Native bridge into native code and, if necessary, fix them at the Rust layer.

From day one, you will own the entire development lifecycle: crafting intuitive public APIs, building reliable native bridges, writing extensive automated tests, and partnering with our Release team to ship dependable updates. You will work directly with customers and support teams to troubleshoot integration issues, improve performance on resource-constrained devices, and turn field feedback into product improvements.

As part of a small, globally distributed team that values trust, transparent communication, and continuous growth, you will thrive in our async-first culture, where written design documents and code reviews carry significant weight alongside synchronous meetings. We look forward to your contribution to shaping the future of connectivity.

Jan 29, 2026
ditto
Full-time|Remote|Remote

As a Customer Success Manager at ditto, you will play a key role in ensuring our clients achieve their desired outcomes with our products. Your primary focus will be building strong relationships, providing exceptional support, and driving customer engagement. You will work closely with clients to understand their needs and help them use our services effectively.

Your responsibilities will include onboarding new clients, conducting regular check-ins, and sharing insights on product usage to help customers get the most out of our products. We are looking for a proactive individual who is passionate about customer satisfaction and thrives in a fast-paced, remote environment.

Mar 23, 2026
Wikimedia Foundation
Full-time|Remote|Remote

Summary
The Wikimedia Foundation is looking for a talented Senior Site Reliability Engineer to improve and maintain the infrastructure behind Wikipedia, the world’s most beloved encyclopedia, which serves millions of people globally. Our Site Reliability Engineering (SRE) team keeps our globally recognized top-10 website running smoothly while innovating to further our mission: to empower everyone to share in the sum of all knowledge. As a member of the SRE team, you will join a diverse, globally distributed group of engineers who are passionate about exploring, experimenting with, and adopting new technologies. We believe in transparency and share our documentation, code, and configuration as open source. Our production systems run entirely on open-source software, and you are welcome to review our work without needing to log in. If you are excited by the challenge of improving the reliability and delivery of one of the Internet’s top websites and thrive in a remote-first environment, we invite you to consider joining us.

Mar 21, 2026
Shippo
Full-time|$100K/yr - $156K/yr|Remote|Remote (United States)

Responsibilities in Shipping & Handling
Architect, scale, and secure infrastructure to meet evolving business demands, using fault-tolerant designs, performance testing, profiling, and strategic capacity planning.
Develop, implement, and maintain automation, monitoring, and alerting systems, as well as disaster recovery procedures.
Promote scalability and maintainability through microservices architecture, separation of concerns, effective data modeling, job queuing, and application layering.
Improve and oversee our CI/CD pipeline to ensure smooth, secure production deployments through automated testing and verification.
Measure and verify system performance and correctness in terms of response times and throughput.
Take part in peer reviews and testing, contribute to automated test suites, and participate in design reviews for new features, products, and systems.
Participate in an on-call rotation for system support.

Mar 15, 2026
Runlayer
Full-time|Remote|Remote

AI is changing how businesses operate, yet many enterprises struggle to implement AI tools, agents, and workflows effectively. At Runlayer, we are dedicated to removing these barriers.

Our team built AI Actions for OpenAI, delivered Zapier Agents to millions, and launched the first remote MCP server in partnership with Anthropic. With the co-creator of MCP on our cap table, we are building the platform enterprises need to adopt AI securely and effectively.

Runlayer is a unified platform for MCPs, Skills, and Agents. We provide purpose-built security, fine-grained governance, and complete observability, so organizations can advance their AI initiatives with confidence. With $11M raised from Khosla Ventures and Felicis, we proudly support clients such as Gusto, Instacart, and Opendoor.

As a compact team of 25, mostly engineers, we thrive on rapid deployment and innovation. If you want to be at the forefront of AI adoption, now is the time to join us.

As a Site Reliability Engineer, you will be responsible for the reliability, performance, and scalability of Runlayer's infrastructure as we grow to meet the needs of enterprise customers across both cloud and on-prem environments.

Why You'll Thrive Here
Impact: Build the foundational infrastructure for the enterprise MCP platform, directly enabling large-scale AI adoption.
Excellence: Work closely with the founders and a small, experienced engineering team, shipping quickly in a high-growth setting.
Ownership: Own reliability end to end, from database performance to incident response and CI/CD pipelines.

What You'll Do
Oversee the reliability and performance of our cloud infrastructure across AWS (ECS, Aurora, CloudWatch) and GCP.
Manage and optimize Kubernetes clusters and container orchestration.
Lead database reliability engineering efforts, including performance tuning and scaling.
Develop and maintain CI/CD pipelines for efficient and secure deployments.
Handle incident response and participate in on-call rotations.
Collaborate with product engineers to design scalable and resilient systems.

What We're Looking For
Proven experience with AWS services including ECS, Aurora, and CloudWatch.
Expertise in Kubernetes management and container orchestration.
Strong background in database reliability engineering.
Solid understanding of CI/CD methodologies and tools.
Effective incident response skills and a proactive approach to system reliability.
Ability to work collaboratively in a fast-paced, innovation-focused environment.

Apr 3, 2026
akuity
Full-time|Remote|Remote - US Timezones

Join our team at akuity as a Senior Site Reliability Engineer, where you'll play a key role in improving the reliability and performance of our systems. In this remote position, you will collaborate with cross-functional teams to implement solutions that ensure seamless service delivery.

Your expertise will be vital in monitoring system health, optimizing performance, and troubleshooting issues to deliver exceptional user experiences. If you are passionate about building scalable and robust infrastructure, we want to hear from you!

Mar 18, 2026
Unifonic
Full-time|Remote|Remote job

Unifonic operates as a remote-first company in the CPaaS sector, providing communication solutions to over 5,000 businesses. With a team of 500, Unifonic supports clients in building stronger customer connections. The Engineering team at Unifonic is responsible for designing, building, and maintaining the systems that power the company’s products. Team members collaborate closely with other departments to ensure technology aligns with customer needs. Creativity and new ideas are encouraged across the group.

Role overview
The Senior Site Reliability Engineer joins the Production Operations (Live) team. This role centers on ensuring the reliability, scalability, and resilience of Unifonic’s cloud infrastructure and distributed messaging platforms. The SRE team works to keep systems running smoothly at all times and continually seeks ways to improve performance and stability.

What you will do
Maintain the reliability, uptime, and scalability of key production services around the clock.
Participate in the on-call rotation, respond to incidents, troubleshoot live production issues, and lead post-incident reviews.
Create and update operational playbooks and escalation paths to help reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
Monitor service level objectives (SLOs), conduct chaos testing, plan for capacity, and address reliability risks as they arise.

Apr 22, 2026
Arcadia
Full-time|Remote|Remote (USA)

As a Principal Site Reliability Engineer at Arcadia, you will play a pivotal role in ensuring the reliability, scalability, and performance of our systems. You will lead initiatives to design and implement robust solutions while collaborating with cross-functional teams to drive operational excellence.

Mar 23, 2026
Hashgraph
Full-time|Remote|Remote within US time zones

About Hashgraph:
Hashgraph is a fast-growing software company dedicated to supporting, developing, and maintaining Hedera, an open-source proof-of-stake platform. Hedera is EVM-compatible and designed for the demands of enterprise and web3 applications, with a focus on speed, security, stability, and sustainability. Hedera's public network is governed by leading organizations across 11 sectors and 14 regions, providing robust oversight of the decentralized platform's development and direction.

About the Role
We are seeking a Senior Site Reliability Engineer to join the HashSphere engineering team. In this role, you will help design, build, and integrate essential product features for enterprises using Hiero, our private distributed ledger technology. This greenfield project sits at the forefront of decentralized systems and cloud technologies, with a strong emphasis on security, privacy, and scalability.

Your expertise in distributed systems engineering, combined with your software development skills and knowledge of industry-standard SRE and DevOps practices, will be crucial in delivering core platform services. You will contribute to a highly scalable, mission-critical infrastructure product used by some of the largest organizations in the finance, supply chain, and healthcare sectors.

If you have experience designing scalable, reliable, and secure distributed system architectures on AWS, GCP, or Azure, and are eager to work with a passionate team building pioneering technology, this could be the perfect opportunity for you.

Jan 23, 2026
Juul Labs
Full-time|$158K/yr - $227K/yr|Remote|Remote - United States; United States of America

ABOUT JUUL LABS:
At Juul Labs, we are dedicated to transforming the experience of adult smokers by transitioning them away from traditional combustible cigarettes. Our mission is to eliminate the use of combustible cigarettes and prevent underage access to our products. We tackle this global health challenge with a focus on quality, innovation, and research. Backed by prominent technology investors, we aim for excellence not only in our products but also in our talent. We embrace diversity and are united by our mission. We are seeking the world's best engineers, scientists, designers, product managers, operations experts, and customer service professionals. If you are ready to advance your career with us, we encourage you to explore this opportunity.

ROLE OVERVIEW:
As a Senior Site Reliability Engineer (SRE), you will own the operational stability and performance of Juul's hybrid cloud infrastructure (Nutanix, AWS/GCP). You will lead automation initiatives, ensure architectural reliability, and serve as the go-to expert for critical incident escalation to keep the platform scalable and efficient.

Nutanix Platform Management Responsibilities:
Design, deploy, and maintain enterprise-scale Nutanix AHV clusters and manage Prism Central for multi-cluster operations.
Demonstrate expert-level proficiency with the Nutanix CLIs (nCLI and acli) for advanced operations and automation.
Create automation scripts using Nutanix REST APIs, Python SDK, PowerShell, and Terraform.
Manage VM templates, golden images, and standardized deployment catalogs.
Design disaster recovery solutions using Leap, Protection Domains, and metro clustering.
Implement network micro-segmentation with Nutanix Flow, including RBAC and encryption.
Lead Level 3 troubleshooting through advanced diagnostics and log analysis.
Configure high availability and optimize performance for critical workloads.
Oversee AHV networking with OVS bridges and VLANs, and implement resource reservations.
Architect and maintain hybrid cloud solutions across Nutanix HCI, AWS, and GCP environments.

Cloud Platform Engineering Responsibilities:
Further responsibilities in cloud platform engineering will be communicated during the interview process to ensure alignment with your expertise.

Apr 30, 2026
Arista Networks
Full-time|Remote|Remote

Join Arista Networks as a Site Reliability Engineer and play a critical role in ensuring the reliability and performance of our cutting-edge cloud networking solutions. In this fully remote position, you will collaborate with cross-functional teams to enhance our systems and foster a culture of continuous improvement.

Mar 19, 2026
Weedmaps
Full-time|$133.1K/yr - $148K/yr|Remote|New York City, NY

Site Reliability Engineer Overview:
Join Weedmaps as a Site Reliability Engineer and collaborate across departments, including application, infrastructure, and quality teams, to improve the performance, reliability, resilience, and scalability of our web services at Weedmaps.com. As a cloud-native organization, we run 100% of our services in Docker on Kubernetes in the AWS public cloud. We rely on observability, monitoring, CI/CD automation, and custom tooling, which lets us ship multiple production releases daily. Day to day, you will apply your engineering expertise to improve system monitoring, reduce developer toil, configure CI workflows, and optimize our deployment pipelines. You will serve as a knowledge resource for development teams, ensuring they use consistent tools for metrics, logging, building, and deployment. Working closely with both development and infrastructure teams, you will identify the critical service-specific metrics to monitor, and you will help application development teams build libraries for straightforward service instrumentation.

The impact you'll make:
Collaborate with stakeholders to establish and promote best practices for monitoring and CI/CD pipelines.
Troubleshoot deployment issues within our CI pipeline.
Actively promote the DevOps culture at Weedmaps.
Identify opportunities for automation and advocate for codifying processes.
Promote best practices for collaboration, reliability, security, and performance across all partner teams.
Own application configuration and scaling for specified services, ensuring they follow organizational practices.
Develop and optimize synthetic monitoring flows.

What you've accomplished:
A minimum of 2 years of development experience in startup or mid-sized environments.
Proficiency in programming languages such as Python, Go, Node, Ruby, or Elixir.
Knowledge of containerization technologies, particularly Docker (Kubernetes experience is a plus).
Strong communication skills, a positive demeanor, and the ability to give and receive constructive feedback.
Professional experience with cloud-native observability standards, including OpenMetrics, OpenTracing, and OpenCensus.
Expertise in using and configuring modern CI/CD workflows.
Deep understanding of SLIs, SLOs, and SLAs at both the service and business levels.
Familiarity with the golden signals and their significance in monitoring.

Apr 3, 2026
HomeVision
Full-time|Remote|United States

At HomeVision, we are pioneering innovations in real estate valuation to foster a more efficient, transparent, and equitable housing market. By harnessing technologies such as Natural Language Processing (NLP), computer vision, and large language models (LLMs), we are transforming the appraisal process and helping appraisers become more productive. Backed by Initialized Capital, we are growing rapidly and looking for a Site Reliability Engineer (SRE) to help us scale.

Key Responsibilities
Design and manage the infrastructure supporting our SaaS offerings, primarily on AWS.
Develop tools and maintain platform components that support our development teams.
Contribute to software development initiatives in areas such as authentication, reliability, and observability.
Support daily operations, including setting up testing environments and overseeing deployments.
Handle IT tasks such as user onboarding and account management.
Keep a flexible work schedule while remaining available until 6 PM Pacific Time for internal support and monitoring.

Qualifications
Minimum of 2 years of experience in Site Reliability Engineering or cloud operations; AWS experience preferred.
At least 1 year of software development experience.
A data-driven mindset.
Willingness to work across cloud infrastructure and IT as needed.
Meticulous attention to detail and a commitment to building high-quality systems.

Eligibility Criteria
Candidates must reside in the US or Puerto Rico.
We are currently unable to sponsor work visas, so candidates must be authorized to work in the US without sponsorship.

Preferred Qualifications
Familiarity with Terraform or other Infrastructure as Code (IaC) tools.
Interest and experience in database administration.
Candidates located in Seattle or San Francisco will receive additional consideration.

Our Offerings
Competitive salary, equity, and comprehensive health benefits.
Significant ownership and autonomy in your role.
Support for your professional development and growth.
A fully remote and flexible work environment.

We ask that recruiters and automated submissions do not apply.

Aug 29, 2025
Axiom
Full-time|Remote|USA

Site Reliability Engineer (SRE)
Global (UTC-3 preferred)

At Axiom, our mission is to empower developers with fast, insightful access to their data. As a remote-first, globally distributed organization, we are building a cloud-native, serverless data analytics platform. Axiom changes how developers and organizations manage their data, letting them send unlimited data with economical storage and fast querying.

As a Site Reliability Engineer at Axiom, you will play a crucial role in delivering exceptional reliability and performance for our customers. Working alongside backend engineers and product teams, you will design and maintain scalable, dependable systems. Our SRE philosophy emphasizes automation, measurement, and continuous improvement of system reliability and efficiency.

Your core responsibilities include:
Design and maintain robust, secure, and scalable infrastructure for Axiom Cloud.
Collaborate with engineering teams to establish and refine service level objectives.
Assist with disaster recovery planning, capacity engineering, performance analysis, and system optimization.
Promote best practices for code deployments and help educate the wider development team.
Implement tools and solutions that improve system reliability and reduce manual effort.
Investigate and resolve service incidents, contributing to postmortems and root cause analysis.
Build a culture of monitoring, alerting, and observability across the organization.

Aug 28, 2025
HomeVision
Full-time|Remote|United States

At HomeVision, we're redefining real estate valuation to foster a more efficient, transparent, and equitable housing market. Using technologies such as Natural Language Processing (NLP), computer vision, and large language models (LLMs), we improve the appraisal process and help appraisers work more efficiently. Backed by Initialized Capital, we are expanding rapidly and looking for a Site Reliability Engineer (SRE) to join our team and help scale our solutions.

This role is an exciting opportunity to launch an engineering career with exposure to a diverse range of skills. You'll work with a dedicated team responsible for all aspects of our technology infrastructure, including AWS, IT, and AI. Your primary focus will be keeping our platform secure and scalable, while also contributing to larger projects that support product enhancements and AI integration.

Apr 8, 2026
Cognitiv
Full-time|$160K/yr - $210K/yr|Hybrid|Bellevue, WA

Are you ready to transform advertising? At Cognitiv, we are not just another AdTech firm; we are reshaping media buying with our Deep Learning Advertising Platform. Since our founding in 2015, we have used deep learning and data science to redefine how brands engage with their audiences. Our mission is clear: to bring intelligence to advertising, delivering precision, relevance, and impact at scale. Our platform gives advertisers flexibility, whether they activate Dynamic Deals through their preferred DSP, use our managed-service DSP, or tap into our ContextGPT product. Joining Cognitiv means being at the forefront of AI-driven advertising and achieving remarkable growth in a fast-paced industry. We are currently expanding!

The Role
We are seeking a Senior Site Reliability Engineer to enhance our global network of datacenters and improve service management across Cognitiv. Your primary focus will be rapidly expanding our hybrid cloud infrastructure. As a growing organization, we strive to follow industry best practices. This position requires an experienced engineer who can learn our environment quickly and help shape our long-term service management strategy.

This role is based in our Bellevue, WA office on a hybrid schedule: 3 days in-office (Monday/Tuesday/Wednesday) and 2 days remote (Thursday/Friday).

Responsibilities
Design, implement, and maintain infrastructure across a growing footprint of co-located deployments.
Assess existing physical and network architectures to ensure long-term scalability and growth.
Collaborate with engineering and product teams to scope projects accurately against core business requirements.
Lead company-wide initiatives to improve service management around deployments, monitoring, and disaster recovery.
Oversee and maintain shared infrastructure in our AWS environment.

Requirements
Understanding of modern datacenter practices, with experience configuring multi-datacenter deployments.
Extensive knowledge of AWS infrastructure, networking, and management practices.
Demonstrated experience with infrastructure as code and related tools.

Mar 19, 2026
