1 - 20 of 51,711 Jobs

Search for Senior Software Engineer - Site Reliability

Parabola
Full-time|$180K/yr - $200K/yr|On-site|New York, New York

About Us: At Parabola, we empower teams to transform and streamline complex data workflows with ease. Our innovative workflow builder allows users to automate tasks that were previously manual, including data from PDFs, emails, and spreadsheets. Forward-thinking companies such as Brooklinen, On Running, and Flexport leverage Parabola to enhance their producti…

Oct 28, 2025

Upstart Network, Inc.
Full-time|Remote|United States | Remote

Join Upstart as a Senior Software Engineer specializing in Site Reliability, where you will play a key role in enhancing and maintaining our cloud-based systems. Your expertise will ensure the reliability and scalability of our applications, contributing to our mission of providing accessible credit to consumers. We are looking for an innovative engineer who thrives in a fast-paced environment and is excited about solving complex challenges.

Mar 19, 2026

Veeva Systems Inc.
Full-time|Remote|California - San Luis Obispo

At Veeva Systems, we are at the forefront of transforming the life sciences sector, dedicated to accelerating the delivery of therapies to patients. As a trailblazer in industry cloud solutions, we have achieved remarkable growth, surpassing $2 billion in revenue last fiscal year, and we see even greater potential ahead. Our core values define us: Doing the Right Thing, Ensuring Customer Success, Promoting Employee Growth, and Operating with Speed. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the needs of our customers, employees, society, and investors. As a Work Anywhere company, we offer flexibility in where you work, allowing you to excel in your preferred environment. We invite you to join us in transforming the life sciences industry and positively impacting our customers, employees, and communities. The Role: Become a pivotal member of our Vault Platform team as a Senior Site Reliability Engineer. In this role, you will ensure the scalability and reliability of our enterprise applications, tackling intricate challenges on a global scale. Your expertise in Java and contemporary open-source technologies will directly influence the performance of our production systems. We seek candidates with extensive experience in Java applications and familiarity with the latest open-source technologies, ideally from backgrounds in enterprise software development or fast-paced tech environments. As a Senior SRE, you should possess a natural curiosity and adept problem-solving skills, alongside a comprehensive understanding of how various systems integrate to function efficiently for hundreds of customers across North America, Europe, and Asia.

Jun 27, 2025

Shippo
Full-time|$100K/yr - $156K/yr|Remote|Remote (United States)

Responsibilities in Shipping & Handling: Architect, scale, and secure infrastructure to meet evolving business demands, employing fault-tolerant designs, performance testing, profiling, and strategic capacity planning. Develop, implement, and sustain automation, monitoring, and alerting systems, alongside disaster recovery protocols. Promote scalability and maintainability through microservices architecture, decoupling concerns, effective data modeling, job queuing, and application layering. Enhance and oversee our CI/CD pipeline to ensure seamless and secure production deployments via automated testing and verification. Evaluate and confirm system performance and accuracy concerning response times and throughput. Engage in peer reviews and testing, contributing to automated testing suites and participating in design reviews for new features, products, and systems. Partake in an on-call rotation for system support.

Mar 15, 2026

Cognitiv
Full-time|$160K/yr - $210K/yr|Hybrid|Bellevue, WA

Are you prepared to transform the advertising landscape? At Cognitiv, we are not merely another AdTech firm; we are pioneers reshaping media buying with our advanced Deep Learning Advertising Platform. Since our inception in 2015, we have been leveraging state-of-the-art deep learning technologies and data science to redefine how brands engage with their audiences. Our mission is clear: to infuse intelligence into advertising, delivering unmatched precision, relevance, and impact at scale. Our innovative platform provides advertisers with unparalleled flexibility, whether activating Dynamic Deals through their preferred DSP, utilizing our managed service DSP, or tapping into our groundbreaking ContextGPT product. Joining Cognitiv means being at the forefront of AI-driven advertising solutions, leading change, and achieving remarkable growth in a fast-paced industry. We are currently expanding! The Role: We are seeking a Senior Site Reliability Engineer to enhance our global network of datacenters and elevate service management across Cognitiv. Your primary focus will be on rapidly expanding our hybrid cloud infrastructure. As a growing organization, we strive to adhere to industry best practices. This position requires an experienced engineer who is eager to learn our environment quickly and help shape our long-term service management strategy. This role will be based in our Bellevue, WA office with a hybrid work schedule of 3 days in-office (Monday/Tuesday/Wednesday) and 2 days remote (Thursday/Friday). Responsibilities: Design, implement, and maintain infrastructure across a widening footprint of co-located deployments. Assess existing physical and network architectures to ensure long-term scalability and growth. Collaborate with engineering and product teams to accurately scope projects based on core business requirements. Lead company-wide initiatives to enhance service management surrounding deployments, monitoring, and disaster recovery. Oversee and maintain shared infrastructure within our AWS environment. Requirements: Understanding of contemporary datacenter practices with experience in configuring multi-datacenter deployments. Extensive knowledge of AWS infrastructure, networking, and management practices. Demonstrated experience with infrastructure as code and related tools.

Mar 19, 2026

Unifonic
Full-time|Remote|Remote job

Unifonic operates as a remote-first company in the CPaaS sector, providing communication solutions to over 5,000 businesses. With a team of 500, Unifonic supports clients in building stronger customer connections. The Engineering team at Unifonic is responsible for designing, building, and maintaining the systems that power the company’s products. Team members collaborate closely with other departments to ensure technology aligns with customer needs. Creativity and new ideas are encouraged across the group. Role overview The Senior Site Reliability Engineer joins the Production Operations (Live) team. This role centers on ensuring the reliability, scalability, and resilience of Unifonic’s cloud infrastructure and distributed messaging platforms. The SRE team works to keep systems running smoothly at all times and continually seeks ways to improve performance and stability. What you will do Maintain the reliability, uptime, and scalability of key production services around the clock. Participate in the on-call rotation, respond to incidents, troubleshoot live production issues, and lead post-incident reviews. Create and update operational playbooks and escalation paths to help reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR). Monitor service level objectives (SLOs), conduct chaos testing, plan for capacity, and address reliability risks as they arise.

Apr 22, 2026

Veeva Systems Inc.
Full-time|Remote|California - Los Angeles

Veeva Systems is a purpose-driven company at the forefront of industry cloud solutions, dedicated to accelerating the delivery of therapies to patients by empowering life sciences organizations. As one of the fastest-growing SaaS companies in history, we exceeded $2 billion in revenue in our previous fiscal year, with significant growth opportunities on the horizon. Our core values at Veeva include: Do the Right Thing, Customer Success, Employee Success, and Speed. We are distinct in the market as we became a public benefit corporation (PBC) in 2021, which commits us to balancing the interests of our customers, employees, society, and investors. As a Work Anywhere company, we promote flexibility, allowing you to work from home or the office, enabling you to thrive in an environment that suits you best. Become a part of our mission to transform the life sciences industry and make a meaningful impact on our customers, employees, and the communities we serve.

Jun 27, 2025

Juul Labs
Full-time|$158K/yr - $227K/yr|Remote|Remote - United States; United States of America

ABOUT JUUL LABS: At Juul Labs, we are dedicated to revolutionizing the experience of adult smokers by transitioning them away from traditional combustible cigarettes. Our mission is to eliminate their use and prevent underage access to our products. We tackle this global health challenge with a focus on quality, innovation, and research. Supported by prominent technology investors, we aim for excellence not only in our products but also in our talent acquisition. We embrace diversity and are united by our mission. We are seeking the world's best engineers, scientists, designers, product managers, operations experts, and customer service professionals. If you are ready to advance your career with us, we encourage you to explore this opportunity. ROLE OVERVIEW: As a Senior Site Reliability Engineer (SRE), you will take ownership of the operational stability and performance of Juul's hybrid cloud infrastructure (Nutanix, AWS/GCP). Your responsibilities will include leading automation initiatives, ensuring reliability in architecture, and serving as the go-to expert for critical incident escalation to guarantee a scalable and efficient platform. Nutanix Platform Management Responsibilities: Design, deploy, and maintain enterprise-scale Nutanix AHV clusters and manage Prism Central for multi-cluster operations. Exhibit expert-level proficiency with Nutanix CLI (nCLI and acli) for advanced operations and automation. Create automation scripts using Nutanix REST APIs, Python SDK, PowerShell, and Terraform. Manage VM templates, golden images, and standardized deployment catalogs. Design disaster recovery solutions utilizing Leap, Protection Domains, and metro clustering. Implement network micro-segmentation with Nutanix Flow, including RBAC and encryption tactics. Lead Level 3 troubleshooting through advanced diagnostics and log analysis. Configure high availability and optimize performance for critical workloads. 
Oversee AHV networking with OVS bridges, VLANs, and implement resource reservations. Architect and maintain hybrid cloud solutions across Nutanix HCI, AWS, and GCP environments. Cloud Platform Engineering Responsibilities: Further responsibilities in cloud platform engineering will be communicated during the interview process to ensure alignment with your expertise.

Apr 30, 2026

Veeva Systems Inc.
Full-time|Remote|Massachusetts - Boston

Join Veeva Systems, a groundbreaking organization at the forefront of the industry cloud, dedicated to accelerating the delivery of therapies to patients worldwide. As one of the fastest-growing SaaS companies in history, we have achieved over $2 billion in revenue last fiscal year, with abundant growth opportunities on the horizon. At Veeva, we operate based on our core values: Do the Right Thing, Customer Success, Employee Success, and Speed. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the needs of our customers, employees, society, and investors. As a Work Anywhere company, we empower you to choose your ideal work environment, whether from home or in the office, to help you thrive. Be a part of our mission to transform the life sciences industry and positively impact our customers, employees, and communities. The Role: We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be responsible for ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your deep knowledge of Java and modern open-source technologies to make a significant impact on our production systems. Ideal candidates will have extensive experience working with Java applications and the latest open-source technologies, preferably gained in enterprise software development or a rapidly growing tech environment. As a Senior SRE, you will need to be innately curious and possess strong problem-solving skills. Additionally, you will bring a unique engineering perspective, understanding how systems integrate in production to function at a global scale for hundreds of customers across North America, Europe, and Asia.

Oct 7, 2025

SpaceX
Full-time|On-site|Hawthorne, CA

Role Overview SpaceX is looking for a Site Reliability Engineer specializing in Application Software at the Hawthorne, CA location. This role centers on improving the reliability, scalability, and efficiency of internal applications. What You Will Do Work side by side with software development teams to boost application performance and reliability. Apply proven practices to strengthen system reliability. Deploy creative solutions to challenging engineering issues within application software.

Apr 14, 2026

Wikimedia Foundation
Full-time|Remote|Remote

Summary The Wikimedia Foundation is on the lookout for a talented Senior Site Reliability Engineer to enhance and maintain the infrastructure that powers the world’s most beloved encyclopedia, Wikipedia, serving millions globally. Our Site Reliability Engineering (SRE) team is dedicated to ensuring that our globally recognized top-10 website operates smoothly while innovating to further our mission: to empower everyone to share in the sum of all knowledge. As a member of the SRE team, you will join a diverse and globally distributed group of engineers passionate about exploring, experimenting, and adopting new technologies. We believe in transparency, sharing our documentation, code, and configuration as open source. Our production systems are powered entirely by open-source software, and we encourage you to review our work without any login requirements. If you are intrigued by the challenge of improving the reliability and delivery of one of the Internet’s top websites and thrive in a remote-first environment, we invite you to consider joining us.

Mar 21, 2026

Comtech LLC
Contract|On-site|Seattle

Position: Senior Site Reliability Engineer. Location: Seattle, WA. Duration: 12 months. Interview: In-person for local candidates or via Phone + Skype. As a Senior Site Reliability Engineer, you will play a pivotal role in the ongoing maintenance and administration of enterprise-level internet systems. Your primary responsibility will be to diagnose and resolve operational issues, ensuring the seamless functioning of our infrastructure. You will also be tasked with developing tools and scripts to enhance these processes. Collaboration with various teams will be essential to document our enterprise infrastructure and monitoring systems effectively. Additionally, you'll oversee the planning and execution of projects ranging from small to large scale within our Technology teams, reporting directly to your manager. This role demands a high level of technical expertise in both traditional enterprise systems and cutting-edge cloud-native applications. If you share our belief that a simple cup of coffee can transform lives and enhance experiences, we invite you to join us in delivering exceptional services to customers worldwide.

Sep 1, 2017

Runlayer
Full-time|Remote|Remote

AI is revolutionizing the operational landscape for businesses, yet many enterprises find themselves hindered in their efforts to effectively implement AI tools, agents, and workflows. At Runlayer, we are dedicated to dismantling these barriers. Our innovative team has developed AI Actions for OpenAI, delivered Zapier Agents to millions, and launched the first remote MCP server in partnership with Anthropic. With the co-creator of MCP on our cap table, we are establishing the essential platform that enterprises need to leverage AI securely and effectively. Runlayer serves as a unified platform for MCPs, Skills, and Agents. We provide purpose-built security, fine-grained governance, and complete observability, enabling organizations to advance their AI initiatives with confidence. With $11M raised from Khosla Ventures and Felicis, we proudly support clients such as Gusto, Instacart, and Opendoor. As a compact team of 25, primarily engineers, we thrive on rapid deployment and innovation. If you aspire to be at the forefront of AI implementation, now is the time to join us. In the role of Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and scalability of Runlayer's infrastructure as we expand to meet the needs of our enterprise customers across both cloud and on-prem environments. Why You'll Thrive Here. Impact: Construct the foundational infrastructure for the enterprise MCP platform, directly facilitating large-scale AI adoption. Excellence: Collaborate closely with founders and a small, experienced engineering team, delivering swiftly in a high-growth setting. Ownership: Take full responsibility for reliability from database performance to incident response and CI/CD pipelines. What You'll Do: Oversee the reliability and performance of our cloud infrastructure across AWS (ECS, Aurora, CloudWatch) and GCP. Manage and optimize Kubernetes clusters and container orchestration. Lead database reliability engineering efforts, including performance tuning and scaling. Develop and maintain CI/CD pipelines for efficient and secure deployments. Conduct incident response and participate in on-call rotations. Collaborate with product engineers to design scalable and resilient systems. What We're Looking For: Proven experience with AWS services including ECS, Aurora, and CloudWatch. Expertise in Kubernetes management and container orchestration. Strong background in database reliability engineering. Solid understanding of CI/CD methodologies and tools. Effective incident response skills and a proactive approach to system reliability. Ability to work collaboratively in a fast-paced environment with a focus on innovation.

Apr 3, 2026

onebrief
Full-time|On-site|Colorado Springs, CO

Join our dynamic team at onebrief as a Senior Site Reliability Engineer in Colorado Springs, where you will play a critical role in enhancing our systems' reliability, performance, and scalability. You'll collaborate with cross-functional teams to implement best practices and ensure the integrity of our services.

Apr 8, 2026

Veeva Systems Inc.
Full-time|Hybrid|Hawaii - Honolulu

Veeva Systems is a mission-driven innovator in the industry cloud, dedicated to accelerating the delivery of therapies to patients. As one of the fastest-growing SaaS companies ever, we achieved over $2 billion in revenue last fiscal year, with significant growth opportunities ahead. Our core values at Veeva emphasize integrity, customer and employee success, and rapid execution. In 2021, we made history as a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors. As a Work Anywhere company, we enable you to work from home or in the office, allowing you to flourish in the environment that suits you best. Join us in transforming the life sciences sector, and be part of our commitment to making a positive impact on our customers, employees, and communities.

Jun 27, 2025

akuity
Full-time|Remote|Remote - US Timezones

Join our dynamic team at akuity as a Senior Site Reliability Engineer, where you'll play a pivotal role in enhancing the reliability and performance of our systems. In this exciting remote position, you will collaborate with cross-functional teams to implement innovative solutions that ensure seamless service delivery. Your expertise will be vital in monitoring system health, optimizing performance, and troubleshooting issues to provide exceptional user experiences. If you are passionate about building scalable and robust infrastructures, we want to hear from you!

Mar 18, 2026

ZipRecruiter
Hybrid (Remote options available)|On-site|Los Angeles, CA

Join our innovative team in a hybrid work environment where most US-based roles can be performed remotely. Our Mission: We strive to connect individuals with their next significant opportunity. About Us: At ZipRecruiter, we are a premier online employment marketplace. Utilizing cutting-edge AI-driven smart matching technology, we connect millions of businesses and job seekers through our innovative mobile, web, and email services. Our partnerships with top job boards further enhance our service offerings, and our job search app is the highest-rated on both iOS and Android. Role Overview: We are on the lookout for an experienced Senior Software Engineer specializing in Site Reliability. In this pivotal role, you will collaborate with product-focused engineering teams to provide robust infrastructure, advanced tools, and architectural designs essential for the rapid and sustainable scaling of our services. As part of our SRE team, you will significantly influence engineering methodologies across the organization. Your expertise will help us create streamlined pathways toward best practices, establish safeguards against potential pitfalls, and minimize operational toil. The ideal candidate will possess a comprehensive understanding of both immediate product goals and long-term scalability objectives, along with the technical acumen to design systems that harmonize these priorities. Key Responsibilities: Architect, implement, and troubleshoot large-scale fault-tolerant distributed systems. Develop and maintain a diverse suite of tools and frameworks around our Kubernetes clusters. Design, implement, and optimize a complex and efficient CI/CD infrastructure.

Oct 14, 2025

Sierra
Full-time|On-site|San Francisco, CA

About Us: At Sierra, we are pioneering a transformative platform that empowers businesses to forge authentic customer experiences through AI technology. Headquartered in the vibrant city of San Francisco, we also boast a dynamic presence in Atlanta, New York, London, France, Singapore, and Japan. Our operations are anchored in core values that shape our culture: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and are integral to our mission. Our visionary founders, Bret Taylor and Clay Bavor, bring unparalleled expertise. Bret, currently the Board Chair of OpenAI, previously co-led Salesforce and served as CTO at Facebook, while Clay led numerous initiatives at Google, including AR/VR projects and Google Workspace. Your Role: In your capacity as a Software Engineer on the Site Reliability team, you will play a crucial role in establishing and enhancing the reliability, observability, and scalability of Sierra’s AI-centric infrastructure. Collaborating closely with our engineering and product teams, your goal is to ensure our systems remain highly available, efficient, and primed for growth. Lead the development of Sierra’s observability stack (including monitoring, alerting, logging, and tracing) to provide engineers with critical insights into system health and performance. Collaborate with product and platform engineers to architect systems that prioritize reliability and scalability from the outset, not as an afterthought. Design and implement robust, scalable, and secure cloud infrastructure on AWS, employing Terraform and cutting-edge DevOps tools. Enhance the reliability and scalability of our LLM deployments, ensuring they operate efficiently and cost-effectively. Drive improvements in deployment pipelines, CI/CD tooling, and incident management processes to minimize downtime and accelerate response times. Define and cultivate SRE practices within Sierra, shaping culture, tooling, and best practices across the engineering organization. Qualifications: Bachelor's degree in Computer Science or a related field, or equivalent experience. Proven experience in Site Reliability Engineering or a similar role, with a strong understanding of cloud infrastructure (AWS). Proficiency in Terraform and modern DevOps practices. Experience with observability tools and techniques: monitoring, alerting, logging, and tracing. Strong problem-solving skills with a focus on scalability and performance optimization. Excellent collaboration and communication skills, with the ability to work effectively in a team environment.

Oct 21, 2025

Veeva Systems Inc.
Full-time|Hybrid|Massachusetts - Boston

At Veeva Systems, we are dedicated to our mission and are recognized as trailblazers in the industry cloud, empowering life sciences companies to expedite the delivery of therapies to patients. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue in our previous fiscal year, with immense growth opportunities on the horizon. Our core values (Do the Right Thing, Customer Success, Employee Success, and Speed) are the foundation of our culture. Distinctively, we made history in 2021 by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors. As a Work Anywhere company, we offer the flexibility to choose between working from home or in the office, allowing you to thrive in your preferred environment. Join us in our mission to transform the life sciences industry and make a positive impact on our customers, employees, and communities. The Role: We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be pivotal in ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your extensive knowledge in Java and modern open-source technologies to significantly enhance our production systems. The ideal candidate will possess substantial experience with Java applications and the latest open-source technologies, particularly from enterprise software development or high-growth technology firms. As a Senior SRE, you should be naturally inquisitive and possess exceptional problem-solving skills. You will bring a unique engineering mindset, comprehending how systems integrate in production to function seamlessly for hundreds of customers across North America, Europe, and Asia.

Oct 7, 2025

Drata
Full-time|$166.9K/yr - $225.9K/yr|Hybrid|Hybrid - San Francisco

Drata helps organizations demonstrate their commitment to security and integrity. The platform supports companies as they build and maintain trust with users, customers, partners, and prospects. Values Built on Trust: Consistency shapes decisions and actions. Integrity: Choosing to do what is right, every time. Customer-Obsessed: Prioritizing customer needs above all else. Competitive Fire: Striving for higher standards and greater achievements. Diversity: Welcoming different perspectives to encourage creative solutions. Automation First: Pursuing efficiency by saving time and resources wherever possible. How the Team Works Drata blends high standards with a supportive environment focused on growth. Team members are encouraged to own their work, improve continuously, and deliver meaningful results. The company values quick, informed decisions that drive immediate impact, while always keeping the mission and customer needs at the center. The San Francisco-based team uses a hybrid work model. Colleagues collaborate in the office Tuesday through Thursday, focusing on alignment and innovation. Mondays and Fridays offer flexibility for deep work or personal needs. Growth and Culture Drata has expanded to over 600 professionals worldwide, recognized for a culture that values trust, speed, and continuous learning. The environment supports both personal and professional development. See the Speed: CEO Adam Markowitz discusses Drata’s rapid journey to $100M ARR in four years. Hear the Voice of the Team: Employee stories highlight collaboration and growth at Drata.

Apr 27, 2026
