Senior Site Reliability Engineer
Experience Level
Senior
Qualifications
About Hive
Hive is at the forefront of revolutionizing how organizations understand, search, and generate content through advanced AI solutions. Trusted by some of the most innovative companies globally, we empower developers with an exceptional suite of pre-trained AI models that handle billions of customer API requests monthly. Our robust portfolio includes proprietary software applications that drive transformative use cases across various industries, from content moderation to context-based ad targeting. Backed by over $120M in investments from top-tier firms, Hive's global team of 250+ employees is dedicated to shaping the future of AI from our offices in San Francisco, Seattle, and Delhi.
Similar jobs
Rithum™ stands as the most trusted commerce network globally, revolutionizing collaboration among brands, suppliers, and retailers to deliver seamless e-commerce experiences. Our unparalleled platform empowers brands and retailers to drive growth, streamline operations across multiple channels, expand product offerings, and enhance profit margins. Currently, over 40,000 businesses rely on Rithum to scale their operations across numerous channels, representing an impressive annual GMV exceeding $50 billion. By leveraging our commerce, marketing, and delivery solutions, our clients are able to craft optimized consumer shopping journeys from start to finish.
Overview
The Database Reliability Engineering (DBRE) team at Rithum is dedicated to ensuring the availability, reliability, and observability of our database systems. We emphasize automation to minimize manual tasks and are continually exploring enhancements to our processes. Our current responsibilities include managing and optimizing a large-scale SQL Server environment that encompasses hundreds of instances across hybrid infrastructures (on-prem VMware and AWS), in addition to various relational and NoSQL database platforms including MongoDB, DynamoDB, Elasticsearch, MySQL, Postgres, and Redis. These systems are integral to all business functions. We foster a team culture grounded in curiosity, integrity, collaboration, and a commitment to continuous learning. In your role as a Senior Database Reliability Engineer, you will embody these values and cultivate them within your team. You will manage diverse database systems and take the lead in designing and executing projects with a strong technical focus.
Role Overview
Airwallex is hiring a Senior Site Reliability Engineer - Database in Seattle. This position focuses on improving the reliability and performance of database systems that support Airwallex’s services. The role centers on maintaining high availability and efficiency across cloud infrastructure and database environments.
What You Will Do
Enhance the reliability of database systems used by Airwallex
Monitor and optimize database performance to support service uptime
Apply experience with cloud infrastructure and database management to daily operations
Location
Seattle, United States
Comtech LLC
Position: Senior Site Reliability Engineer
Location: Seattle, WA
Duration: 12 months
Interview: In-person for local candidates or via Phone + Skype
As a Senior Site Reliability Engineer, you will play a pivotal role in the ongoing maintenance and administration of enterprise-level internet systems. Your primary responsibility will be to diagnose and resolve operational issues, ensuring the seamless functioning of our infrastructure. You will also be tasked with developing tools and scripts to enhance these processes. Collaboration with various teams will be essential to document our enterprise infrastructure and monitoring systems effectively. Additionally, you'll oversee the planning and execution of projects ranging from small to large scale within our Technology teams, reporting directly to your manager. This role demands a high level of technical expertise in both traditional enterprise systems and cutting-edge cloud-native applications. If you share our belief that a simple cup of coffee can transform lives and enhance experiences, we invite you to join us in delivering exceptional services to customers worldwide.
Axon Enterprise, Inc.
Join Axon and Make a Difference.
At Axon, our mission is to protect life. We tackle society's most pressing safety and justice challenges with our innovative ecosystem of devices and cloud software. Collaboration is at the heart of our success; we engage with transparency and empathy, welcoming diverse perspectives from our customers and each other.
Life at Axon is dynamic, challenging, and impactful. Here, you’ll take charge and instigate genuine change while evolving in a mission-driven environment that values your contributions.
Your Impact
As a Senior Site Reliability Engineer (SRE) on the APX SRE CloudOps team, you will craft and maintain the cloud infrastructure and automation platforms that are vital for Axon's product engineering teams. You will design solutions for multi-cloud architectures (Azure, AWS), ensure compliance with FedRAMP regulations, and oversee large-scale Kubernetes platforms that support production workloads across various regions. A significant part of your role will involve writing code: developing services, APIs, and internal tools using languages such as Go and Python. Additionally, you will be part of on-call rotations and incident response teams, leveraging your operational expertise to enhance reliability and guide platform investments. This position merges deep software engineering expertise with large-scale cloud architecture and production ownership.
Location: This position is based in our Seattle, Atlanta, or Boston offices and follows a hybrid work model. We emphasize in-person collaboration, requiring team members to work on-site from Tuesday to Friday, with the flexibility to work remotely on Mondays unless a workplace accommodation is arranged. We believe that connection fuels innovation, and our office culture is designed to encourage meaningful teamwork, mentorship, and collective success.
Join Hive, a leading innovator in cloud-based AI solutions, as a Senior Site Reliability Engineer. We are seeking a talented individual to help maintain and enhance the reliability of our enterprise SaaS offerings. In a dynamic environment that combines our own data centers with public cloud services, you will play a critical role in automating processes and optimizing performance at scale. If you thrive in unstructured settings and are passionate about making every task efficient through automation, consider becoming a part of our forward-thinking team.
Anduril Industries
Anduril Industries is a pioneering defense technology firm dedicated to revolutionizing military capabilities for the U.S. and its allies through cutting-edge technology. By integrating the innovative approaches and business models of today’s most advanced companies into defense, we are reshaping the design, development, and deployment of military systems. Our flagship system, powered by Lattice OS, harnesses AI to transform vast data streams into a real-time, three-dimensional command and control hub. As we navigate an era of strategic competition, Anduril is committed to delivering state-of-the-art autonomy, AI, computer vision, sensor fusion, and networking solutions to military operations in a matter of months rather than years.
ABOUT THE TEAM
The Business Systems team plays a crucial role in the development and enhancement of various systems that empower Anduril to fulfill its mission. Our technology supports essential functions across supply chain, accounting, sales, engineering, modeling, simulation, field maintenance, and manufacturing. We collaborate across the organization to provide the necessary tools and capabilities for mission success.
ABOUT THE JOB:
We are on the lookout for an experienced Senior Site Reliability Engineer to join our dynamic team. In this role, you will be tasked with building, deploying, scaling, and maintaining the vital infrastructure supporting our systems. You will engage with diverse stakeholder teams to ensure rapid and secure progress on their technological initiatives.
WHAT YOU'LL DO:
Provision, manage, and scale complex infrastructure for the entire Business Systems division.
Continuously enhance and optimize CI/CD pipelines for improved efficiency, reliability, and software delivery speed.
Foster a culture of observability and reliability within the organization by promoting best practices and tools that enhance system visibility and resilience.
Collaborate with engineering teams to comprehend their requirements and develop functional cloud solutions that adhere to industry best practices.
Possess a thorough understanding of the company’s business objectives and design infrastructure solutions that align with these goals.
Strengthen systems and evaluate workload demands while planning resource capacity for optimal performance and cost-effectiveness.
Comtech LLC
Position: Senior Site Reliability Engineer
Contract Duration: Long Term
Key SRE Requirements:
Hands-on experience with configuration management tools such as Chef, Puppet, Azure, and Ansible; proficiency in at least two of these platforms is essential.
Familiarity with programming/scripting languages including Python, PowerShell, Ruby, or Perl is a must, with a requirement of at least two languages.
Knowledge of Agile methodologies and project management practices.
As a Senior Site Reliability Engineer, your primary focus will be on maintaining and managing enterprise-level Internet systems. This role involves troubleshooting and resolving operational challenges, developing scripts and tools for maintenance, and collaborating with cross-functional teams to document enterprise infrastructures and monitoring frameworks. You will also plan and execute various technology projects under managerial guidance. To thrive in this role, you must possess deep technical knowledge of both large-scale enterprise systems and innovative cloud-native applications. If you share our belief that even a simple cup of coffee can transform lives and the world, join us in providing exceptional experiences for our customers globally.
Essential Qualifications:
Proven experience collaborating with cross-functional teams on system integration and design, including crafting operational specifications and test plans.
Expertise in web server management (IIS, Apache) and application servers (.Net, Java, Tomcat, JBoss), including installation, configuration, administration, and performance optimization.
Experience with configuration management platforms such as Chef, Ansible, CFEngine, and Puppet.
Understanding of internet standards such as HTTP, DNS, FTP, SSH, HTML, XML, JDBC, ODBC, SNMP, and other protocols.
Knowledge of data storage systems (SAN, NAS, RAID Arrays).
Experience in network hardware architecture and troubleshooting, including load balancers, switches, and routers.
Join Aetherflux as a Senior Electrical Engineer specializing in radiation effects and avionics reliability. In this pivotal role, you will leverage your expertise to advance our cutting-edge technologies and ensure the reliability of our avionics systems in challenging environments. You will collaborate with a talented team of engineers and contribute to innovative projects that shape the future of aerospace technology.
PitchBook Data
At PitchBook, part of the Morningstar family, we embrace innovation and continuous growth. We prioritize collaboration, excitement, and a vibrant culture that empowers everyone. Our robust learning initiatives and mentorship programs foster a culture of curiosity, driving us to discover new solutions and enhance our processes. As we navigate the dynamic nature of our industry, we thrive on challenges, are willing to take calculated risks, and embrace the learning that comes from failure in our quest for excellence. If you possess a positive attitude and a determination to make things happen, PitchBook is your ideal workplace.
About the Role:
As a Senior Site Reliability Engineer (SRE) at PitchBook, you will play a crucial role in our Product and Engineering team, comprising innovative thinkers and problem solvers dedicated to enhancing our customer experience and business outcomes. We prioritize curiosity and aim to improve our practices continually, focusing on creating exceptional customer experiences through innovative product development. Recognizing that success stems from collaboration and diverse perspectives, we work closely with global partners. We foster an environment of positivity, encourage constructive discussions, and uphold a culture grounded in respect, integrity, and growth. Our investment in our people reflects our commitment to continuous improvement. Join us and advance your career!
Anduril Industries
At Anduril Industries, we are at the forefront of defense technology, dedicated to revolutionizing military capabilities for the U.S. and its allies through advanced technologies. We leverage the innovative expertise and business models of the 21st century to redefine how military systems are designed, constructed, and sold. Our flagship offering, powered by Lattice OS, utilizes AI to integrate thousands of data streams into an intuitive, real-time 3D command and control center. As we navigate a new era of strategic competition, we are committed to deploying cutting-edge autonomy, AI, computer vision, sensor fusion, and networking technologies at a pace that meets the urgent demands of national security.
ABOUT THE TEAM
The Production Engineering team is a newly established unit within Anduril's Software Platform, focused on ensuring the reliability, performance, and scalability of mission-critical systems that support our warfighters. We tackle complex reliability challenges at scale, ensuring that the essential components of Lattice operate seamlessly in demanding environments. In this foundational role, you will be among the first to build this team from the ground up, with the unique opportunity to influence technical direction, establish best practices, and define production engineering excellence at Anduril. Our team exists at the intersection of software engineering and systems reliability, developing the infrastructure, tooling, and processes necessary to keep our systems fully operational 24/7/365.
ABOUT THE ROLE
We are on the lookout for a Senior Site Reliability Engineer who is passionate about constructing resilient, highly available systems that can scale to meet the requirements of the core systems supporting Lattice. You will collaborate closely with platform engineering teams, product developers, and field operations to proactively identify reliability risks, implement defensive strategies, and enhance the operational excellence of our software platform. If you enjoy solving complex problems at scale and wish to have a direct impact on national security, this is the opportunity for you.
Palona
At Palona, we are pioneering the integration of cutting-edge generative and multimodal AI into the hospitality sector. Our dynamic engineering team drives innovation at a rapid pace, utilizing generative AI models to create products that continually adapt and improve. In this fast-evolving landscape, traditional software excellence needs to evolve to accommodate the unique nature of AI outputs, which differ significantly from conventional software failures. This position is crucial in establishing an engineering discipline that identifies potential issues before they impact our customers.
In this role, you will engage with evaluation pipelines, observability, cloud infrastructure, and CI/CD processes to enhance Palona's AI agent platform. You will blend DevOps and AI reliability, overseeing production infrastructure while developing tools that ensure optimal AI agent performance.
Responsibilities
As an AI Reliability Engineer, your key responsibilities will include:
Creating and implementing observability systems to identify quality degradation, latency issues, and system anomalies in production, including the development of instrumentation, dashboards, and alerting mechanisms.
Writing and maintaining automated tests to assess agent output quality, incorporating deterministic checks and LLM-as-judge evaluations.
Developing automated release and validation systems to streamline deployments across different environments and enforce quality gates for AI-driven products.
Building and refining platform infrastructure using infrastructure as code, with a strong emphasis on reliability, scalability, and cost efficiency.
Enhancing evaluation pipelines that gauge AI agent conversation quality, accuracy, and safety, collaborating with product and engineering teams to refine evaluation criteria.
Designing and developing internal tools and services that bolster AI reliability, evaluation, and operational workflows.
Architecting new systems to tackle emerging reliability and quality challenges within the AI agent platform.
Producing production-grade code for reliability and evaluation infrastructure, contributing as a software engineer rather than merely an operator.
StemXpert1
Join our innovative team at StemXpert1 as a Database DevOps Engineer, where you will play a crucial role in managing and optimizing our database systems. In this position, you will collaborate with cross-functional teams to ensure the seamless integration of database solutions into our development processes. Your expertise will contribute to enhancing the performance, reliability, and security of our database infrastructure.
Axon Enterprise, Inc.
Axon is seeking a Senior Site Reliability Engineer I in Seattle to help maintain and improve the reliability and performance of cloud-based services. This position plays a key part in supporting critical systems used by law enforcement and public safety organizations.
Role overview
This engineer works closely with teams across the company to design and implement scalable infrastructure. The role involves monitoring systems, responding to incidents, and contributing to the continuous improvement of development processes.
What you will do
Ensure the reliability, availability, and performance of Axon's cloud services
Collaborate with other teams to design scalable infrastructure solutions
Apply monitoring tools and respond to incidents as they arise
Drive enhancements in system performance and development workflows
Requirements
Experience in site reliability engineering or a related field
Strong technical skills in cloud infrastructure and monitoring
Ability to work effectively with cross-functional teams
Proactive approach to problem-solving and process improvement
About BRINC:
At BRINC, we are revolutionizing public safety with our groundbreaking suite of life-saving technologies. Our venture began with the design of drones and ruggedized throw phones, enabling access to hazardous environments and facilitating communication to defuse critical situations. Today, we have broadened our scope to include the establishment and deployment of 911 response networks, where drones are dispatched in response to emergency calls, providing real-time visual data that enhances safety and ensures de-escalation of incidents. Our innovative solutions are currently utilized by over 600 public safety agencies across the United States, and we have successfully secured over $150 million in funding from esteemed investors such as Index Ventures, Motorola Solutions, Sam Altman, Dylan Field, and Mike Volpe. At BRINC, our mission is to recruit top-tier talent to support first responders in their life-saving efforts.
About this Role:
As the Senior Site Reliability / DevOps Engineer, you will take charge of ensuring the reliability, scalability, and operational excellence of our production systems. Your responsibilities will include constructing secure cloud infrastructure, automating processes, and developing deployment pipelines that power our global real-time services. You will collaborate closely with software, hardware, and autonomy teams to enhance system performance, incident management, and developer productivity.
Key Responsibilities:
You will be accountable for platform reliability, uptime, and on-call protocols for our worldwide real-time services.
Design and manage secure, scalable cloud infrastructure and deployment systems.
Develop automation and internal tools to enhance developer efficiency and minimize manual tasks.
Implement observability solutions (metrics/logs/tracing), alerting mechanisms, and conduct incident response and postmortems.
Lead capacity planning, disaster recovery strategies, and optimize cost/performance ratios.
Collaborate across teams to support live drone data streaming and enhance customer workflows.
Qualifications:
BS/MS/PhD in Computer Science or a related field, with over 5 years of experience operating production systems and building infrastructure.
Solid software engineering foundation; proficient in writing production-quality code (Python/Node.js/JavaScript, etc.).
Practical experience with Infrastructure as Code and modern cloud architectures.
Proven track record of building and maintaining CI/CD pipelines and safe release processes.
Exceptional attention to detail concerning security, scalability, and performance aspects.
DigitalOcean
Join DigitalOcean as a Customer Success Engineer on our 2nd Shift team! In this role, you will be pivotal in ensuring our customers achieve their goals with our database solutions. You will serve as a technical consultant, providing insights and troubleshooting assistance to enhance their experience with our platform.
Join Aetherflux as a Software Engineer specializing in Reliability for Avionics and Compute Systems. In this key role, you will be responsible for ensuring the dependable operation of our cutting-edge technology solutions that power the aviation industry. Collaborate with a talented team to design, develop, and test reliable software components that meet rigorous standards.
Sonsoft Inc.
We are seeking a talented and experienced Exadata Database Administrator to join our dynamic team at Sonsoft Inc. As an Exadata DBA, you will be responsible for managing and optimizing our Exadata environments, ensuring high availability, performance, and security of our databases. Your expertise in Oracle Database technologies will be crucial in supporting various projects and initiatives. If you are passionate about database management and enjoy solving complex challenges, we would love to hear from you!
Join our dynamic team as a Site Reliability Engineer! As an integral member of our operations, you will oversee monitoring, provisioning, and ensuring the resilience of our systems while engaging with customers to address their needs effectively. Your role will involve extensive work on Windows systems, focusing on regular patching, certificate provisioning, and renewals. You will also be expected to map out existing infrastructure flows and dependencies, lead root cause analysis meetings, and respond promptly to incidents.
Sonsoft Inc.
Join our dynamic team at Sonsoft Inc. as an Exadata Database Administrator. In this critical role, you will be responsible for managing and optimizing our Exadata environments, ensuring high availability and performance. You will collaborate with cross-functional teams to implement best practices and troubleshoot database issues.
Sonsoft Inc.
Join Sonsoft Inc. as an Exadata Database Administrator, where you will play a critical role in managing and optimizing our Exadata databases. You'll be responsible for ensuring database performance, availability, and security, while also collaborating with our development and operations teams to implement best practices.

