Qualifications
• Bachelor's degree in Computer Science or related field.
• Minimum 2 years of experience in site reliability engineering or DevOps.
• Proficiency in programming languages such as Python, Go, or Java.
• Experience with cloud services (AWS, Azure) and container orchestration (Kubernetes).
• Strong problem-solving skills and ability to work in a fast-paced environment.
About the job
Geotab Inc. seeks a Site Reliability Engineer based in Atlanta, Georgia. The position centers on ensuring that systems remain dependable and perform as intended. Working with teams throughout the company, the engineer helps keep infrastructure stable, scalable, and efficient.
Key responsibilities
Monitor and maintain the reliability and performance of core systems
Work with cross-functional teams to support and enhance infrastructure
Automate operational tasks to improve workflow and efficiency
Troubleshoot and resolve complex technical issues as they arise
Location
This role is based in Atlanta, Georgia, USA.
About Geotab Inc.
Geotab Inc. is a global leader in telematics, providing advanced data analytics and fleet management solutions. Our mission is to help businesses improve their operational efficiency and productivity through innovative technology. Join our dynamic team and be part of a forward-thinking company that values collaboration and growth.
Similar jobs
Search for Senior Staff Site Reliability Engineer Platform Engineering
About Saviynt
Saviynt is at the forefront of identity security, offering an innovative AI-driven platform designed to govern and protect access to applications, data, and business processes for enterprises and government bodies worldwide. In this AI era, Saviynt empowers organizations to accelerate their operations securely and in compliance with regulations.
Importance of This Role
As a Staff Platform Engineer at Saviynt, you will be pivotal in maintaining the high availability, scalability, and security of our complex, cloud-native systems that underpin our SaaS platform as we continue to expand.
This is a hands-on engineering position with a focus on technical leadership. You will take charge of the reliability of major platform domains, devise scalable solutions using Kubernetes and AWS, and spearhead automation and reliability enhancements across various teams.
Your Responsibilities
In this crucial role, you will design, build, and maintain the foundational infrastructure services and platforms critical for our product and application teams.
Your efforts will be directed towards developing reusable, reliable, and scalable solutions that simplify complexity, enabling teams to concentrate on their core business logic and expedite feature delivery in a multi-cloud environment.
Key responsibilities include:
• Designing and developing core platform components and shared infrastructure services for integration with development teams.
• Architecting, implementing, and managing highly available Kubernetes platforms as a service for internal users.
• Creating robust internal tools and automation for infrastructure provisioning and management, primarily utilizing Go (Golang).
• Optimizing foundational solutions within cloud environments (AWS, Azure, etc.) by establishing reusable patterns and modules.
• Designing and implementing shared Event-Driven Architecture components and messaging platforms using technologies like Kafka or Google Pub/Sub.
• Developing and maintaining reliable CI/CD pipelines (e.g., GitLab CI and ArgoCD) to provide standardized and automated deployment workflows.
• Building resilient Distributed Systems components that enhance reliability, fault tolerance, and performance.
• Managing and optimizing shared infrastructure across Multi-Region Cloud Environments to ensure global availability and performance.
• Establishing and refining centralized observability systems to monitor and enhance platform services.
Join Axon and be a Force for Good.
At Axon, we are driven by a mission to protect life. Our team tackles society's most pressing issues of safety and justice through a powerful ecosystem of devices and cloud software. We believe in collaboration, embracing diverse perspectives from our customers, communities, and each other.
Working at Axon is dynamic, challenging, and impactful. You will be empowered to take ownership and effect real change while growing in a mission-driven environment that values your contributions.
Your Impact
As a Senior Site Reliability Engineer on the APX SRE CloudOps team, you will be responsible for designing and constructing the cloud infrastructure and automation platforms that support Axon's product engineering teams. You will create solutions for multi-cloud environments (Azure, AWS), ensure compliance with FedRAMP standards, and manage large-scale Kubernetes platforms that handle production workloads across various regions. This role involves extensive coding to build services, APIs, and internal tools utilizing languages such as Go and Python. Additionally, you will take part in on-call rotations and incident response, leveraging your operational experience to enhance reliability and guide platform investments. This position merges software engineering expertise with cloud architecture and production accountability.
Location
This position is based in our Atlanta (Peachtree Corners), Seattle, or Boston office and operates on a hybrid schedule. We encourage in-person collaboration and require team members to work onsite from Tuesday to Friday, with the flexibility to work remotely on Mondays unless an approved workplace accommodation is in place. We believe that connection fosters innovation, and our in-office culture is designed to promote meaningful teamwork, mentorship, and collective success.
Full-time|$120K/yr - $175K/yr|Remote|Atlanta, GA preferred, Remote
Join PrizePicks, the fastest-growing sports company in North America, as recognized by Inc. 5000. As the premier platform for Daily Fantasy Sports, we cater to a wide array of sports leagues, including the NFL, NBA, and popular Esports titles like League of Legends and Counter-Strike. Our diverse team of over 550 employees thrives in a culture that embraces inclusivity and values all backgrounds, regardless of sports fandom. Are you ready to revolutionize the DFS industry with us?
We are on the lookout for a talented and experienced Senior Site Reliability Engineer to enhance our team. Your expertise in delivering innovative solutions will be crucial as you help us ensure the reliability, scalability, and performance of our infrastructure during our growth and expansion phases.
Full-time|$144K/yr - $191K/yr|On-site|Atlanta, Georgia, United States
Anduril Industries is a cutting-edge defense technology firm dedicated to revolutionizing military capabilities for the U.S. and allied forces through innovative technologies. By integrating the latest expertise and business models from the most forward-thinking companies of the 21st century into the defense sector, we are transforming the design, production, and sale of military systems. Our advanced Lattice OS, an AI-driven operating system, enhances command and control by synthesizing numerous data streams into real-time, 3D operational environments. As we navigate a new era of strategic competition, Anduril pledges to deliver pioneering autonomy, artificial intelligence, computer vision, sensor fusion, and networking technologies to the military in mere months, not years.
About The Team
Within Anduril, the Tactical Recon & Strike (TRS) division has two primary missions: to develop highly capable autonomous drones and to manufacture robust rocket motors at scale. We take innovative products like Ghost, Anvil, Bolt, and Altius from concept to fully operational systems by collaborating closely with specialized engineering, operations, and production teams. Our Anduril Rocket Motor Systems (RMS) team designs and produces solid rocket motors utilizing advanced materials and proprietary processes, ensuring the delivery of safe and reliable propulsion systems that meet diverse mission needs. TRS seeks enthusiastic software and hardware engineers eager to contribute to a diverse portfolio ranging from autonomous aerial systems to high-performance solid rocket motors, all while operating reliably in demanding environments.
About The Job
As a Senior Site Reliability Engineer, you will play a pivotal role in deploying, integrating, and managing both customer and developmental cloud environments across TRS. This position calls for a systems-thinking engineer adept at bridging software development, platform engineering, and mission operations to facilitate seamless integration of new capabilities, enhance production scalability, and maintain system reliability. The ideal candidate will oversee the entire lifecycle of cloud deployments, drive continuous improvement in data pipelines and observability infrastructure for TRS's expanding drone fleets, and identify opportunities to utilize emerging platform services to boost system performance and data quality. This role is also crucial for scaling integration best practices and enhancing functional capabilities across additional TRS product lines.
now100 is seeking a talented Site Reliability Engineer (SRE) to join our dynamic team in Atlanta. As an SRE, you will play a crucial role in maintaining and improving the reliability, availability, and performance of our systems. You will collaborate closely with our software engineering teams to build scalable and efficient infrastructures.
Join Our Team at Rainforest!
At Rainforest, we are pioneering the payments-as-a-service landscape, offering innovative solutions that simplify payment monetization for specialized software platforms. Our focus is on empowering small to mid-sized platforms to enrich the value they provide to their small business customers through seamless embedded payments, all while alleviating operational and regulatory challenges.
Backed by a seasoned fintech founder and a top-tier venture capital firm, we are positioned to make a significant impact in the fintech space. We invite you to join us on this exciting journey!
Your Role
We seek a proactive and hands-on Site Reliability Engineer who excels in building and scaling cloud infrastructure within a dynamic startup environment. This role offers you the opportunity to take ownership of systems from design to production reliability, collaborating closely with engineering teams to deliver secure and scalable payment platforms. If you are passionate about automation, performance, and continuous improvement while making a real impact in fintech, you will thrive at Rainforest.
Key Responsibilities
• Manage and scale our AWS-based cloud infrastructure utilizing Terraform and Infrastructure-as-Code (IaC) practices.
• Develop, operate, and enhance Elastic Kubernetes Service (EKS) and serverless environments that underpin our payment services.
• Design and maintain modern Continuous Integration/Continuous Deployment (CI/CD) pipelines with GitLab to ensure rapid and secure deployments.
• Implement and refine monitoring, alerting, and observability practices to guarantee high uptime and swift incident resolution using tools like OpenTelemetry, Prometheus, and New Relic.
• Automate infrastructure and operational processes to streamline workflows and expedite delivery.
• Collaborate closely with application engineers to enhance system performance, reliability, and scalability.
• Lead incident response initiatives, conduct postmortems, and drive a culture of continuous improvement.
• Contribute to defining and implementing SRE best practices, including Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
Join Anduril Industries as a Site Reliability Engineer focused on Tactical Reconnaissance & Strike. In this dynamic role, you will leverage your expertise in systems engineering, cloud infrastructure, and automation to enhance the reliability and performance of our innovative products. Your contributions will directly impact mission-critical operations and support our commitment to redefining defense technologies.
JOIN OUR TEAM
We have an exciting opportunity for you!
As a Staff Engineer specializing in Endpoint Platform, you will play a key role in designing, implementing, and overseeing mobile device management solutions within a large-scale enterprise environment. Your expertise will ensure the secure and efficient handling of mobile devices, applications, and data across diverse platforms such as macOS, iOS, Android, and Windows. You will collaborate with cross-functional teams to optimize mobile operations while upholding security policies, compliance standards, and performance metrics.
Your adaptability is essential, as additional responsibilities may arise based on the Company's evolving needs.
YOUR ROLE IN THE GAME PLAN
Every team member contributes to our success.
Engineering Architecture
• Establish endpoint engineering principles, reference architectures, and reusable configuration patterns.
• Define secure-by-default device baselines for all platforms.
Platform & Automation
• Implement GitOps-driven endpoint configuration management.
• Develop CI/CD validation pipelines for endpoint policies and configurations.
• Create API-based integrations across identity, asset management, security tooling, and compliance systems.
Modern Workplace Enablement
• Architect zero-touch, globally scalable device provisioning.
• Design self-service application delivery models.
• Enhance the device onboarding experience with measurable SLOs.
Measurement & Continuous Improvement
• Establish KPIs for provisioning speed, compliance drift, patch latency, and device trust posture.
• Leverage telemetry and automation for continuous improvements in endpoint experience and security.
Rithum™ stands as the premier global commerce network, revolutionizing the collaboration between brands, suppliers, and retailers to facilitate seamless e-commerce solutions. Our unparalleled platform equips brands and retailers with the tools to expedite growth, streamline operations across diverse channels, expand product offerings, and improve profit margins. Currently, over 40,000 enterprises place their trust in Rithum to enhance their business across hundreds of channels, collectively generating over $50 billion in annual Gross Merchandise Volume (GMV). Through our innovative commerce, marketing, and delivery solutions, we empower our clients to craft optimized consumer shopping experiences from inception to completion.
Overview
As a key member of the Database Reliability Engineering (DBRE) team at Rithum, you will uphold the availability, reliability, and observability of our extensive database systems. Our team prioritizes automation to minimize manual tasks and is consistently seeking innovative methodologies for process enhancement. We manage and optimize a vast SQL Server infrastructure, encompassing numerous instances across hybrid environments (on-premises VMware and AWS), alongside various relational and NoSQL database platforms, including MongoDB, DynamoDB, Elasticsearch, MySQL, Postgres, and Redis. These database systems are integral to all facets of our operations. The DBRE team cultivates a robust culture of curiosity, transparency, collaboration, and ongoing learning. In your role as a Senior Database Reliability Engineer, you will be expected to embody these values and promote them among your colleagues. You will manage diverse database systems and spearhead your own projects with a highly technical approach.
Full-time|Remote|Remote (Atlanta, Austin, San Francisco, Seattle)
Role overview
ditto is seeking a Senior Platform Engineer, Operator for a fully remote role. This position is open to candidates located in Atlanta, Austin, San Francisco, or Seattle. The focus is on designing, building, and maintaining systems that keep company operations running smoothly and efficiently at scale.
What you will do
Design and implement systems that improve platform scalability, performance, and reliability.
Maintain and enhance the existing infrastructure to support ongoing business operations.
Work closely with cross-functional teams to address technical challenges and streamline processes.
Use technical expertise and leadership to drive key platform initiatives.
Requirements
Extensive experience in platform engineering.
Strong problem-solving abilities and a collaborative mindset.
Proven ability to contribute technical insights and lead engineering projects.
Location
This is a remote position. Candidates must be based in Atlanta, Austin, San Francisco, or Seattle.
Are you driven by a desire to enhance the reliability and performance of cutting-edge semiconductor technologies? At Falcomm, we are at the forefront of transforming innovative semiconductor research into practical solutions through our high-performance, energy-efficient RF power amplifier technologies. Our mission is to provide dependable, high-efficiency wireless solutions that empower the next generation of communication systems.
We are on the lookout for a Senior RFIC Reliability Engineer to spearhead reliability analysis and qualification initiatives for RF integrated circuits and semiconductor products. This pivotal role will concentrate on assessing device reliability, pinpointing failure mechanisms, and crafting testing methodologies to secure the long-term performance and durability of RFIC technologies. You will work closely with RFIC designers, process engineers, packaging teams, and manufacturing partners to advance product development from initial design to production.
Senior Platform Support Engineer
Join our SRE Operations team as a Senior Platform Support Engineer, where you will play a pivotal role in maintaining the seamless operation of Saviynt’s Enterprise Identity Cloud around the clock. This position emphasizes the importance of platform stability, performance, and reliability with a focus on application layer support and operational accountability. Collaborating with fellow operations team members, development, and engineering, you will tackle issues, implement enhancements, and deliver outstanding support. This is an ideal opportunity for individuals who relish operational challenges and enjoy problem-solving in a fast-paced cloud environment, seeing their projects through to fruition.
KEY RESPONSIBILITIES
• Exhibit strong pod-level troubleshooting capabilities in AKS/EKS (going beyond mere pod restarts).
• Analyze performance issues pertaining to applications and databases (RDS, MySQL).
• Conduct thorough investigations into application performance issues (Java, Grails, Hibernate), identifying root causes and proposing solutions.
• Supervise monitoring of our SaaS applications and their underlying infrastructure (Kubernetes on AWS and Azure, VPN connections, customer applications, Elastic Search, MySQL) for alerts and performance discrepancies.
• Possess a solid understanding of fundamental computing concepts such as DNS, IP addressing, Networking, and LDAP.
About Saviynt
Saviynt stands at the forefront of identity security, offering an innovative AI-driven platform designed to manage and safeguard access to applications, data, and essential business processes for some of the largest enterprises and government entities worldwide. As organizations navigate the complexities of the AI era, Saviynt empowers them to accelerate operations while ensuring security and compliance.
Why This Role Matters
The reliability and performance of Saviynt’s platform are vital for our customers. As we expand our global footprint, these attributes are not merely desired; they are fundamental to our product's success.
In your role as a Principal Engineer, you will be pivotal in shaping and implementing the reliability strategy for our SaaS platform. This high-impact, hands-on engineering position offers extensive influence across infrastructure, platform, and application teams, allowing you to significantly impact how Saviynt designs, operates, and measures reliability at scale.
This opportunity is perfect for engineers eager to tackle challenging reliability issues, influence architecture across teams, and leave a lasting legacy on a rapidly growing SaaS platform serving the Federal sector.
Your Responsibilities
In this critical position, you will be instrumental in designing, building, and maintaining the shared infrastructure services and platforms relied upon by our product and application teams.
• Ensure effective vulnerability management while holding teams accountable to meet customer-facing Service Level Agreements (SLAs).
• Design Continuous Delivery (CD) processes specifically for government deployments with future commercial applications.
• Create robust internal tools and automation for infrastructure provisioning and management, primarily utilizing Go (Golang) or Python.
• Architect and optimize foundational solutions within Cloud environments (AWS, Azure, etc.), focusing on reusable patterns and modules for other teams.
• Design and implement shared Event-Driven Architecture components and messaging platforms using technologies such as Kafka or Google Pub/Sub for ease of use by product teams.
• Build resilient Distributed Systems components that act as foundational elements for other applications, emphasizing reliability, fault tolerance, and performance.
• Manage and optimize our shared infrastructure across Multi-Region Cloud Environments, ensuring global availability and performance of platform services for all users.
• Establish and enhance centralized Observability and Monitoring platforms and tools that deliver self-service insights for consuming teams.
• Define and implement clear, well-documented RESTful API designs for the infrastructure services you develop, ensuring seamless integration.
Join Nagarro as a Senior Staff Engineer specializing in Delivery, where you'll play a pivotal role in shaping the future of our engineering projects. Your expertise will drive innovation and ensure the successful implementation of cutting-edge solutions.
Join our dynamic team at Nagarro as a Senior Staff Engineer, Delivery. In this pivotal role, you will drive innovative solutions and lead engineering projects that impact our clients and their industries. You will collaborate with cross-functional teams to ensure timely and efficient delivery of high-quality software solutions.
Join Nagarro as a Senior Staff Engineer specializing in Big Data technologies. In this role, you will leverage your extensive technical expertise to design and implement robust data solutions that drive business decisions. Collaborate with cross-functional teams to innovate and optimize our data architecture and analytics capabilities.
Smarsh seeks a Senior Platform Engineer I based in Atlanta. This position centers on building and refining platform features that enable reliable software delivery across the organization.
Role overview
This engineer will play a key part in shaping the systems that support Smarsh’s software teams. The focus is on both development and improvement of core platform capabilities.
Collaboration
Work involves close coordination with teams throughout the company. Expect to contribute to system design and architecture discussions, sharing insights and helping guide technical decisions.
Join our innovative team as a Senior Staff Engineer specializing in Java Development at Nagarro in Atlanta. We are looking for a passionate engineer who thrives in a collaborative environment and seeks to make a significant impact through technology. You will be responsible for designing, developing, and maintaining high-performance applications that meet our clients' needs.
Who are we?
At Smarsh, we empower organizations to manage risks and harness insights within their digital communications. Our robust community of over 6,500 clients in regulated sectors relies on us daily to identify compliance, legal, and reputational risks across more than 80 communication channels, preventing potential regulatory penalties and negative press. Our commitment to relentless innovation has earned us consistent recognition from esteemed analysts like Gartner and Forrester, and our rapid growth has placed Smarsh on the Inc. 5000 list of fastest-growing American companies since 2008.
Position Overview
Join the SMB Platform Engineering team at Smarsh as we drive the reliability, automation, and advancement of the infrastructure supporting our SMB Product—a vital archiving and communication surveillance platform currently operating on-premises. In the coming year, our team will expand to include cloud infrastructure responsibilities as part of Smarsh's extensive platform modernization initiative.
As a Senior Platform Engineer, you will have ownership over the design, implementation, and management of infrastructure automation and platform tooling within our on-premises environment, with cloud responsibilities anticipated to be integrated soon. You will independently tackle complex projects and act as a technical liaison to other Smarsh platform teams. Success in your first year will include a measurable decrease in operational toil, the completion of at least one major automation or migration project, and the development of comprehensive runbooks for the systems you manage.
Apr 10, 2026