Join Our Team

Arista Networks is on the lookout for a talented Site Reliability Engineer (SRE) to enrich our Engineering Productivity (EngProd) team. You will play a pivotal role in maintaining and enhancing our growing infrastructure tailored for our internal user base. The ideal candidate will be adaptable, proactive, and eager to embrace new technologies. As part of our software engineering team, you will collaborate with fellow engineers to design, construct, and manage secure, scalable, and fault-tolerant tools within a hybrid cloud environment.

In the EngProd group, you will work closely with engineers to architect, build, scale, and manage systems utilized by Arista’s product development teams. These systems incorporate industry-standard technologies such as Ansible, Artifactory, Gerrit, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, ElasticSearch, Google Cloud, Varnish, and Perforce, along with bespoke internal systems designed to automate CI/CD, testing, analysis, and visualization.

Your Responsibilities

Safely build, deploy, and operate critical production systems with an emphasis on scalability, reliability, observability, performance, and security.
Monitor and enhance the developer experience across various services.
Automate processes to minimize toil and streamline production operations.
Proactively monitor, respond to, and improve alerts; establish automated alert handling.
Draft and maintain incident response documentation.
Triage platform and infrastructure issues, assisting Arista software engineers in their troubleshooting efforts while engaging with third-party vendor support.
Compose postmortem reports and devise solutions to prevent recurrence of incidents.
Plan and communicate maintenance schedules for production systems.
Collaborate with product development teams to identify and resolve infrastructural bottlenecks affecting their workflows.
Research and implement best practices for maintaining secure, scalable, and fault-tolerant systems.
Analyze the design and implementation details of open-source systems to improve triage and resolution processes.
Feb 24, 2026