About Zuora
Zuora builds platforms that help businesses adapt and grow. Our solutions support subscription models, usage-based pricing, and automation powered by AI. Companies use Zuora to launch new offerings, automate billing, and manage recurring revenue with confidence. After more than a decade shaping the Subscription Economy, Zuora continues to expand its platform for quote-to-cash operations, giving organizations a flexible, AI-ready foundation for monetizing products and services.
Role Overview: Lead Senior Site Reliability Engineer
Zuora is hiring a Senior Site Reliability Engineer to lead reliability initiatives and drive AI-powered automation at scale. This role involves managing complex systems, influencing architecture, and partnering with teams across the company. The position is based in Chennai, Tamil Nadu, India.
What You Will Do
- Define and improve Service Level Objectives (SLOs), Service Level Indicators (SLIs), and resilience frameworks.
- Develop AI-driven automation for detection, remediation, and forecasting.
- Oversee cloud infrastructure and Kubernetes platforms.
- Lead incident response and foster operational excellence.
- Mentor engineers and help shape reliability practices across the organization.
What We’re Looking For
- At least 8 years of hands-on experience in Site Reliability Engineering, DevOps, or managing large-scale production systems.
- Deep expertise with AWS, including architecture and services such as EC2, EKS, VPC, IAM, RDS, S3, and CloudWatch.
- Advanced skills in Infrastructure-as-Code using Terraform, including complex module development and state management.
- Strong programming background in Python and Shell, with a track record of building production automation.
- Thorough understanding of Linux systems, including performance tuning, security, and troubleshooting.
- Experience managing distributed systems and high-throughput data streaming platforms like Kafka.
- Ability to independently solve complex, ambiguous problems with broad organizational impact.
- Proven leadership of cross-team reliability or infrastructure projects, including setting technical direction, influencing design, and mentoring engineers.
AI & Automation
- Direct experience leading AI and automation projects.
