Production Engineer/Site Reliability Engineer (Shift Basis)

Rubrik, Bangalore
On-site Full-time


Qualifications

Experience You Will Need:

  • In-depth understanding of distributed system concepts.
  • Hands-on experience with production systems and environments, ideally within public cloud infrastructures.
  • Familiarity with container orchestration platforms, particularly Kubernetes.
  • Proficient with infrastructure management tools such as CloudFormation and Terraform.
  • Strong analytical and problem-solving skills for diagnosing and resolving system and application issues.
  • Proficient in data structures and algorithms, UNIX, networking, operating systems, and database systems like MySQL.
  • Solid Python programming skills.
  • Excellent verbal and written communication abilities.

About the job

About the Role:

Production Engineer
The Production Engineer at Rubrik is pivotal in ensuring operational excellence, managing alerts, addressing outages, and spearheading incident resolution as an Incident Manager. This position demands hands-on expertise in maintaining highly available critical services across multi-cloud environments while fostering continuous improvements through automation and intelligent monitoring.

What You Will Do:

  • Become a key member of a 24/7 Production Operations team dedicated to managing and supporting vital infrastructure and services across multi-cloud environments.
  • Supervise staging and production environments to guarantee maximum uptime and reliability.
  • Deploy and maintain comprehensive observability solutions for real-time monitoring, alerting, and metrics collection.
  • Lead incident management initiatives by promptly responding to alerts and outages, coordinating teams for swift resolution.
  • Investigate recurring incidents to identify root causes, mitigate toil, and enhance system resilience.
  • Design and develop automation tools to proactively detect, triage, and rectify production issues.
  • Update and maintain runbooks to facilitate incident response and address recurring issues.
  • Exhibit strong decision-making abilities under pressure, managing critical situations with urgency and composure.

About Rubrik

Rubrik (NYSE: RBRK) is committed to securing the world’s data. With our innovative Zero Trust Data Security™, we empower organizations to achieve resilience against cyberattacks and internal threats.
