About the job
Join our dynamic team at Nix as a Senior Java Engineer and Site Reliability Engineer (SRE). In this pivotal role, you will be instrumental in maintaining the health, reliability, and responsiveness of over 200 mission-critical microservices, while helping to shape the future of AI-driven commerce.
Key Responsibilities
- Continuously monitor, analyze, and improve the performance of 200+ distributed microservices.
- Lead incident response and promote operational excellence as part of our 24/7 SRE on-call rotation, maintaining high uptime and meeting stringent SLAs.
- Drive critical DevOps outcomes, including CVE remediation, software upgrades, automated failover, resilience engineering, robust security design, and infrastructure enhancements.
- Collaborate with cross-functional teams to design, implement, and maintain advanced monitoring, alerting, and automation frameworks.
- Develop standardized tools and practices that facilitate rapid recovery, continuous improvement, and compliance.
- Engage in backend development using Java, focusing on debugging, optimizing, and maintaining high-availability server-side applications and distributed systems.

