Key Responsibilities:
Act as the primary authority on platform reliability, health, and performance for all Coupang customer-facing services.
Acquire in-depth knowledge of Coupang's application workflows and their dependencies.
Establish and monitor key performance indicators (KPIs) and service-level objectives (SLOs) related to system reliability, availability, and performance.
Create a world-class incident management process, with supporting automation, that enables rapid incident remediation, and conduct operational reviews of incidents.
Develop and implement best practices for effective monitoring, alerting, and telemetry systems.
Automate regular Disaster Recovery testing, Chaos testing, and Load testing to proactively prepare for the anticipated growth of Coupang services.
Collaborate with product development teams to ensure products are designed with scalability and operability in mind.
Establish appropriate guardrails and automation for deploying production changes while upholding reliability standards.
Participate in a 24x7 rotation for production issue escalations and thrive in a fast-paced environment.
Communicate effectively across teams to enhance collaboration and problem-solving.
About the job
At Coupang, our Site Reliability Engineers (SREs) play a vital role in integrating software and system engineering to construct, operate, and enhance our expansive e-commerce systems. As a member of the Site Reliability Engineering team, you will ensure that all customer-facing services remain robust, monitored, automated, and scalable. Our SRE organization champions an operational mindset that prioritizes automation. In this position, you will leverage your expertise to develop top-tier infrastructure automation across various domains, including Observability, Incident Management, Disaster Recovery, Load Testing, and Capacity Engineering. You will collaborate closely with our product development teams, from the initial design phase through to resolving production incidents, while maintaining service-level indicators (SLIs) and service-level agreements (SLAs) and advocating for SRE principles and best practices. If you take pride in comprehensive ownership, have a passion for tackling complex technical challenges within large-scale distributed systems, and can communicate effectively across team boundaries, this role is tailored for you!
About Coupang, Inc.
Coupang is a leading e-commerce platform known for its commitment to innovation, customer satisfaction, and rapid delivery. Our mission is to provide an unparalleled shopping experience to our customers, leveraging cutting-edge technology and a customer-centric approach. Join us to be part of a dynamic and forward-thinking organization that values creativity, collaboration, and excellence.