About the job
Join Us in Powering Global Connections!
At Kong, we believe that technology should connect rather than divide. If you’re passionate about building robust systems that facilitate seamless API connectivity, we want to hear from you!
About the Position:
As a Senior Site Reliability Engineer, you will be an integral part of our global Platform SRE team, dedicated to developing, maintaining, and scaling Kong's multi-region SaaS platform that underpins the world's API connectivity.
You will design and automate production systems that serve thousands of customers across AWS, GCP, and Azure. Your work will span everything from multi-region Kubernetes clusters to service mesh and gateway architectures, ensuring the reliability, scalability, and security of our SaaS offerings.
This hands-on role is ideal for engineers who thrive in environments where they can optimize production SaaS systems at scale, automate operations, and enhance performance, resilience, and deployment pipelines.
Your Responsibilities Will Include:
Overseeing and scaling Kong's global SaaS platform (Konnect) to ensure reliability, availability, and performance across various regions and cloud environments.
Building, automating, and maintaining Kubernetes-based infrastructure and deployment workflows using Terraform/Terragrunt, Helm, and ArgoCD.
Designing, maintaining, and optimizing multi-region data and caching layers, including PostgreSQL, Redis, ClickHouse, and Druid, for high availability and low latency.
Operating and enhancing Kong Gateway and Kong Mesh environments that support hybrid and distributed architectures.
Developing and maintaining CI/CD pipelines and GitOps workflows to automate service delivery and keep infrastructure changes consistent.
Enhancing observability and incident response readiness through tools such as Datadog, Prometheus, Grafana, and Thanos, while defining and tracking SLOs.
Collaborating with development and security teams to ensure SaaS services run smoothly and meet reliability, security, and regulatory standards.
