Qualifications
In This Role, You Will:
Design, develop, and manage OpenAI’s multi-tenant caching platform utilized across inference, identity, quota, and product experiences.
Establish the long-term vision and strategic roadmap for caching as a core infrastructure capability, effectively balancing performance, durability, and cost.
Collaborate with other infrastructure teams (e.g., networking, observability, databases) and product teams to ensure our caching platform aligns with their requirements.
You Might Thrive In This Role If You:
Possess 5+ years of experience in building and scaling distributed systems, with a strong emphasis on caching, load balancing, or storage systems.
Hold in-depth knowledge of Redis, Memcached, or similar technologies, including clustering, durability configurations, client-side connection patterns, and performance optimization.
Have practical experience with Kubernetes, service meshes (e.g., Envoy), and autoscaling solutions.
Approach design with a keen focus on latency, reliability, throughput, and cost-efficiency.
Excel in a dynamic environment and appreciate the blend of practical engineering with a commitment to long-term technical excellence.
About the job
About the Team
At OpenAI, we are on a mission to develop safe and beneficial artificial general intelligence. Our models are integrated into innovative products such as ChatGPT and various APIs. To ensure these systems are swift, reliable, and economically viable, we require top-tier infrastructure that stands out in the industry.
The Caching Infrastructure team plays a pivotal role by creating a robust caching layer that supports numerous critical applications at OpenAI. Our goal is to deliver a high-availability, multi-tenant caching platform capable of auto-scaling with workload demands, reducing tail latency, and accommodating a wide array of use cases.
We seek an experienced engineer who can design and scale this essential infrastructure. The ideal candidate will possess extensive experience in distributed caching systems (e.g., Redis, Memcached), a solid understanding of networking fundamentals, and expertise in Kubernetes-based service orchestration.
About OpenAI
OpenAI is a pioneering company in AI research and deployment, dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We continuously push the boundaries of AI capabilities, striving to deploy them safely and effectively through our innovative products. As a powerful tool, AI must be developed with a focus on safety and the needs of people, making our mission both challenging and rewarding.