About the job
Join OpenAI's engineering teams, where we build groundbreaking AI technologies and deliver them to the world safely and responsibly!
Our Applied Engineering team works across research, engineering, product, and design to bring OpenAI's cutting-edge technology to consumers and businesses. We are committed to learning from our deployments and ensuring that AI is used responsibly while maximizing its benefits. To us, safety takes precedence over unchecked growth.
About the Role
We are building OpenAI's observability product, which spans everything from scalable infrastructure to an intuitive, AI-enhanced user interface. Our systems process petabytes of logs and billions of time series across our infrastructure. We are now adding AI capabilities to build features such as agents that summarize service events, auto-generate dashboards, and help engineers debug through user-friendly, notebook-like interfaces.
We are hiring software engineers at every level of our stack: infrastructure, backend, and product. You will join a dynamic, resourceful team that builds both foundational infrastructure and innovative internal tools, ensuring the reliability, performance, and observability of OpenAI's production systems.
What You’ll Do
Lead the development of core observability infrastructure, focusing on distributed logging, time series, and trace storage.
Build AI-integrated tools that help engineers identify, understand, and resolve issues on their own.
Improve user-facing experiences, including dashboards, notebooks, and interactive debugging.
Collaborate with engineers, researchers, user operations, and other teams to shape the next generation of the observability product.
You Might Be a Fit If You:
Have experience operating large-scale distributed systems in production, particularly logging systems or time series databases.
Excel in ambiguous environments and tackle unscoped challenges head-on.
Have full-stack development skills or strong product sense, and are eager to build practical tools that people will actually use.
Have strong knowledge of systems, networking, and cloud infrastructure (e.g., Kubernetes, AWS).
Bonus: Have built or contributed to observability systems (e.g., Prometheus, OpenTelemetry).
Why This Team?
We combine infrastructure and product development to build real AI applications used inside OpenAI.
Your contributions will directly enhance the reliability of GPT-based products at OpenAI.

