Software Engineer, Observability

OpenAI, San Francisco
On-site, Full-time

Qualifications

To excel in this role, you should have:

  • Proven experience with large-scale distributed systems, particularly logging systems or time series databases.

  • Strong problem-solving skills in ambiguous environments.

  • Full-stack development expertise or a keen interest in product development.

  • A solid foundation in systems, networking, and cloud infrastructure technologies.

  • Experience with observability tools (a plus).

About the job

Join the engineering teams at OpenAI, where we build groundbreaking AI technologies and deliver them to the world safely and responsibly!

Our Applied Engineering team collaborates across research, engineering, product, and design disciplines to deploy OpenAI's cutting-edge technology for both consumers and businesses. We are committed to learning from our deployments and ensuring that AI is utilized ethically while maximizing its benefits. To us, safety takes precedence over unchecked growth.

About the Role

We are building OpenAI's observability product, which spans everything from scalable infrastructure to an intuitive, AI-enhanced user interface. Our systems process petabytes of logs and billions of time series metrics across our infrastructure. We are now building intelligence into the product, with features such as agents that summarize service events, auto-generate dashboards, and help engineers debug through user-friendly, notebook-style interfaces.

We are hiring software engineers across every layer of our stack: infrastructure, backend, and product. You will join a dynamic, resourceful team that builds both foundational infrastructure and innovative internal tools, ensuring the reliability, performance, and observability of OpenAI's production systems.

What You’ll Do

  • Lead the development of core observability infrastructure, focusing on distributed logging, time series, and trace storage.

  • Build AI-integrated tools that empower engineers to identify, understand, and resolve issues on their own.

  • Improve user-facing experiences, including dashboards, notebooks, and interactive debugging.

  • Collaborate with engineers, researchers, user operations, and other teams to shape the next generation of the observability product.

You Might Be a Fit If You:

  • Have experience operating large-scale distributed systems in production, particularly logging systems or time series databases.

  • Excel in ambiguous environments and tackle unscoped challenges head-on.

  • Have full-stack development skills or a strong product sense, and are eager to build practical tools that people actually use.

  • Demonstrate robust knowledge of systems, networking, and cloud infrastructure (Kubernetes, AWS, etc.).

  • Bonus: Have built or contributed to observability systems (e.g., Prometheus, OpenTelemetry).

Why This Team?

  • We combine infrastructure and product development to create real AI applications for in-house use.

  • Your contributions will directly enhance the reliability of GPT-based products at OpenAI.

About OpenAI

OpenAI is at the forefront of artificial intelligence research and deployment, committed to ensuring that AI technology is developed in a safe and responsible manner. Our mission is to leverage AI to benefit humanity while prioritizing ethical considerations in its application.
