Agentic Engineering Manager - Stargate

OpenAI, San Francisco
On-site, Full-time

Experience Level

Manager

Qualifications

Ideal candidates will have a solid background in systems engineering, with proven experience in leading engineering teams. A strong understanding of agentic systems and their application in infrastructure workflows is essential. Candidates should possess excellent problem-solving skills and the ability to work collaboratively across various teams.

About the job

About Our Team

The Stargate Infrastructure team at OpenAI is at the forefront of developing and managing systems that support cutting-edge AI workloads at an unprecedented scale. Our mission encompasses the deployment and management of clusters, networks, and data center infrastructure across both first-party and partner environments.

As the complexity and scale of our systems expand, we are making significant investments in agentic systems and intelligent automation, aimed at optimizing infrastructure deployment, operation, and debugging processes. Our focus is on leveraging AI-driven methodologies to enhance real-world infrastructure workflows, leading to accelerated execution, improved reliability, and scalable operations.

About the Position

We are looking for an Agentic Engineering Manager who also contributes as an individual contributor (IC) to spearhead the development and implementation of agent-based systems for infrastructure delivery and operations within our Stargate team.

In this player-coach role, you will not only lead a small team but also engage directly in the design and implementation of systems. You will concentrate on integrating agentic systems into infrastructure workflows, including deployment orchestration, system initialization, issue triage, debugging, and capacity management.

This role is distinctly focused on applying agentic systems to address specific infrastructure challenges, collaborating closely with hardware, networking, and clustering teams.

Key Responsibilities

  • Architect and construct agent-based systems that facilitate infrastructure deployment and operations.

  • Identify high-impact opportunities for agent application across workflows, including:

    • Cluster initialization and deployment readiness.

    • Incident triage and root cause analysis.

    • System validation and health monitoring.

    • Capacity management and operational decision-making.

  • Lead a small team while also contributing as an IC in the areas of system design, development, and integration.

  • Collaborate with infrastructure, hardware, and networking teams to incorporate agentic systems into production workflows.

  • Develop systems that utilize telemetry, logs, and system signals to enable closed-loop automation.

  • Establish evaluation frameworks to assess system performance, reliability, and operational impact.

  • Drive the transition from prototype to production, ensuring robustness and scalability.

About OpenAI

OpenAI is a leading research organization dedicated to developing artificial intelligence in a manner that is safe and beneficial to humanity. Our teams are composed of experts in various fields who are committed to pushing the boundaries of AI technology.
