
Team Lead, Research Inference

OpenAI · San Francisco
On-site · Full-time




Experience Level

Mid to Senior

Qualifications

  • Proven experience in building and optimizing production inference systems.

  • Strong background in GPU performance engineering and resource optimization.

  • Ability to diagnose and rectify performance issues in complex systems.

About the job

About Our Team

At OpenAI, our Foundations team is dedicated to examining how model behavior evolves as we scale up models, data, and computing resources. We meticulously analyze the relationships between model architecture, optimization strategies, and training datasets to inform the design and training of next-generation models.

About the Position

As a Team Lead in Research Inference, you will build the systems that allow advanced AI models to operate efficiently at scale. The role sits at the intersection of model research and systems engineering: you will translate novel architectural ideas into high-performance inference systems while making the trade-offs in performance, memory usage, and scalability explicit.

Your contributions will significantly shape model design, evaluation, and iteration processes across our research organization. By developing and refining high-performance inference infrastructures, you will provide researchers with the tools necessary to explore new ideas while understanding their computational and systems implications.

This position does not involve serving products. Instead, it supports research through a focus on performance, accuracy, and realism, ensuring that our AI research is grounded in scalable systems.

Responsibilities

  • Design and develop optimized inference runtimes for large-scale AI models, emphasizing efficiency, reliability, and scalability.

  • Take ownership of optimizing core execution processes, including model execution, memory management, batching, and scheduling.

  • Enhance and expand distributed inference across multiple GPUs, focusing on parallelism, communication patterns, and runtime coordination.

  • Implement and refine critical inference operators and kernels based on real-world workloads.

  • Collaborate closely with research teams to ensure accurate and efficient support for new model architectures within inference systems.

  • Identify and resolve performance bottlenecks through comprehensive profiling, benchmarking, and low-level debugging.

  • Contribute to the observability, correctness, and reliability of large-scale AI systems.

Ideal Candidate Profile

  • Experience developing production-level inference systems, not just training and running models.

  • Proficiency in GPU-focused performance engineering, including memory behavior and latency/throughput trade-offs.

  • Strong analytical skills and familiarity with performance profiling tools.

About OpenAI

OpenAI is at the forefront of AI research and deployment, focused on ensuring that artificial general intelligence (AGI) benefits all of humanity. Our teams strive for safety, transparency, and collaboration in AI development.
