About the job
About Our Team
At OpenAI, our Foundations team is dedicated to examining how model behavior evolves as we scale up models, data, and computing resources. We meticulously analyze the relationships between model architecture, optimization strategies, and training datasets to inform the design and training of next-generation models.
About the Position
As a Team Lead in Research Inference, you will be instrumental in building the systems that let advanced AI models run efficiently at scale. The role sits at the intersection of model research and systems engineering: you will translate novel architectural ideas into high-performance inference systems and surface the trade-offs among performance, memory usage, and scalability.
Your contributions will directly shape how models are designed, evaluated, and iterated on across our research organization. By building and refining high-performance inference infrastructure, you will give researchers the tools to explore new ideas while understanding their computational and systems implications.
This position does not involve production serving; instead, it supports research, with a focus on performance, accuracy, and realism that keeps our AI research grounded in scalable systems.
Responsibilities
Design and develop optimized inference runtimes for large-scale AI models, emphasizing efficiency, reliability, and scalability.
Own the optimization of the core execution path, including model execution, memory management, batching, and scheduling.
Enhance and expand distributed inference across multiple GPUs, focusing on parallelism, communication patterns, and runtime coordination.
Implement and refine critical inference operators and kernels, guided by real-world workloads.
Collaborate closely with research teams to ensure accurate and efficient support for new model architectures within inference systems.
Identify and resolve performance bottlenecks through comprehensive profiling, benchmarking, and low-level debugging.
Contribute to the observability, correctness, and reliability of large-scale AI systems.
Ideal Candidate Profile
Experience building production-grade inference systems, not just training and running models.
Proficiency in GPU-centric performance engineering, including managing memory behavior and reasoning about latency/throughput trade-offs.
Strong analytical skills and familiarity with performance profiling tools.
