About Our Team
At OpenAI, our Data Platform team is at the heart of our approach to data management, powering essential product, research, and analytics workflows. We manage some of the largest Spark compute fleets in production, architect data lakes and metadata systems on Iceberg and Delta, and design toward exabyte-scale architectures. Our high-throughput streaming platforms are built on Kafka and Flink, and our orchestration is powered by Airflow. We also support machine learning feature engineering tools such as Chronon. Our mission is to provide secure, reliable, and efficient data access at scale while enabling intelligent, AI-assisted data workflows.
Join us in building and maintaining these core platforms that are foundational to OpenAI's products, research, and analytics capabilities.
We are not just scaling infrastructure; we are transforming the way people engage with data. Our vision includes intelligent interfaces and AI-powered workflows that make data interactions faster, more reliable, and more intuitive.
About the Position
In this role, you will build and manage data infrastructure that supports extensive compute fleets and storage systems optimized for performance and scalability. You will be instrumental in designing, developing, and operating the next generation of data infrastructure at OpenAI. Your responsibilities will include scaling and securing big data compute and storage platforms, building and maintaining high-throughput streaming systems, ensuring low-latency data ingestion, and enabling secure, governed data access for machine learning and analytics, with a constant focus on reliability and performance at extreme scale.
You will have complete ownership of the full lifecycle: from architecture to implementation, production operations, and on-call responsibilities.
You should have hands-on experience with platforms such as Spark, Kafka, Flink, Airflow, Trino, or Iceberg. Familiarity with infrastructure tools like Terraform, along with expertise in debugging large-scale distributed systems, is essential, as is a passion for tackling data infrastructure challenges in the AI domain.
This role is based in San Francisco, CA. We offer a hybrid work model requiring 3 days in the office each week and provide relocation assistance for new hires.
Responsibilities:
Design, build, and maintain data infrastructure systems including distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, ensuring they are scalable, reliable, and secure.
Ensure our data platform can scale significantly while maintaining reliability and efficiency.
Enhance company productivity by empowering your fellow engineers and teammates through innovative data solutions.