Software Engineer, Data Infrastructure

OpenAI, San Francisco
Hybrid, Full-time

Qualifications

Proven experience with data infrastructure platforms such as Spark, Kafka, Flink, Airflow, Trino, or Iceberg. Proficiency in infrastructure tooling like Terraform. Strong debugging skills for large-scale distributed systems. A genuine enthusiasm for tackling data infrastructure challenges in the AI sector. Exceptional problem-solving abilities and a collaborative spirit.

About the job

About Our Team

At OpenAI, the Data Platform team powers essential product, research, and analytics workflows. We manage some of the largest Spark compute fleets in production, architect data lakes and metadata systems on Iceberg and Delta, and envision exabyte-scale architectures. Our high-throughput streaming platforms run on Kafka and Flink, and our orchestration is powered by Airflow. We also support machine learning feature engineering tools such as Chronon. Our mission is to provide secure, reliable, and efficient data access at scale and to support intelligent, AI-assisted data workflows.

Join us in building and maintaining these core platforms that are foundational to OpenAI's products, research, and analytics capabilities.

We are not just scaling infrastructure; we are transforming the way people engage with data. Our vision includes intelligent interfaces and AI-powered workflows that make data interactions faster, more reliable, and intuitive.

About the Position

In this role, you will focus on constructing and managing data infrastructure that supports extensive compute fleets and storage systems optimized for high performance and scalability. You will be instrumental in designing, developing, and operating the next generation of data infrastructure at OpenAI. Your responsibilities will encompass scaling and securing big data compute and storage platforms, building and maintaining high-throughput streaming systems, ensuring low-latency data ingestion, and facilitating secure, governed data access for machine learning and analytics. You will also prioritize reliability and performance at extreme scales.

You will own the full lifecycle: architecture, implementation, production operations, and on-call responsibilities.

You should be experienced with platforms such as Spark, Kafka, Flink, Airflow, Trino, or Iceberg. Familiarity with infrastructure tools like Terraform, along with expertise in debugging large-scale distributed systems, is essential. A passion for addressing data infrastructure challenges in the AI domain is a must.

This role is based in San Francisco, CA. We use a hybrid work model that requires three days in the office each week, and we provide relocation assistance for new hires.

Responsibilities:

  • Design, build, and maintain data infrastructure systems including distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, ensuring they are scalable, reliable, and secure.

  • Ensure our data platform can scale significantly while maintaining reliability and efficiency.

  • Enhance company productivity by empowering your fellow engineers and teammates through innovative data solutions.

About OpenAI

OpenAI is at the forefront of artificial intelligence research and deployment, dedicated to ensuring that advanced AI technology benefits all of humanity. Our commitment to innovation drives us to develop secure and scalable data solutions that redefine how data is approached in the AI landscape.
