Software Engineer, Distributed Data Systems

Exa, San Francisco, California
On-site, Full-time


Qualifications

Required Qualifications

- In-depth knowledge of lakehouse architectures (Delta Lake, Iceberg, Hudi) and their appropriate applications
- Proven experience in building and managing large-scale distributed data processing pipelines
- Hands-on expertise with streaming data systems such as Kafka or Flink
- Familiarity with production-scale tools like Ray, Spark, or ClickHouse
- A relentless commitment to reliability and crafting systems that do not require 3 AM wake-up calls

Preferred Qualifications

- Experience with Lance or similar vector-native storage formats
- Background in GPU-accelerated data processing technologies (RAPIDS, cuDF)

Example Projects

- Design a lakehouse architecture capable of managing over 100 PB of web crawl data
- Develop streaming pipelines to process billions of documents daily for real-time indexing
- Architect the data layer for our embedding training infrastructure utilizing Ray
- Expand our ClickHouse deployment to efficiently handle analytical queries across vast amounts of search logs

About the job

At Exa, we are on a mission to create a cutting-edge search engine from the ground up, tailored specifically for AI applications. Our team is dedicated to developing large-scale infrastructure that efficiently crawls the internet, trains advanced embedding models for indexing, and constructs high-performance vector databases in Rust for optimized searching. We also manage a state-of-the-art $5M H200 GPU cluster that activates thousands of machines simultaneously.


As a Software Engineer specializing in Distributed Data Systems, you will design and implement the data infrastructure behind our operations, from crawling billions of web pages to training embedding models and serving real-time search. You will have significant autonomy in building systems that scale to hundreds of petabytes.


About Exa

Exa is pioneering a revolutionary search engine designed specifically for AI applications. Our commitment to leveraging advanced technology enables us to build a robust infrastructure capable of handling vast amounts of data efficiently.
