About the job
At Databricks, we are dedicated to empowering data teams to tackle some of the world's greatest challenges, whether it's turning the next mode of transportation into reality or expediting medical innovations. We achieve this by developing and managing the premier data and AI infrastructure platform, enabling our clients to leverage profound data insights to enhance their operations. Established by engineers who are obsessed with customer satisfaction, we embrace every chance to address intricate technical hurdles, from designing cutting-edge UI/UX for data interaction to scaling our services across millions of virtual machines.
Our mission is to radically simplify the entire data lifecycle, from data ingestion to generative AI and everything in between. We are doing this across multiple cloud environments with a unified platform, currently serving over 10,000 customers, processing exabytes of data daily on over 15 million virtual machines, and expanding exponentially.
To realize this vision, we are constructing multi-cloud systems that span the entire data ecosystem, from query engines and vector databases to training pipelines and storage solutions. We also develop and maintain the tools, languages, and stacks that bind everything together. In essence, we cover the entire spectrum.
The scope we operate within and the challenges we address are vast, intricate, and profound. Our published work on Lakehouse, Delta Lake, and Photon underscores our commitment to this mission. We are seeking practitioners who are enthusiastic about collaborating with industry leaders to expand the boundaries of what is achievable for our clients. If you are driven by truth, are data-oriented, and thrive on foundational principles, then Databricks is the place for you.
As a member of the Database Engine team, you will have the opportunity to design and implement innovative solutions that surpass existing state-of-the-art systems in the following areas:
- Query compilation & optimization
- Distributed query execution and scheduling
- Vectorized engine execution
- Data security
- Resource management
- Transaction coordination
- Efficient storage structures (encoding, indexes)
- Automatic physical data optimization

