About the job
P-188
At Databricks, we are on a mission to fundamentally transform the data lifecycle, simplifying every stage from data ingestion and ETL through Business Intelligence (BI) to Machine Learning (ML) and Artificial Intelligence (AI) on a unified platform. We envision a future where traditional data warehouse architectures are replaced by innovative solutions like the Lakehouse architecture (CIDR 2021 paper), which integrate data warehousing and advanced analytics and address critical challenges such as data staleness, reliability, cost of ownership, data lock-in, and limited use-case support.
To turn this vision into reality, we are developing a next-generation decoupled query engine and structured storage system designed to surpass specialized data warehouses in relational query performance while maintaining the versatility of general-purpose systems like Apache Spark™. This system is intended to support a wide array of workloads, from ETL processes to data science applications.
As a member of our team, you will work in one or more of the following areas, contributing to the design and implementation of cutting-edge systems that redefine industry standards:
- Query compilation and optimization
- Distributed query execution and scheduling
- Vectorized execution engine
- Data security measures
- Resource management strategies
- Transaction coordination processes
- Efficient storage structures (encodings, indexes)
- Automatic physical data optimization techniques

