About the job
P-188
At Databricks, we are on a mission to revolutionize the data lifecycle, from ingestion and ETL to business intelligence and advanced machine learning. Our vision centers on a unified platform that replaces the conventional data warehouse architecture with a Lakehouse model (CIDR 2021 paper). This architecture aims to address significant challenges such as data staleness, reliability, total cost of ownership, data lock-in, and limited support for diverse use cases.
A pivotal part of achieving this vision is developing the next generation of decoupled query engines and structured storage systems that can outperform specialized data warehouses while retaining the versatility of general-purpose systems like Apache Spark™. This capability is essential for supporting a wide range of workloads, from ETL processes to complex data science applications.
As a key member of this team, you will engage in one or more of the following areas to design and implement systems that set new standards in the industry:
- Query compilation and optimization
- Distributed query execution and scheduling
- Vectorized execution engine
- Data security
- Resource management
- Transaction coordination
- Efficient storage structures (encodings, indexes)
- Automatic physical data optimization

