About the job
At Databricks, we are driven by our commitment to empower data teams to tackle the world's most challenging problems — from transforming transportation to accelerating medical advancements. Our mission is to build and maintain the world's premier data and AI infrastructure platform, enabling our customers to harness deep data insights for better business outcomes.
Foundation Model Serving is the API product for hosting and serving advanced AI model inference, covering both open-source models such as Llama, Qwen, and GPT OSS and proprietary models such as Claude and OpenAI GPT. We welcome engineers with experience running high-scale operational systems — customer-facing APIs, edge gateways, or ML inference services — even without a background in ML or AI. A passion for building LLM APIs and runtimes at scale is essential.
As a Staff Engineer, you will play a pivotal role in shaping both the product experience and the underlying infrastructure. You will design and build systems that deliver high-throughput, low-latency inference for cutting-edge models on GPU workloads. You will also set architectural direction, working closely with platform, product, infrastructure, and research teams to deliver an exceptional foundation model API product.
The impact you will have:
- Design and implement core systems and APIs that drive Databricks Foundation Model Serving, ensuring scalability, reliability, and operational excellence.
- Collaborate with product and engineering leaders to outline the technical roadmap and long-term architecture for workload serving.
- Make architectural decisions to enhance performance, throughput, autoscaling, and operational efficiency for GPU serving workloads.
- Contribute directly to critical components of the serving infrastructure, from inference engines like vLLM and SGLang to token-based rate limiters and optimizers, ensuring seamless and efficient operation at scale.
- Work cross-functionally with product, platform, and research teams to transform customer requirements into dependable and high-performing systems.
- Establish best practices for code quality, testing, and operational readiness while mentoring fellow engineers through design reviews and technical support.
- Represent the team in inter-departmental technical discussions, influencing Databricks’ wider AI platform strategy.

