
Staff Software Engineer, Foundation Model Serving

Databricks · San Francisco, California
On-site · Full-time · $192K/yr - $260K/yr




Experience Level

Senior

Qualifications

What we look for:

  • 10+ years of experience building and operating large-scale distributed systems.
  • Experience with customer-facing APIs, edge gateways, ML inference, or similar services.
  • Strong interest in developing LLM APIs and runtimes at scale.

About the job

At Databricks, we are driven by our commitment to empower data teams in tackling the world's most challenging problems — from transforming transportation solutions to accelerating medical advancements. Our mission revolves around constructing and maintaining the world's premier data and AI infrastructure platform, enabling our clients to harness deep data insights for enhanced business outcomes.

Foundation Model Serving is the API product for hosting and serving advanced AI model inference, covering both open-source models (Llama, Qwen, GPT OSS) and proprietary models (Claude, OpenAI GPT). We welcome engineers who have experience operating high-scale systems — customer-facing APIs, edge gateways, or ML inference services — even without a background in ML or AI. A passion for developing LLM APIs and runtimes at scale is essential.

As a Staff Engineer, you will play a pivotal role in defining both the product experience and the underlying infrastructure. You will be tasked with designing and building systems that facilitate high-throughput, low-latency inference on GPU workloads with cutting-edge models. Your influence will extend to architectural direction, working closely with platform, product, infrastructure, and research teams to deliver an exceptional foundation model API product.

The impact you will have:

  • Design and implement core systems and APIs that drive Databricks Foundation Model Serving, ensuring scalability, reliability, and operational excellence.
  • Collaborate with product and engineering leaders to outline the technical roadmap and long-term architecture for workload serving.
  • Make architectural decisions to enhance performance, throughput, autoscaling, and operational efficiency for GPU serving workloads.
  • Contribute directly to critical components within the serving infrastructure, from systems like vLLM and SGLang to developing token-based rate limiters and optimizers, ensuring seamless and efficient operations at scale.
  • Work cross-functionally with product, platform, and research teams to transform customer requirements into dependable and high-performing systems.
  • Establish best practices for code quality, testing, and operational readiness while mentoring fellow engineers through design reviews and technical support.
  • Represent the team in inter-departmental technical discussions, influencing Databricks’ wider AI platform strategy.
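One of the components above, a token-based rate limiter, can be illustrated with a short sketch. This is a hypothetical token-bucket limiter metered in LLM tokens rather than requests, so a single large prompt consumes proportionally more budget than a small one; the class name and parameters are illustrative, not Databricks' actual implementation.

```python
import time


class TokenBucketLimiter:
    """Rate limiter whose unit of admission is LLM tokens, not requests.

    Hypothetical sketch: a classic token-bucket where the bucket refills
    at `tokens_per_second` and holds at most `burst_capacity` tokens.
    """

    def __init__(self, tokens_per_second: float, burst_capacity: float):
        self.rate = tokens_per_second
        self.capacity = burst_capacity
        self.available = burst_capacity  # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        # Credit tokens accrued since the last check, capped at capacity.
        now = time.monotonic()
        self.available = min(
            self.capacity,
            self.available + (now - self.last_refill) * self.rate,
        )
        self.last_refill = now

    def try_acquire(self, token_count: int) -> bool:
        """Admit a request consuming `token_count` tokens, or reject it."""
        self._refill()
        if token_count <= self.available:
            self.available -= token_count
            return True
        return False
```

In a serving gateway this check would typically run per tenant before a request is dispatched to the inference backend, with the token count estimated from the prompt length plus the requested completion budget.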

About Databricks

Databricks is at the forefront of data and AI innovation, dedicated to creating solutions that empower organizations to tackle complex challenges through advanced technology. Our team thrives on collaboration and is committed to delivering groundbreaking insights that drive success for our clients.
