About Sesame
At Sesame, we envision a transformative future where technology is seamlessly integrated into our lives, enabling computers to perceive, interact, and collaborate in ways that feel genuinely human. Our mission is to create innovative voice agents that become an integral part of daily experiences. Our talented team comprises pioneers from Oculus and Ubiquity6, alongside industry leaders from Meta, Google, and Apple, all bringing extensive expertise in both hardware and software. Join us in pioneering a world where computers are truly alive.
Key Responsibilities:
Enhance our model serving infrastructure, integrating a diverse range of LLM, speech, and vision models.
Collaborate with ML infrastructure and training engineers to develop a fast, cost-efficient, and reliable serving layer for our groundbreaking consumer product.
Adapt and extend existing LLM serving frameworks such as vLLM and SGLang, leveraging cutting-edge techniques for high-performance model serving.
Partner with the training team to uncover opportunities for accelerating model performance without compromising quality.
Implement strategies like in-flight batching, caching, and custom kernels to optimize inference speed.
Discover methods to minimize model initialization times without sacrificing quality.