About Sesame
At Sesame, we envision a transformative future where technology is seamlessly integrated into our lives, enabling computers to perceive, interact, and collaborate in ways that feel genuinely human. Our mission is to create innovative voice agents that become an integral part of daily experiences. Our talented team comprises pioneers from Oculus and Ubiquity6, alongside industry leaders from Meta, Google, and Apple, all bringing extensive expertise in both hardware and software. Join us in pioneering a world where computers are truly alive.
Key Responsibilities:
Enhance our model serving infrastructure, integrating a diverse range of LLM, speech, and vision models.
Collaborate with ML infrastructure and training engineers to develop a fast, cost-efficient, and reliable serving layer for our groundbreaking consumer product.
Adapt and extend existing LLM serving frameworks such as vLLM and SGLang, leveraging cutting-edge techniques for high-performance model serving.
Partner with the training team to uncover opportunities for accelerating model performance without compromising quality.
Implement strategies like in-flight batching, caching, and custom kernels to optimize inference speed.
Discover methods to minimize model initialization times without sacrificing quality.