About the Position
We are looking for a Machine Learning Engineer specializing in model distillation to help us build compact, fast, and efficient models without sacrificing quality. The role blends research with practical application, turning state-of-the-art methods into scalable systems.
This is a hands-on role with significant ownership: you will design distillation pipelines, run extensive experiments, and ship models used in production.
Your Responsibilities
Design and implement knowledge distillation pipelines (teacher-student, self-distillation, multi-teacher, and related approaches).
Convert large foundation models into smaller, faster models that are cheaper to serve at inference time.
Run and analyze large-scale training experiments to evaluate quality, latency, and cost trade-offs.
Work closely with research teams to turn novel distillation ideas into production-ready code.
Enhance training and inference efficiency (memory usage, throughput, and latency).
Contribute to the development of internal tools, evaluation frameworks, and experiment tracking systems.
(Optional) Contribute to open-source models, tools, or research.
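For context on the teacher-student approach named above, here is a minimal, illustrative sketch of the classic soft-target distillation loss. The function names are our own, and it is written in plain Python for readability; a real pipeline would be built with PyTorch tensors and autograd.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor follows the common convention of rescaling so gradient
    magnitudes stay comparable as the temperature changes.
    """
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl

# A student that matches the teacher exactly incurs zero loss:
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # → 0.0
```

In practice this soft-target term is usually combined with the ordinary cross-entropy loss on ground-truth labels, weighted by a mixing coefficient.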
Ideal Candidate Profile
Robust background in machine learning and deep learning.
Hands-on experience with model distillation techniques, whether on LLMs or other neural networks.
Strong grasp of training dynamics, loss functions, and optimization strategies.
Proficiency in PyTorch (or JAX) along with contemporary ML tools.
Comfortable conducting experiments across multi-GPU or distributed configurations.
Ability to critically evaluate model quality versus performance trade-offs.
Pragmatic outlook: you prioritize shipping working models over merely publishing research.
Preferred Qualifications
Experience in distilling large language models or sizable sequence models.
Familiarity with inference optimization techniques (e.g., quantization, pruning, custom kernels).
Experience in evaluating language models.
Contributions to open-source projects or research publications.
Experience in early-stage product development.
