Qualifications
Key Responsibilities
Identify and implement optimal training strategies, including parallelism approaches and precision trade-offs for diverse model sizes and computational loads.
Profile, debug, and optimize single and multi-GPU operations utilizing tools like Nsight and stack trace viewers to gain insights at the hardware level.
Conduct comprehensive analysis and enhancement of the entire training pipeline, focusing on efficient data storage, loading, distributed training, checkpoint saving, and logging.
Establish scalable systems for experiment tracking, data/model versioning, and deriving experiment insights.
Design, deploy, and maintain large-scale ML training clusters utilizing SLURM for distributed workload orchestration.
Ideal Candidate Profile
Proven experience in optimizing training and inference workloads through hands-on implementation of the latest techniques.
Strong understanding of GPU memory hierarchy and computational capabilities, with insights into hardware limitations.
Experience in optimizing both memory-bound and compute-bound operations, with clarity on when each constraint is critical.
Expertise in efficient attention algorithms and their performance characteristics.
About the job
Join Mirelo AI, where we are pioneering the future of creative tools by transforming silent video content into immersive sound, speech, and music.
Our team is at the forefront of developing advanced generative AI models that bring life to video content, enabling creators across gaming and video platforms to enhance their storytelling. Recently, we secured a strong $41 million Seed funding round, led by prestigious firms including Andreessen Horowitz and Index Ventures, propelling our rapid expansion in Product, Engineering, Go-to-Market, and Growth.
About the Role
As a Training Infrastructure Engineer, you will play a crucial role in optimizing our training stack. Your responsibilities will include profiling GPU behavior, debugging training pipelines, enhancing throughput, selecting optimal parallelism strategies, and building robust infrastructure for efficient model training at scale. You will collaborate on cluster management, model training, and the development of efficient data pipelines for video and audio processing.
About Mirelo AI
Mirelo AI is a trailblazer in the field of generative AI, dedicated to enhancing creative expression through innovative technology. Our mission is to empower storytellers worldwide by providing tools that turn silent videos into vibrant audio experiences, creating captivating narratives.