About the job
Who We Are
Lightricks is an AI-first company building state-of-the-art content creation technology for businesses and studios. Our mission is to bridge the gap between imagination and creation. At the core of our work is LTX-2, an open-source generative video model engineered to deliver high-fidelity video at exceptional speed. It powers our own products and supports a growing ecosystem of partners through API integrations.
Our flagship consumer product, Facetune, brought AI-driven visual expression to countless users worldwide. We combine rigorous research, user-centric design, and end-to-end execution from concept to completion, making advanced creative expression accessible to everyone.
About the Role
As a Large Scale Training Engineer, you will optimize the training throughput of our internal framework, enabling researchers to explore and develop new model ideas. The role demands strong engineering skills for designing, implementing, and refining cutting-edge AI models, the ability to write robust machine learning code, and a deep understanding of supercomputer performance. Your expertise in performance optimization, distributed systems, and troubleshooting will be essential, as our framework runs large-scale computations across many virtual machines.
This role suits engineers who pair technical depth with a genuine passion for advancing AI and machine learning through first-rate engineering and collaborative research.
Key Responsibilities
- Profile and optimize the training process end to end, including multimodal data pipelines and data storage.
- Write high-performance TPU/GPU/CPU kernels and integrate advanced techniques into our training framework to maximize hardware utilization.
- Apply your knowledge of hardware features to implement meaningful optimizations and advise on hardware/software co-design.
- Collaborate with researchers to develop model architectures that promote efficient training and inference.
- Design, maintain, and evolve a high-quality shared codebase, emphasizing correctness, readability, extensibility, testing, and long-term maintainability while balancing performance needs.

