About the job
Who We Are
At Twelve Labs, we are at the forefront of developing state-of-the-art multimodal foundation models that enable video comprehension akin to human understanding. Our models have set new benchmarks in video-language modeling, revolutionizing how diverse forms of media are analyzed and understood.
We have secured over $110 million in Seed and Series A funding from prestigious venture capital firms including NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, along with leading AI pioneers such as Fei-Fei Li, Silvio Savarese, and Alexandr Wang. Headquartered in San Francisco with a strong presence in Seoul, we are committed to fostering global innovation.
Our collaborations with NVIDIA and AWS equip us with cutting-edge chips, including NVIDIA B300s, allowing us to expand the horizons of video AI technology.
We embrace the uniqueness of each individual's journey, believing that our diverse cultural, educational, and life experiences enhance our ability to challenge conventional thinking. We seek motivated individuals who are passionate about our mission and eager to make a significant impact as we advance technology to transform the world. Join us in revolutionizing video understanding and multimodal AI.
About the Team
The Pegasus team is pivotal to Twelve Labs' video understanding services, spearheading the development of Pegasus, our Video Analysis product. We focus on creating multimodal video analysis systems that excel in instruction-following capabilities and generate complex, hierarchically structured outputs. Our emphasis is on delivering products with tangible real-world value, working within a goal-oriented, cross-functional team comprising both machine learning researchers and engineers.
Our work addresses a wide array of challenges, including large-scale distributed training of multimodal LLMs from pre-training through reinforcement learning, precise temporal segmentation, and structured metadata extraction for practical applications. We also extend temporal context length and refine our data curation processes, aligning evaluations with the performance improvements driven by better training data.
Our team has access to the latest advanced chips, such as NVIDIA B300s, accelerating our transition from research to production at an unprecedented pace.
