About the job
About Hark
Hark is at the forefront of artificial intelligence innovation, dedicated to creating advanced, personalized intelligence that is proactive, multimodal, and able to engage with the world through speech, text, vision, and persistent memory.
We are combining this intelligence with cutting-edge hardware to establish a universal interface between humans and machines. While existing AI typically operates through chat boxes and outdated devices, Hark is pioneering the future: intelligent systems that naturally interact with people and their environments.
To achieve this, we are developing multimodal models and state-of-the-art AI hardware, designed from the ground up as a cohesive interface for a new era of intelligent systems.
About the Role
The Omni team at Hark is creating the next generation of AI experiences that extend beyond text, enabling models to comprehend and generate content across diverse modalities, including text, audio, and vision. Our mission is to develop seamless, real-time multimodal intelligence that enables intuitive and immersive user experiences.
As a member of the Omni team, you will play a critical role in advancing real-time audio, video, and multimodal models. This position spans the full stack, from data and modeling to training, serving, and product integration. You will contribute to both pre-training and post-training initiatives while collaborating closely with product teams to push the limits of model capabilities and deliver outstanding end-to-end user experiences.