About the job
About Hark
Hark is at the forefront of artificial intelligence, creating cutting-edge, personalized intelligence systems that are proactive and multimodal. Our technology interacts naturally with the world through speech, text, visual input, and persistent memory.
We are integrating this intelligence with next-generation hardware to establish a universal interface between humans and machines. While current AI primarily relies on outdated chat interfaces and devices, Hark is focused on pioneering the future: agentic systems capable of seamless interaction with individuals and their environments.
Our mission is to develop multimodal models alongside next-gen AI hardware, designed together as a cohesive interface for a new era of intelligent systems.
About the Role
As a vital member of Hark's Omni team, you will contribute to the development of innovative AI experiences that transcend text, enabling models to comprehend and produce content across various modalities, including audio. Our objective is to forge real-time multimodal intelligence that facilitates intuitive and immersive user experiences.
In this role, you will advance speech and audio capabilities within multimodal foundation models. You will work end to end, from data and modeling to training, evaluation, and real-time deployment, pushing the frontiers of speech intelligence and improving human-computer interaction.
