Responsibilities
Lead research and development efforts to enhance speech and audio capabilities in multimodal models, encompassing speech recognition, synthesis, and comprehension.
Create and refine large-scale speech and audio data pipelines, focusing on data collection, filtering, alignment, and synthetic data generation.
Design and implement advanced models for speech and audio, including end-to-end multimodal architectures and real-time systems.
Establish evaluation frameworks and internal benchmarks to assess speech quality, latency, robustness, and overall user experience.
Optimize models and systems for real-time performance, scalability, and deployment in production environments.
Work closely with product and engineering teams to translate research innovations into impactful, user-facing AI solutions.
Requirements
Demonstrated expertise in advancing speech or audio models through innovative data, modeling, or training approaches.
Extensive experience in speech/audio domains such as Automatic Speech Recognition (ASR), Text-to-Speech (TTS), speech-to-speech translation, or audio foundation models.
Proficiency with large-scale machine learning systems and distributed training methods.
About Hark
Hark is at the forefront of artificial intelligence, creating cutting-edge, personalized intelligence systems that are proactive and multimodal. Our technology interacts naturally with the world through speech, text, visual input, and persistent memory.
We are integrating this intelligence with next-generation hardware to establish a universal interface between humans and machines. While current AI primarily relies on outdated chat interfaces and devices, Hark is focused on pioneering the future: agentic systems capable of seamless interaction with individuals and their environments.
Our mission is to develop multimodal models together with next-generation AI hardware, with the two designed as a single cohesive interface for a new era of intelligent systems.
About the Role
As a vital member of Hark's Omni team, you will contribute to the development of innovative AI experiences that transcend text, enabling models to comprehend and produce content across various modalities, including audio. Our objective is to forge real-time multimodal intelligence that facilitates intuitive and immersive user experiences.
In this role, you will advance speech and audio capabilities within multimodal foundation models. You will work across the full development cycle, from data and modeling to training, evaluation, and real-time deployment, pushing the frontiers of speech intelligence and improving human-computer interaction.
About Hark
Hark is a pioneering artificial intelligence company dedicated to developing sophisticated, personalized intelligence solutions. Our focus on multimodal communication and next-generation hardware positions us as leaders in creating a seamless human-machine interface.