Technical Staff Member - Multimodal Vision

Hark, San Jose
On-site, Full-time

Key Responsibilities

- Lead research and development to enhance vision and video capabilities within multimodal models, focusing on image understanding, video modeling, and generative vision systems.
- Create and refine large-scale vision and video data pipelines, covering data collection, filtering, labeling, and synthetic data generation.
- Design and implement cutting-edge models for vision and video, incorporating multimodal architectures that integrate visual data with text and other modalities.
- Establish evaluation frameworks and internal benchmarks to assess model performance, robustness, and visual quality across diverse tasks.
- Optimize models and systems for scalability, efficiency, and real-time or production deployments.
- Engage closely with product and engineering teams to transform research breakthroughs into impactful AI experiences for users.

Qualifications

- Demonstrated experience in advancing vision or video models through innovative approaches in data, modeling, or training.
- Proficiency in image understanding and video processing, with a solid understanding of machine learning frameworks and algorithms.
- Strong problem-solving skills and the ability to work effectively in a collaborative, fast-paced environment.
- Excellent communication skills, both verbal and written, for effective collaboration across teams.

About the job

Join Hark: Pioneering AI Innovation

At Hark, we are at the forefront of artificial intelligence, crafting advanced, personalized intelligent systems that proactively engage with the world through speech, text, vision, and persistent memory.

We are revolutionizing the human-machine interaction landscape by integrating our sophisticated intelligence with next-gen hardware, setting the stage for a universal interface that transcends outdated chatboxes. Our mission is to develop agentic systems capable of natural interactions with both individuals and the environment.

Our focus is on creating multimodal models and cutting-edge AI hardware, built from the ground up as a cohesive interface for a transformative era of intelligent systems.

Role Overview

As a valuable member of the Omni team at Hark, you will be instrumental in shaping the future of AI experiences beyond mere text. Our aim is to enable models to comprehend and create content across various modalities, including text and vision, leading to seamless, real-time multimodal intelligence that drives intuitive and immersive user experiences.

Your contributions will span the entire stack, from data and modeling to training, serving, and product integration. You will play an essential role in both pretraining and post-training initiatives, collaborating closely with product teams to expand the capabilities of our models and deliver outstanding end-to-end user experiences.

About Hark

Hark is a groundbreaking artificial intelligence company dedicated to creating advanced, personalized intelligence systems that proactively interact with users through various modalities. We are revolutionizing the way humans and machines communicate, with a focus on developing sophisticated AI models and hardware that provide seamless, intuitive experiences.
