Lead Data Engineer

Hark, San Jose
On-site Full-time

Experience Level

Mid to Senior

Qualifications

Responsibilities

Design and implement scalable data pipelines capable of ingesting, processing, and delivering training data across various modalities: text, audio, vision, and structured feedback signals.

Take ownership of the complete data infrastructure stack, encompassing ingestion, transformation, deduplication, quality filtering, versioning, and delivery to model training and evaluation systems.

Collaborate closely with model researchers and data collection leads to understand data needs and convert them into reliable, auditable pipelines.

Develop tools and frameworks that facilitate data inspection, evaluation, and iterative enhancements in data quality, informing collection and curation strategies.

Establish and uphold data quality standards by instrumenting pipelines for accuracy, freshness, and coverage, ensuring regressions are identified prior to training.

Create data systems designed for reproducibility and scalability, ensuring the pipelines can manage increasing volumes across modalities without becoming bottlenecks.

Pinpoint weaknesses in the current stack and lead initiatives to enhance throughput, quality, and reliability.

About the job

About Hark

Hark is at the forefront of artificial intelligence, dedicated to crafting sophisticated, personalized intelligence systems. Our approach combines proactive, multimodal capabilities that enable interaction through speech, text, and vision, backed by persistent memory.

We are revolutionizing the interface between humans and machines by integrating our intelligence with cutting-edge hardware. While traditional AI applications are limited to chat boxes and outdated devices, Hark is pioneering the next generation of agentic systems that communicate seamlessly with people and their environments.

Our mission involves the development of multimodal models and next-generation AI hardware, all designed from the ground up as a cohesive interface for a new era of intelligent systems.

About the Role

As the Lead Data Engineer, you will establish the data infrastructure that transforms raw signals into the training datasets essential for Hark's models. You will be responsible for building and maintaining the pipelines that ensure a smooth flow of data at scale.

This role encompasses the entire data engineering stack, from ingestion and transformation to quality filtering and delivery to training and evaluation systems. The efficacy of the models we deploy is directly linked to the quality of the underlying data, and this position owns that foundation.

This high-responsibility role requires collaboration within a small, agile team. You will work closely with model researchers, data collection leads, and infrastructure engineers, and the systems you design will significantly influence the quality and speed of model development.

About Hark

Hark is an innovative artificial intelligence firm focused on developing advanced, personalized intelligence systems that utilize proactive, multimodal capabilities. By integrating groundbreaking AI with next-generation hardware, Hark is redefining the interaction between humans and machines, advancing toward systems that communicate naturally and effectively with their environment.
