About HUD
HUD builds infrastructure for generating reinforcement learning (RL) training data and evaluations for advanced AI agents. We are also developing a marketplace that connects frontier labs with high-quality training data. Our platform is trusted by leading frontier labs, Fortune 500 companies, and startups. We have raised $15M from top venture capitalists and are part of Y Combinator's W25 cohort.
About the Role
We are seeking talented Research Engineers to enhance our quality assurance processes for training data generated by our partner companies. In this role, you will be instrumental in developing systems that ensure quality at scale, allowing us to meet our growing demand.
Key Responsibilities
Establish and uphold quality standards for training datasets.
Develop tools and workflows for auditing datasets produced by suppliers, including sampling strategies, validation pipelines (both rule-based and model-assisted), and feedback mechanisms.
Evaluate the effectiveness of human-in-the-loop review processes to enhance quality assurance.
Collaborate with data vendors to troubleshoot quality challenges, provide actionable insights, and enhance their data generation methodologies.
Integrate QA insights into our infrastructure tools and data vendor portal to minimize anomalies, inconsistencies, and edge cases.
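To give a flavor of the auditing work described above, here is a minimal Python sketch of a rule-based validation pass over a random sample of records. Everything in it is illustrative: the record fields (`prompt`, `response`), the specific rules, and the sampling approach are assumptions for this example, not HUD's actual tooling.

```python
import random

def validate_record(record):
    """Apply simple rule-based checks to one training-data record.

    Hypothetical schema: each record is a dict with 'prompt' and 'response'.
    """
    errors = []
    if not record.get("prompt", "").strip():
        errors.append("empty prompt")
    if not record.get("response", "").strip():
        errors.append("empty response")
    if len(record.get("response", "")) > 10_000:
        errors.append("response exceeds length limit")
    return errors

def audit_sample(dataset, sample_size=100, seed=0):
    """Audit a random sample of records and estimate the defect rate."""
    rng = random.Random(seed)  # seeded for reproducible audits
    sample = rng.sample(dataset, min(sample_size, len(dataset)))
    defects = {}
    for i, rec in enumerate(sample):
        errs = validate_record(rec)
        if errs:
            defects[i] = errs
    return {
        "sampled": len(sample),
        "defective": len(defects),
        "defect_rate": len(defects) / len(sample) if sample else 0.0,
        "details": defects,
    }
```

In practice a pipeline like this would be one layer among several, with model-assisted checks and human review catching what simple rules cannot.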
Qualifications
You might be a great fit if you possess:
Expertise in Python, Docker, and Linux systems.
Experience handling large-scale datasets.
Demonstrated ability to learn quickly and adapt in technical settings (e.g., participation in programming competitions).
Experience in early-stage tech startups with the capacity to work autonomously in fast-paced environments.
Familiarity with contemporary AI tools and capabilities of large language models (LLMs).
Strong communication skills for effective remote collaboration across different time zones.
Ideal candidates may also:
Understand common pitfalls in training data.
Have experience in developing data validation pipelines and/or human-in-the-loop review systems.
Exhibit attention to detail, capable of identifying subtle inconsistencies or edge cases in data.
Be comfortable not just executing but also designing metrics, experiments, and QA processes.