About the job
About Hark
Hark is a cutting-edge artificial intelligence company focused on creating advanced, personalized intelligence systems. Our technology is proactive, multimodal, and designed to engage with the world through speech, text, vision, and persistent memory.
We are integrating this intelligence with next-generation hardware to establish a universal interface between humans and machines. While conventional AI relies heavily on chat interfaces and outdated devices, Hark is pioneering the future with agentic systems that communicate naturally with people and their environments.
To achieve this vision, we are developing multimodal models in conjunction with innovative AI hardware, creating a comprehensive interface for a new era of intelligent systems.
About the Role
We are looking for a Post-Training Member of Technical Staff to lead the development of post-training strategies that determine how our models improve at coding, computer use, and agentic capabilities at scale.
This role sits at the frontier of a fast-moving field, where reinforcement learning, simulation, and large-scale model training intersect to produce agents that can reason, plan, and execute tasks over long horizons. There is no established playbook; we seek researchers and engineers who can bring rigor and creativity from adjacent domains, such as reinforcement learning, robotics, game systems, compilers, formal verification, or program synthesis, to advance the next generation of coding and agentic AI.
Responsibilities
- Design and implement RL-based post-training strategies to develop robust coding agents capable of multi-step reasoning and tool use.
- Construct and scale simulation environments for agentic reinforcement learning, including code execution sandboxes and verifiable reward systems.
- Build reward modeling pipelines that combine multiple reward signals, and refine them in response to training dynamics.
- Enhance synthetic data generation and trajectory distillation processes to improve sample efficiency in RL training.
- Conduct thorough ablation studies to assess the interplay between algorithm choices, data mixtures, and reward shaping in agentic contexts.
- Establish evaluation frameworks based on real agent tasks to measure progress and inform development.
- Collaborate with mid-training, infrastructure, and product teams to translate research findings into practical model enhancements.
