
Lead Data Partnerships at Thinking Machines Lab

On-site | Full-time | $175K/yr - $475K/yr

Experience Level

Mid to Senior

Qualifications

We are looking for candidates who possess:

  • A strong technical background with experience in data procurement or management.
  • Excellent negotiation skills, particularly in dealing with data vendors.
  • The ability to work collaboratively across multiple teams, including research and legal.
  • A proactive approach to identifying and fulfilling data needs in a fast-paced research environment.
  • Strong analytical skills for evaluating data quality and relevance.

About the job

At Thinking Machines Lab, we are dedicated to empowering humanity through the advancement of collaborative general intelligence. Our vision is to create a future where everyone has access to the knowledge and tools necessary to harness AI for their distinct needs and aspirations.

Our team comprises scientists, engineers, and innovators who have developed some of the most widely used AI products in the world, including ChatGPT and Character.ai, as well as leading open-weight models like Mistral and popular open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role

As the Lead for Data Partnerships at Thinking Machines Lab, you will oversee the complete data procurement pipeline for frontier model training. This includes understanding the data requirements of our research teams, sourcing providers and finalizing agreements with them, and managing the quality and delivery of data. You will serve as the bridge between our research and legal teams and external vendors, ensuring our teams have timely access to the right data.

This role suits someone with a technical inclination who is eager to dig into the details of data in support of an ambitious research agenda. You must be adept at switching contexts between planning the data needed for training runs and negotiating pricing with vendors. Over time, you will establish scalable and repeatable processes so that our data operations keep pace with our research efforts.

What You Will Do

  • Lead and coordinate end-to-end data procurement initiatives, ensuring complex sourcing activities are conducted with efficiency, transparency, and scientific rigor.
  • Collaborate closely with research teams to proactively identify data needs across pre-training, post-training, and evaluation workstreams, anticipating requirements rather than merely reacting to requests.
  • Source, assess, and onboard data providers, developing and maintaining a pipeline of potential vendors across various domains.
  • Negotiate pricing, licensing terms, and contract structures with data providers, collaborating with legal teams to finalize agreements that align with our research objectives.
  • Evaluate incoming data alongside researchers, determining quality and coverage for intended training goals.
  • Monitor and manage ongoing data deliveries, tracking schedules, addressing issues, and verifying that received data aligns with agreements.
  • Create repeatable, scalable processes for the entire data procurement lifecycle, making data sourcing faster and more systematic over time.
  • Translate technical data requirements into actionable plans with clear milestones, ensuring team alignment across projects.

About Thinking Machines Lab

Thinking Machines Lab is at the forefront of AI innovation, striving to empower individuals through collaborative general intelligence. We are committed to building technologies that democratize access to AI, ensuring it serves the diverse needs of the global community.
