About the job
Speechify aims to remove reading barriers for learners worldwide. With more than 50 million users, our text-to-speech products turn everything from PDFs to news articles into audio, helping people read faster and remember more. Our iOS and Android apps, Mac app, Chrome extension, and web platform have received awards from Google and Apple for design and accessibility.
Our team includes nearly 200 people working fully remotely, drawing on experience from leading tech companies and top universities. Engineers, AI researchers, and product leaders collaborate closely to advance audio reading technology.
Role Overview
The Software Engineer - Data Infrastructure & Acquisition will join the AI team's data group. This role focuses on building and operating the large-scale data collection systems that support model training, with the goal of producing high-quality datasets at petabyte scale.
What You Will Do
- Find and connect new audio data sources to the ingestion pipeline.
- Maintain and grow cloud infrastructure on Google Cloud Platform (GCP) with Terraform.
- Partner with data scientists to improve dataset cost, throughput, and quality for next-generation models.
- Work with the AI team and company leaders to plan the dataset roadmap for both consumer and enterprise products.

