About the job
Speechify builds tools that turn reading into an accessible audio experience. Over 50 million people use our text-to-speech products to listen to PDFs, books, Google Docs, news articles, and websites. Our goal is to help people read faster and retain more, and to remove barriers to learning.
Our products are available on iOS, Android, and Mac, as a Chrome extension, and as a web app. We’ve been named Chrome Extension of the Year by Google and received Apple’s Design Award for Inclusivity in 2025.
The Speechify team is fully remote, with nearly 200 professionals worldwide. Our group includes frontend and backend engineers, AI research scientists, and experts from Amazon, Microsoft, Google, top PhD programs, and high-growth startups.
Role overview
We’re hiring a Software Engineer for the Data team within our AI department. This position focuses on all aspects of data collection that drive model training. The work blends infrastructure, engineering, and research to build large-scale, high-quality datasets efficiently and cost-effectively.
What you will do
- Find and acquire new audio data sources to expand our ingestion pipeline
- Manage and improve cloud infrastructure for data ingestion, currently on GCP and managed with Terraform
- Work closely with Scientists to improve cost, throughput, and quality metrics, delivering large-scale datasets for next-generation models
- Support the AI Team’s roadmap for datasets powering future Speechify consumer and enterprise products
Location
Phoenix, AZ, USA (fully distributed team)

