About the job
At Speechify, our mission is to eliminate reading barriers to enhance learning experiences for everyone.
With over 50 million users benefiting from our innovative text-to-speech solutions, we transform a variety of reading materials (such as PDFs, books, Google Docs, news articles, and websites) into audio formats. This enables users to read faster, absorb more, and retain information effectively. Our offerings include iOS and Android apps, a Mac application, a Chrome extension, and a web platform. Notably, Speechify has been recognized as the Chrome Extension of the Year by Google and received the 2025 Design Award for Inclusivity from Apple.
Currently, our team comprises nearly 200 talented professionals working remotely across the globe, including frontend and backend engineers, AI research scientists, and industry veterans from renowned organizations like Amazon, Microsoft, and Google, as well as emerging startups like Stripe and Vercel.
Role Overview
We are seeking a dedicated Software Engineer to join our AI team, focusing on the data side. In this role, you will oversee the entire data collection process that our model training depends on. By leveraging our unique integration of infrastructure, engineering, and research, we can efficiently develop high-quality datasets on a massive scale.
Your Responsibilities
- Identify and source new audio data streams to integrate into our ingestion pipeline.
- Manage and enhance our cloud infrastructure for the ingestion pipeline, currently hosted on GCP and managed through Terraform.
- Collaborate with data scientists to optimize cost, throughput, and quality, ensuring a steady supply of high-quality data for our advanced models.
- Work alongside the AI Team and Speechify leadership to develop a strategic dataset roadmap for future consumer and enterprise products.

