About the job
Speechify builds text-to-speech tools used by over 50 million people worldwide. Our products help users turn reading materials (PDFs, books, Google Docs, news articles, and websites) into audio, making information more accessible and improving learning and retention. The product suite spans iOS, Android, Mac, Chrome extension, and web. Recent recognition includes Google’s Chrome Extension of the Year and Apple’s 2025 Design Award for Inclusivity.
The Speechify team is fully remote, with nearly 200 people collaborating from locations around the globe. Team members bring experience from Amazon, Microsoft, Google, Stripe, Vercel, Bolt, and top academic programs like Stanford.
Role overview: Software Engineer - Data Infrastructure & Acquisition
This role sits within the AI team’s Data division. The engineer will own data collection processes that support model training, helping Speechify build and scale high-quality datasets efficiently. The team combines engineering, infrastructure, and research to create datasets at petabyte scale.
What you will do
- Identify and source new audio data for integration into Speechify’s ingestion pipeline.
- Manage and improve cloud infrastructure for the ingestion pipeline using Google Cloud Platform (GCP) and Terraform.
- Work with data scientists to boost cost efficiency, throughput, and dataset quality, supporting the development of next-generation models.
- Collaborate with AI team members and company leadership to shape the dataset roadmap for future consumer and enterprise products.
Location
This position is based in Curitiba, Brazil, with remote collaboration as part of Speechify’s global team.

