About the job
Speechify builds text-to-speech tools that help over 50 million people turn written content, like PDFs, books, Google Docs, news, and web pages, into audio. Our mission is to make reading accessible for everyone. Industry leaders have recognized our work: Google named us Chrome Extension of the Year, and Apple awarded us the 2025 Design Award for Inclusivity.
Our distributed team spans nearly 200 professionals worldwide. Engineers, AI researchers, and specialists join us from organizations such as Amazon, Microsoft, Google, Stripe, Vercel, Bolt, and top academic programs including Stanford. We operate fully remotely, with no central office.
Role Overview
Speechify’s AI division is hiring a Software Engineer focused on Data Infrastructure & Acquisition. This position centers on building and maintaining the systems that collect and manage the vast datasets needed for training our machine learning models. The work blends infrastructure, engineering, and research to support data operations at petabyte scale.
What You Will Do
- Find and connect new audio data sources to our ingestion pipeline.
- Maintain and improve our data ingestion infrastructure using Google Cloud Platform (GCP) and Terraform.
- Collaborate with scientists to optimize data cost, throughput, and quality for model improvement.
- Work with the AI team and company leadership to shape the dataset roadmap for future products.
Location
This role is based in Kochi, India.

