About the job
At Speechify, our mission is to eliminate reading barriers and enhance learning experiences. Our text-to-speech products let over 50 million users turn their reading materials (PDFs, books, Google Docs, news articles, or websites) into engaging audio. This helps our users read faster, absorb more information, and retain what they learn.
Our suite of products spans iOS, Android, Mac, Chrome Extension, and Web. We are proud to have been named Chrome Extension of the Year by Google and to have won Apple's 2025 Design Award for Inclusivity.
Speechify is a fully remote team of nearly 200 professionals from leading tech companies and top academic institutions, with no central office. We thrive on close collaboration among frontend and backend engineers, AI research scientists, and other specialists.
Role Overview
We are seeking a dedicated Software Engineer to join the Data team within Speechify's AI organization. This role is central to collecting the data behind our model training, building high-quality, petabyte-scale datasets at low cost through an integrated approach to infrastructure, engineering, and research.
Responsibilities
- Identify and source new audio data for integration into our ingestion pipeline.
- Manage and enhance our cloud infrastructure for the ingestion pipeline, currently hosted on GCP and maintained with Terraform.
- Work closely with our scientists to optimize cost, throughput, and quality, giving our next-generation models access to richer data at larger scale.
- Collaborate with the AI Team and Speechify leadership to develop the dataset roadmap that will drive innovation in our next-generation consumer and enterprise products.

