About the job
At Speechify, our mission is to eliminate reading barriers for learners everywhere.
With over 50 million users benefiting from our innovative text-to-speech technology, we empower individuals to convert any text—from PDFs and books to Google Docs, news articles, and websites—into audio format. Our suite of products includes an iOS app, Android app, Mac app, Chrome extension, and web app, all designed to enhance reading speed and retention. We are proud to have been recognized as the Chrome Extension of the Year by Google and to have received Apple's 2025 Design Award for Inclusivity.
Our team of nearly 200 professionals, with backgrounds at Amazon, Microsoft, Google, and top-tier academic institutions, is fully distributed: we have no physical office. The team includes frontend and backend engineers, AI research scientists, and founders of successful startups.
Role Overview
We are seeking a passionate Software Engineer to join our AI team's data division. In this role, you will be instrumental in overseeing the data collection processes that fuel our model training operations. Your expertise will help us build high-quality datasets at petabyte scale, leveraging our integrated infrastructure, engineering, and research efforts.
Key Responsibilities
- Use creativity and resourcefulness to identify new audio data sources and integrate them into our data ingestion pipeline.
- Manage and enhance our cloud infrastructure for the ingestion pipeline, currently hosted on Google Cloud Platform and configured with Terraform.
- Work closely with our scientists to enhance cost efficiency, throughput, and data quality, providing richer datasets at scale to support our next-generation models.
- Collaborate with fellow AI team members and Speechify leadership to develop a strategic dataset roadmap for our future consumer and enterprise products.
Ideal Candidate Qualifications
- BS, MS, or PhD in Computer Science or a related discipline.
- 5+ years of professional experience in software development.
- Strong proficiency in Bash and Python scripting in Linux environments.
- Experience with Docker and Infrastructure-as-Code principles, along with hands-on experience with at least one major cloud provider (GCP preferred).
- Familiarity with web crawling and large-scale data processing workflows is a plus.
- Exceptional multitasking ability and adaptability to evolving priorities.
- Excellent written and verbal communication skills.

