About the job
Speechify’s mission is to remove reading barriers and make learning accessible for everyone.
Over 50 million people use Speechify’s text-to-speech tools to turn PDFs, books, Google Docs, news articles, and web pages into audio. These products help users read faster, understand more, and remember what they learn. Speechify has been named Chrome Extension of the Year by Google and received Apple’s 2025 Design Award for Inclusivity.
The team includes nearly 200 professionals around the world, working fully remotely. Engineers, AI researchers, and leaders from companies like Amazon, Microsoft, and Google, as well as alumni from top PhD programs and fast-growing startups, all contribute to Speechify’s growth.
Role Overview
Speechify is hiring a Software Engineer for the AI team’s data group. This engineer will help manage every aspect of data collection for model training. The team builds large, high-quality datasets at petabyte scale, combining infrastructure, engineering, and research to do so efficiently.
What You’ll Do
- Identify and source new audio data for ingestion pipelines.
- Manage and improve cloud infrastructure on Google Cloud Platform (GCP) using Terraform.
- Work with scientists to reduce cost and improve throughput and data quality in support of advanced model development.
- Collaborate with AI team members and company leadership to plan a dataset roadmap for future consumer and enterprise products.
Qualifications
- Bachelor’s, Master’s, or PhD in Computer Science or a related field.
- At least 5 years of professional software development experience.
- Expertise in Bash or Python scripting in Linux environments.
- Strong skills with Docker and Infrastructure-as-Code, with hands-on experience in a major cloud provider (GCP preferred).
- Experience with web crawling and large-scale data processing is a plus.
- Comfort managing multiple priorities and adapting as things change.
- Clear written and verbal communication skills.
Location
Santa Clara, CA, USA (fully distributed team).

