About the job
At Speechify, our mission is to eliminate reading barriers to enhance learning for everyone.
With over 50 million users, our innovative text-to-speech tools transform reading materials (including PDFs, books, Google Docs, news articles, and websites) into audio formats. This empowers users to read faster, absorb more, and retain information better. Our award-winning products include applications for iOS, Android, and Mac, as well as a Chrome Extension and a Web App. Our recognition includes being named Chrome Extension of the Year by Google and winning Apple's 2025 Design Award for Inclusivity.
Our team of nearly 200 professionals operates fully remotely, with no physical office. Our diverse group includes frontend and backend engineers, AI researchers, and other specialists from leading companies such as Amazon, Microsoft, and Google, high-growth startups like Stripe, Vercel, and Bolt, and esteemed academic institutions like Stanford.
Role Overview
We are searching for an innovative Software Engineer to join the data segment of our AI team. This position is crucial for managing data collection to facilitate our model training operations, allowing us to construct extensive datasets efficiently and affordably through a robust integration of infrastructure, engineering, and research.
Your Responsibilities
- Identify and acquire new audio data sources to enhance our ingestion pipeline.
- Manage and extend our cloud infrastructure for the ingestion pipeline, currently utilizing GCP with Terraform.
- Work collaboratively with our Scientists to optimize the balance of cost, throughput, and quality to provide richer data at scale for our next-generation models.
- Partner with the AI Team and Speechify Leadership to develop a strategic dataset roadmap that will drive the evolution of our consumer and enterprise products.
Ideal Candidate Profile
- BS, MS, or PhD in Computer Science or a related discipline.
- 5+ years of professional experience in software development.
- Strong proficiency in Bash and Python scripting within Linux environments.
- Expertise in Docker and Infrastructure-as-Code principles, with hands-on experience using at least one major Cloud Provider (GCP preferred).
- Familiarity with web crawlers and large-scale data processing workflows is advantageous.
- Ability to manage multiple tasks and adapt to changing priorities effectively.
- Excellent written and verbal communication skills.

