About the job
At Speechify, our mission is to eliminate reading as a barrier to learning.
More than 50 million users leverage Speechify's innovative text-to-speech products that transform various reading materials—such as PDFs, books, Google Docs, news articles, and websites—into audio. This empowers individuals to read faster, absorb more information, and improve retention. Our suite of text-to-speech solutions includes iOS and Android applications, a Mac app, a Chrome extension, and a web app. Speechify was recently recognized by Google as the Chrome Extension of the Year and received Apple's 2025 Design Award for Inclusivity.
Currently, we have a diverse team of nearly 200 professionals working in a fully distributed environment, allowing flexibility and collaboration from anywhere in the world. Our team comprises frontend and backend engineers, AI research scientists, and talented individuals from leading tech companies like Amazon, Microsoft, and Google, as well as innovative startups like Stripe, Vercel, and Bolt.
Overview
We are looking for a talented Software Engineer to join our AI team, focusing on data infrastructure. This role oversees all aspects of data collection in support of our model training operations. By integrating infrastructure, engineering, and research, we build high-quality datasets at petabyte scale and at minimal cost. If you are passionate about data engineering and AI, we would love to have you on board!
What You’ll Do
- Explore and identify new audio data sources, integrating them into our ingestion pipeline.
- Manage and expand our cloud infrastructure for the ingestion pipeline, currently utilizing GCP and Terraform.
- Work collaboratively with our Scientists to optimize cost, throughput, and quality, delivering richer datasets at scale to fuel our next-generation models.
- Engage with the AI Team and leadership to develop a comprehensive dataset roadmap that supports Speechify’s future consumer and enterprise products.
An Ideal Candidate Should Have
- Bachelor's, Master's, or PhD in Computer Science or a related discipline.
- 5+ years of experience in software development.
- Strong proficiency in bash and Python scripting in Linux environments.
- Expertise in Docker and Infrastructure-as-Code principles, with hands-on experience in at least one major cloud platform (GCP preferred).
- Familiarity with web crawlers and large-scale data processing workflows is a plus.
- Ability to manage multiple priorities and adapt to shifting demands.
- Excellent communication skills, both written and verbal.

