About the job
At Speechify, our mission is to eliminate reading barriers to enhance learning opportunities for everyone.
With over 50 million users, Speechify transforms various reading materials—including PDFs, books, Google Docs, news articles, and websites—into audio formats, enabling users to read faster and retain more information. Our award-winning text-to-speech products encompass our iOS app, Android app, Mac app, Chrome extension, and web app, with notable recognitions such as the Chrome Extension of the Year by Google and the 2025 Design Award for Inclusivity from Apple.
Our team of nearly 200 professionals is fully distributed, with no physical office locations. It includes frontend and backend engineers and AI research scientists, with talent from leading companies like Amazon, Microsoft, and Google, top-tier PhD programs such as Stanford's, and successful startups like Stripe and Vercel.
Overview
We are seeking a passionate Software Engineer to join our AI team, focusing on the data side of our operations. This role will oversee the data collection that feeds our model training, enabling us to build high-quality datasets at petabyte scale efficiently by integrating infrastructure, engineering, and research efforts.
What You’ll Do
- Proactively identify new audio data sources and integrate them into our ingestion pipeline.
- Manage and enhance the cloud infrastructure supporting our ingestion pipeline, which currently operates on Google Cloud Platform (GCP) and is managed with Terraform.
- Work closely with scientists to optimize cost, throughput, and quality, delivering larger datasets at lower cost to improve our next-generation models.
- Collaborate with the AI Team and Speechify leadership to develop the dataset roadmap that will drive our upcoming consumer and enterprise products.
An Ideal Candidate Should Have
- BS, MS, or PhD in Computer Science or a related discipline.
- At least 5 years of professional experience in software development.
- Strong proficiency in Bash and Python scripting within Linux environments.
- Extensive knowledge of Docker and Infrastructure-as-Code principles, with professional experience in at least one major cloud provider (we use GCP).
- Experience with web crawlers and large-scale data processing workflows is advantageous.
- Ability to manage multiple priorities and adapt to evolving needs.
- Exceptional written and verbal communication skills.

