About the job
Speechify builds tools that remove reading barriers and help people learn more efficiently. Our text-to-speech products serve over 50 million users, supporting formats like PDFs, books, Google Docs, news articles, and websites. The product lineup includes apps for iOS, Android, and Mac, plus a Chrome extension and a web application. Speechify has earned recognition as Chrome Extension of the Year from Google and received the 2025 Apple Design Award for Inclusivity.
The company operates fully remotely, with nearly 200 team members worldwide. Engineers, AI researchers, and staff come from organizations such as Amazon, Microsoft, Google, Stanford, Stripe, Vercel, and Bolt.
Role Overview
Speechify is hiring a Software Engineer for the Data Infrastructure & Acquisition team in Shenzhen, China. This engineer will join the AI group, focusing on collecting and managing large-scale datasets for model training. The work blends infrastructure, engineering, and research to build scalable, cost-effective data pipelines.
What You Will Do
- Find and secure new audio data sources to strengthen the ingestion pipeline.
- Manage and expand cloud infrastructure for data ingestion, primarily on Google Cloud Platform (GCP) using Terraform.
- Collaborate with scientists to improve data cost, throughput, and quality, supporting advanced model development.
- Work with the AI team and company leadership to shape a strategic dataset roadmap for future consumer and enterprise products.
Qualifications
- BS, MS, or PhD in Computer Science or a related field.
- At least 5 years of software development experience.
- Proficiency in Bash and Python scripting in Linux environments.
- Experience with Docker and Infrastructure-as-Code, plus hands-on experience with at least one major cloud platform (GCP preferred).
- Familiarity with web crawlers and large-scale data processing is a plus.
- Strong organizational skills and the ability to handle shifting priorities.
- Excellent written and verbal communication abilities.

