Qualifications
Key Responsibilities
- Proactively identify new audio data sources and integrate them into our ingestion pipeline.
- Manage and enhance the cloud infrastructure for the ingestion pipeline, currently hosted on GCP and managed via Terraform.
- Work closely with our data scientists to rethink the trade-offs among cost, throughput, and quality, delivering richer datasets at a larger scale to support our next-gen models.
- Engage with the AI Team and Speechify Leadership to develop the dataset roadmap that will enhance our consumer and enterprise products.
The Ideal Candidate Will Possess
- A BS/MS/PhD in Computer Science or a related discipline.
- A minimum of 5 years of professional experience in software development.
- Strong proficiency in bash/Python scripting within Linux environments.
- Experience with Docker and Infrastructure-as-Code practices, along with a proven track record with at least one major cloud provider (GCP preferred).
- Familiarity with web crawlers and large-scale data processing workflows (a plus).
- Ability to manage multiple priorities and adapt in a fast-paced environment.
- Excellent verbal and written communication skills.
About the job
Speechify builds text-to-speech tools that turn written content into audio, helping over 50 million people read faster and retain more. Our products include iOS, Android, and Mac apps, a Chrome extension, and a web platform. Recent recognition includes Google’s Chrome Extension of the Year and Apple’s 2025 Design Award for Inclusivity.
The company operates fully remotely, with a team of nearly 200 professionals worldwide. Team members include frontend and backend engineers, AI researchers, and specialists from companies like Amazon, Microsoft, and Google, as well as graduates of top PhD programs and founders of previous startups. Collaboration and innovation drive the work culture.
Role Overview
Speechify’s AI division is hiring a Software Engineer for the Data team. This role centers on data infrastructure and acquisition, supporting model training by gathering and managing large-scale datasets. The work blends infrastructure, engineering, and research to deliver high-quality data at petabyte scale while keeping costs low.
Key Focus Areas
- Design and build systems for data collection and management
- Support the creation of datasets used in model training
- Work closely with engineers and researchers to integrate infrastructure and research needs
- Help ensure data quality and efficiency at scale
This position offers the chance to contribute to accessible technology and make a measurable impact on Speechify’s AI capabilities.
About Speechify
Speechify is dedicated to transforming reading accessibility through cutting-edge text-to-speech technology. With a commitment to inclusivity and innovation, Speechify empowers users to learn and absorb information without barriers. Our fully remote team is composed of top talent from industry-leading companies, ensuring a collaborative and dynamic work environment.