About the job
Speechify builds technology that turns written content into audio, helping over 50 million users learn and access information in new ways. Our text-to-speech tools work with PDFs, books, Google Docs, news articles, and websites, making reading more accessible and efficient.
Our suite of products spans iOS, Android, Mac, and Chrome. Speechify has earned recognition from Google as Chrome Extension of the Year and received Apple’s 2025 Design Award for Inclusivity.
The team at Speechify is fully distributed, with nearly 200 professionals worldwide. Members include frontend and backend engineers, AI research scientists, and leaders from companies such as Amazon, Microsoft, and Google, as well as alumni of Stanford and of startups like Stripe and Vercel. There is no central office; everyone works remotely.
Role Overview
The Data team within Speechify’s AI division is seeking a Software Engineer focused on Data Infrastructure & Acquisition. This position centers on managing and improving the systems that collect and prepare data for model training. The team’s mission is to assemble large-scale, high-quality datasets efficiently and cost-effectively, combining infrastructure, engineering, and research expertise.
What You Will Do
- Find and secure new sources of audio data, then integrate them into the data ingestion pipeline.
- Maintain and improve the cloud infrastructure behind the ingestion pipeline, which runs on Google Cloud Platform and is managed with Terraform.
- Partner with scientists to optimize cost, throughput, and data quality, enabling richer datasets at scale for next-generation models.
- Work with the AI team and company leadership to shape the dataset roadmap for both consumer and enterprise product development.
Location
This role is based in Nairobi, Kenya, as part of Speechify’s distributed team.

