Software Engineer - Data Infrastructure & Acquisition

Speechify
Shenzhen, China
Remote, Full-time

About the job

Speechify builds tools that remove reading barriers and help people learn more efficiently. Our text-to-speech products serve over 50 million users, supporting formats like PDFs, books, Google Docs, news articles, and websites. The product lineup includes apps for iOS, Android, and Mac, plus a Chrome extension and a web application. Speechify has earned recognition as Chrome Extension of the Year from Google and received the 2025 Apple Design Award for Inclusivity.

The company operates fully remotely, with nearly 200 team members worldwide. Engineers, AI researchers, and staff come from organizations such as Amazon, Microsoft, Google, Stanford, Stripe, Vercel, and Bolt.

Role Overview

Speechify is hiring a Software Engineer for the Data Infrastructure & Acquisition team in Shenzhen, China. This engineer will join the AI group, focusing on collecting and managing large-scale datasets for model training. The work blends infrastructure, engineering, and research to build scalable, cost-effective data pipelines.

What You Will Do

  • Find and secure new audio data sources to strengthen the ingestion pipeline.
  • Manage and expand cloud infrastructure for data ingestion, primarily on Google Cloud Platform (GCP) using Terraform.
  • Collaborate with scientists to improve data cost, throughput, and quality, supporting advanced model development.
  • Work with the AI team and company leadership to shape a strategic dataset roadmap for future consumer and enterprise products.

Qualifications

  • BS, MS, or PhD in Computer Science or a related field.
  • At least 5 years of software development experience.
  • Proficient in bash and Python scripting in Linux environments.
  • Experience with Docker and Infrastructure-as-Code; hands-on with at least one major cloud platform (GCP preferred).
  • Familiarity with web crawlers and large-scale data processing is a plus.
  • Strong organizational skills and ability to handle shifting priorities.
  • Excellent written and verbal communication abilities.

About Speechify

Speechify is dedicated to making reading accessible for all. Our text-to-speech technology serves over 50 million users, helping them engage with written content in audio form. Our fully remote team is made up of people from leading tech companies and academic institutions, all united by a common goal: to enhance learning through technology.
