White Circle

Audio AI Engineer

On-site, Full-time

Qualifications

3+ years of experience in training large-scale deep learning models in audio, speech, or acoustic domains. Hands-on expertise with PyTorch and distributed training techniques (DeepSpeed, FSDP, etc.). Knowledge of audio/speech architectures such as Audio Qwen, Whisper, HuBERT, or similar. Experience with vision-language and multimodal architectures (e.g., Audio Flamingo, Omni Qwen).

About the job

TLDR: We are seeking an experienced Audio and Multimodal Machine Learning Engineer to develop, train, and deploy cutting-edge speech, audio, and multimodal AI models for an advanced AI safety platform that handles over 100 million API calls each month.

About Us

White Circle is at the forefront of AI safety, dedicated to creating a reliable and optimized framework for AI systems. Our innovative platform is powered by simple natural-language policies that dictate the acceptable behaviors of AI models. We automate the testing, enforcement, and continuous enhancement of these policies to ensure they scale effectively.

  • Backed by $11 million from leading investors, founders, and executives from organizations like OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, and Datadog.

  • Processing over 100 million API calls monthly.

  • We specialize in fine-tuning and training our own large language models to outperform both open-source and proprietary alternatives in speed and cost.

Our team is small yet immensely focused. If you are eager to tackle complex challenges, quickly see your contributions in production, and shape the future of AI safety, we want you on our team.

Your Responsibilities:

  • Train and refine large-scale audio and multimodal models, both from scratch and from pretrained checkpoints.

  • Design and execute experiments, including architectural modifications, data mixes, and training methodologies.

  • Create and maintain audio data pipelines, transforming raw recordings into training-ready datasets.

  • Optimize models for production environments, focusing on quantization, distillation, and streaming inference.

  • Implement end-to-end model deployment, taking models from research checkpoints to low-latency serving.

  • Collaborate with research teams to translate experimental concepts into deployable features.

  • Establish key evaluation metrics and benchmarks that are crucial for product performance.

Ideal Candidate:

  • 3+ years of experience in training large-scale deep learning models within audio, speech, or acoustic realms.

  • Proficient in PyTorch and experienced in distributed training frameworks (such as DeepSpeed, FSDP, etc.).

  • Familiar with audio/speech architectures like Audio Qwen, Whisper, HuBERT, or Conformer.

  • Experience with multimodal architectures such as Audio Flamingo, Omni Qwen, etc.

About White Circle

White Circle is a pioneering company focused on AI safety, aiming to establish a foundation of reliability and optimization for AI systems. Our platform utilizes straightforward natural-language policies to govern AI behavior, ensuring compliance through automated testing and ongoing refinement.
