Audio and Multimodal Machine Learning Engineer

On-site, Full-time

Experience Level

Mid to Senior

Qualifications

To thrive in this role, you should have:

  • At least 3 years' experience training deep learning models, specifically in the audio or speech domains.

  • Extensive knowledge of distributed training frameworks.

  • A solid grasp of audio signal processing fundamentals.

  • Experience deploying models to production environments, with an emphasis on latency optimization.

  • Strong analytical and collaboration skills.

About the job

We are looking for an Audio and Multimodal ML Engineer to join connecthum, a growing AI infrastructure startup building the safety and control framework for large-scale AI systems.

About connecthum:

  • An AI-native product company focused on AI safety and infrastructure.

  • Backed by leading international investors.

  • Handles large volumes of AI traffic across diverse enterprise environments.

  • Trains and fine-tunes proprietary models for performance and reliability.

  • A compact, highly skilled team with a fast-paced development cycle.

We are committed to creating a robust control and evaluation layer for AI systems, empowering organizations to define, test, and enforce AI behavior in real-world scenarios.

Your Role:

  • Train and enhance large-scale audio and multimodal models.

  • Design and execute experiments focused on architecture, data mixtures, and training strategies.

  • Build and optimize efficient audio data pipelines.

  • Increase inference speed, reduce latency, and improve the production readiness of models.

  • Deploy models end-to-end in low-latency environments.

  • Define substantive evaluation metrics that go beyond standard benchmark scores.

  • Collaborate closely with research and engineering teams to drive innovation.

This is a dynamic, hands-on position where research and production converge.

Technical Environment:

  • PyTorch-based training pipelines.

  • Large-scale distributed training.

  • Speech and audio modeling architectures.

  • Multimodal model integration.

  • Model optimization through quantization, distillation, and streaming inference.

  • Production deployment and serving systems.

(Full technical stack details will be shared during the interview process.)

Qualifications:

  • Minimum of 3 years of experience in training deep learning models, particularly in audio or speech domains.

  • Strong expertise in distributed training frameworks.

  • Deep understanding of audio signal processing fundamentals.

  • Proven experience in deploying models to production, with a focus on latency and performance.

  • Exceptional problem-solving skills and collaborative spirit.

About connecthum

connecthum is an innovative AI-native product company dedicated to enhancing AI safety and infrastructure. We are backed by prominent international investors and specialize in managing large-scale AI traffic across enterprises, ensuring high-performance and reliable AI systems.
