About the job
About Us
White Circle is a pioneering AI safety organization dedicated to building a robust framework for the safety, reliability, and optimization of AI systems. Our platform runs on straightforward natural-language policies that define how AI models should behave, and we automate the testing, enforcement, and ongoing improvement of these policies at scale.
We have successfully raised $11 million from prominent investors, including founders and senior leaders from OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, Datadog, Sentry, and more.
Our infrastructure processes over 100 million API calls every month.
We specialize in fine-tuning and training our proprietary LLMs to ensure faster and more cost-efficient performance than any available open or proprietary models.
We are a compact, highly motivated team. If you are eager to tackle challenging problems, ship your work rapidly, and make a real impact on building AI safety – we want to hear from you.
Your Responsibilities:
Train vision-language models from the ground up and fine-tune existing architectures for advanced image understanding.
Expand VLM capabilities to video by designing innovative temporal modeling approaches and handling long contexts efficiently.
Develop impactful evaluation benchmarks focusing on visual QA, spatial reasoning, and video comprehension.
Curate and maintain multimodal datasets, including creating synthetic data generation pipelines.
Train and optimize mixture-of-experts (MoE) architectures to improve multimodal inference efficiency.
Deploy models into production, focusing on quantization, batching strategies, and latency optimization.
You Will Excel If You Have:
3+ years of experience in training and fine-tuning vision-language models (e.g., LLaVA, Qwen-VL, InternVL).
A strong background in multimodal architectures, with a clear understanding of how vision encoders, projectors, and LLMs integrate.
Hands-on experience with RLHF/alignment for multimodal systems, specifically GRPO, DPO, and reward modeling.
Experience in video understanding, including temporal modeling and efficient attention mechanisms.