
Senior Machine Learning Engineer - Token Factory

Nebius — Amsterdam, Netherlands; Berlin, Germany; Israel; London, United Kingdom; Prague, Czech Republic; Remote - Europe

Remote · Full-time




Experience Level

Senior

Qualifications

We are looking for candidates who possess:

  • A solid understanding of the theoretical foundations of machine learning and transformer architectures.
  • Experience profiling GPU workloads with tools such as Nsight, the PyTorch profiler, or similar.
  • A grasp of the GPU memory hierarchy and the trade-offs between compute and memory.
  • Familiarity with key concepts in the LLM domain, such as MHA, RoPE, KV-cache, Flash Attention, and quantization.
  • An understanding of performance optimization for large neural network training, including sharding strategies.

About the job

Why Join Nebius
Nebius is at the forefront of cloud computing, empowering the global AI economy. We provide our clients with essential tools and resources to tackle real-world challenges and revolutionize industries without incurring hefty infrastructure costs or needing to develop extensive in-house AI/ML teams. Our team operates at the cutting edge of AI cloud infrastructure, collaborating with some of the most experienced and innovative leaders and engineers in the industry.

Our Work Environment
Headquartered in Amsterdam and publicly traded on Nasdaq, Nebius boasts a worldwide presence with R&D centers across Europe, North America, and Israel. Our diverse team of over 1,400 employees includes more than 400 highly skilled engineers with deep expertise in both hardware and software engineering, complemented by an in-house AI R&D team.

About the Role

As part of Nebius Cloud, which operates one of the largest GPU clouds globally — running tens of thousands of GPUs — Token Factory builds a high-performance inference and fine-tuning platform that pushes foundation models to their hardware limits. Our goal is to increase throughput, reduce latency, and lower cost-per-token across this GPU fleet.

Current Projects You Could Contribute To:

  • Inference Optimization: Identify bottlenecks in LLM inference to accelerate production speed. Maximize performance for a variety of LLM architectures at scale (e.g., GPT-OSS, Kimi K2.5, DeepSeek V3.1/V3.2, GLM-5).
  • Inference Engine Support: Implement innovative speculative decoding architectures, optimize components of different LLM designs (dense/MoE, autoregressive/parallel), and contribute to open-source inference engines.
  • Low Precision Training & Inference: Design and operationalize low-precision training (FP8, NVFP4/MXFP4) and inference pipelines to achieve significant improvements in throughput and cost efficiency.

About Nebius

Nebius is pioneering advancements in cloud computing to support the burgeoning AI economy, offering innovative solutions that enable businesses to harness the power of AI without the burden of substantial infrastructure investments.
