About the job
Why Join Nebius
Nebius is at the forefront of cloud computing, empowering the global AI economy. We provide our clients with essential tools and resources to tackle real-world challenges and revolutionize industries without incurring hefty infrastructure costs or building extensive in-house AI/ML teams. Our team operates at the cutting edge of AI cloud infrastructure, collaborating with some of the most experienced and innovative leaders and engineers in the industry.
Our Work Environment
Headquartered in Amsterdam and publicly traded on Nasdaq, Nebius boasts a worldwide presence with R&D centers across Europe, North America, and Israel. Our diverse team of over 1,400 employees includes more than 400 highly skilled engineers with deep expertise in both hardware and software engineering, complemented by an in-house AI R&D team.
About the Role
As part of Nebius Cloud, which operates one of the largest GPU clouds globally (tens of thousands of GPUs), Token Factory is dedicated to building a high-performance inference and fine-tuning platform that pushes foundation models to their hardware limits. Our commitment is to increase throughput, reduce latency, and optimize cost-per-token across our expansive GPU resources.
Current Projects You Could Contribute To:
- Inference Optimization: Identify and eliminate bottlenecks in LLM inference to speed up production serving. Maximize performance for a variety of LLM architectures at scale (e.g., GPT-OSS, Kimi K2.5, DeepSeek V3.1/V3.2, GLM-5).
- Inference Engine Support: Implement innovative speculative decoding architectures, optimize components of different LLM designs (dense/MoE, autoregressive/parallel), and contribute to open-source inference engines.
- Low Precision Training & Inference: Design and operationalize low-precision training (FP8, NVFP4/MXFP4) and inference pipelines to achieve significant improvements in throughput and cost efficiency.
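To give a flavor of the speculative decoding work mentioned above, here is a minimal sketch of the core accept/reject loop: a cheap draft model proposes a block of tokens, and the expensive target model verifies them, accepting the longest matching prefix and substituting its own token at the first disagreement. The models here are toy deterministic stand-ins (all function names and the token arithmetic are illustrative assumptions, not Nebius code); real systems verify all draft positions in a single batched target forward pass, which is where the speedup comes from.

```python
def draft_model(prefix, k):
    # Toy stand-in for a cheap draft model: proposes k next tokens.
    # Here it simply guesses each next token as (last + 1) mod 10.
    out, last = [], prefix[-1]
    for _ in range(k):
        last = (last + 1) % 10
        out.append(last)
    return out

def target_model(prefix):
    # Toy stand-in for the expensive target model's greedy next token.
    # It mostly agrees with the draft, but diverges when the last token is 3.
    last = prefix[-1]
    return (last + 2) % 10 if last == 3 else (last + 1) % 10

def speculative_decode(prefix, num_tokens, k=4):
    """Greedy speculative decoding: accept the draft's longest prefix that
    the target agrees with, then take one corrected token from the target."""
    tokens = list(prefix)
    while len(tokens) - len(prefix) < num_tokens:
        draft = draft_model(tokens, k)
        ctx = list(tokens)
        for t in draft:
            expected = target_model(ctx)
            if t == expected:
                ctx.append(t)          # draft token verified, keep it
            else:
                ctx.append(expected)   # first mismatch: take target's token
                break
        tokens = ctx
    return tokens[len(prefix):len(prefix) + num_tokens]
```

Because every emitted token is either verified or produced by the target, the greedy variant reproduces the target model's output exactly while calling it far less often when the draft's acceptance rate is high.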

