About the job
Join the Nebius Team
At Nebius, we are pioneers in the realm of cloud computing, dedicated to empowering the global AI economy. Our mission is to provide innovative tools and resources that enable our clients to tackle real-world challenges and revolutionize industries—all while minimizing infrastructure costs and the complexities of maintaining extensive in-house AI/ML teams. Collaborating with us means being at the forefront of AI cloud infrastructure, alongside some of the most seasoned and creative leaders and engineers in the industry.
Our Locations
Headquartered in Amsterdam and publicly listed on Nasdaq, Nebius boasts a worldwide presence with R&D hubs strategically located across Europe, North America, and Israel. Our diverse team of over 1400 professionals includes more than 400 highly skilled engineers who excel in both hardware and software engineering, along with a dedicated in-house AI R&D unit.
The Role
As part of the Token Factory division within Nebius Cloud, one of the largest GPU cloud infrastructures globally, you will contribute to the development of a sophisticated inference and fine-tuning platform. This platform supports a wide array of foundation models—spanning text, vision, audio, and emerging multimodal architectures—ensuring they are fast, reliable, and effortlessly scalable for training and deployment.
Your Responsibilities
- Advanced Fine-Tuning: Innovating fine-tuning techniques, including LoRA-based and full-parameter methodologies, for state-of-the-art LLMs (e.g., GPT-OSS, Kimi K2.5, DeepSeek V3.1/V3.2, GLM-4.7), with an emphasis on enhancing model quality and training efficiency.
- Inference Optimization: Analyzing LLM inference bottlenecks to improve production latency and throughput. This involves creating training and evaluation pipelines in JAX for speculative decoding, exploring various architectures (dense/MoE, auto-regressive/parallel), and establishing scaling laws to optimize resource allocation.
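For candidates less familiar with the fine-tuning side of the role, the core idea behind LoRA-style methods is to freeze the pretrained weight matrix and learn only a small low-rank correction. Below is a minimal NumPy sketch of that forward pass; all dimensions, names, and initializations are illustrative assumptions, not Nebius code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8  # hypothetical layer sizes, rank, and scale

W = rng.normal(size=(d_in, d_out))     # frozen pretrained weight (not trained)
A = rng.normal(size=(d_in, r)) * 0.01  # trainable low-rank down-projection
B = np.zeros((r, d_out))               # trainable up-projection, zero-initialized

def lora_forward(x, W, A, B, alpha, r):
    """Base projection plus the scaled low-rank LoRA update: xW + (xA)B * alpha/r."""
    return x @ W + (x @ A) @ B * (alpha / r)

x = rng.normal(size=(d_in,))
# With B initialized to zero, the adapter contributes nothing at step 0,
# so the adapted layer exactly matches the frozen base layer.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W)
```

Because only `A` and `B` (d_in*r + r*d_out parameters) are trained instead of the full d_in*d_out matrix, optimizer state and checkpoint sizes shrink dramatically, which is what makes the technique attractive at LLM scale.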

