About the job
Location: Fully remote (EMEA timezone)
Start date: ASAP
Languages: Fluent English required
Industry: Cloud Computing / AI / European Deep-Tech SaaS
About the Role
Pragmatike is seeking a talented ML Ops Engineer to join a rapidly growing, well-funded distributed cloud infrastructure startup at the forefront of next-generation AI-native cloud services. The company is rethinking how compute is delivered: GPU-powered infrastructure purpose-built for AI/ML workloads, secure storage, and high-speed data transfer, all running on a decentralized architecture with a considerably smaller environmental footprint than traditional cloud providers.
In this critical role, you will apply your expertise in building scalable, reliable, and efficient ML inference platforms that power real-time AI applications. You will design and maintain the core infrastructure for serving machine learning models at scale, working closely with the infrastructure, platform, and applied AI teams to ensure high availability, low latency, and cost-effective inference. A strong sense of ownership, a production mindset, and experience with distributed GPU systems are essential.
Your Responsibilities
Develop and manage production-grade model serving infrastructure utilizing frameworks such as vLLM, TGI, Triton, or similar.
Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models.
Create and sustain auto-scaling systems, multi-model serving architectures, and intelligent request routing layers.
Optimize GPU utilization, memory efficiency, network throughput, and model artifact storage performance.
Build observability covering inference latency, throughput, GPU usage, cost metrics, and overall system health.
Oversee model registries and CI/CD pipelines to enable automated and reproducible model deployments.
Manage the entire ML systems lifecycle from development to production, including operational support and on-call responsibilities.
Define engineering best practices and contribute to platform scalability in a dynamic startup environment.
Required Qualifications
Strong experience in ML Ops and production-grade model serving.
Proficiency with GPU systems and distributed computing frameworks.
Expertise in developing deployment strategies and managing CI/CD pipelines.
Excellent problem-solving skills with a focus on performance optimization.
Ability to work collaboratively in a fast-paced, team-oriented environment.

