About Fluidstack
At Fluidstack, we are building the infrastructure that powers advanced intelligence. We work with leading AI labs, governments, and enterprises, including Mistral, Poolside, Black Forest Labs, and Meta, to deliver compute at unprecedented speed.
We are working to accelerate the path to artificial general intelligence (AGI). Our team is driven by that shared mission: building high-quality infrastructure that makes our customers successful. We pride ourselves on our commitment to excellence and the trust we earn from our customers. If you are passionate about making a meaningful impact at the frontier of intelligence, we invite you to join us in shaping the future.
Role Overview
We are seeking a Product Manager to own the roadmap for our AI platform, spanning managed inference and agent platforms. You will define how Fluidstack enables customers to deploy, scale, and optimize large language model (LLM) inference workloads, from model serving and routing to agent orchestration and complex AI systems. The role requires balancing customer demands for low latency and high throughput against the realities of GPU utilization, cost efficiency, and platform reliability. You will work closely with engineering, machine learning research, and go-to-market teams to position Fluidstack against inference-focused competitors such as Together AI, Fireworks, Baseten, Modal, and Replicate.
Key Responsibilities
Lead the product strategy and roadmap for managed inference services, focusing on model deployment, autoscaling, multi-LoRA serving, and inference optimization.
Define requirements for agent platform capabilities, including structured outputs, function calling, memory primitives, tool integration, and multi-step reasoning workflows.
Prioritize inference optimizations such as speculative decoding, continuous batching, KV cache management, quantization support, and custom kernel integration.
Collaborate with ML infrastructure engineers to design APIs, SDKs, and deployment workflows that support model fine-tuning, version management, and A/B testing.
Partner with datacenter teams to optimize GPU allocation strategies, balancing dedicated versus serverless deployments, cold-start latency, and cost-per-token economics.
Conduct competitive analysis of offerings from Together AI (inference optimization stack), Fireworks (custom inference engine), Baseten (training-to-inference integration), and Modal (serverless architecture).
Establish pricing models that reflect customer usage patterns (tokens, requests, GPU-hours) while ensuring platform sustainability.

