About the job
Join vCluster as an AI Infrastructure Specialist, where you will engage directly with clients at a pivotal stage in their journey, from configuring bare metal GPU nodes to deploying production-ready solutions. This role goes beyond typical professional services: you will operate in a pre-sales capacity, focusing on proof-of-value engagements that lead to robust production environments. You'll be among the first technical contacts for our neocloud and AI Factory clients, and the playbooks you create will streamline onboarding for future hires and clients.
vCluster is rapidly gaining recognition in the GPU AI Cloud sphere, catering to enterprises that are building AI Factories. Our clients require the swift implementation of Kubernetes as a managed service on bare metal GPU infrastructure, and your expertise will be crucial in making this a reality.
Key Responsibilities:
Lead Technical Deployments: Oversee end-to-end technical deployments for GPU neocloud and AI Factory customers, from bare metal configuration through to a validated vCluster environment.
Infrastructure Optimization: Set up and troubleshoot bare metal GPU node infrastructure, including CNI configuration, GPU Operator setup, distributed storage backends, and RDMA/InfiniBand networking.
Validation: Implement and validate Kubernetes and vCluster to deliver GPU-powered managed Kubernetes solutions.
Knowledge Transfer: Collaborate with customer teams to foster self-sufficiency, ensuring they can independently manage and expand their platform.
Scaling through Documentation: Create and document reusable playbooks and deployment architectures, enabling future clients to benefit from your insights.
Feedback Loop: Partner with Engineering and Product teams to identify recurring infrastructure challenges, providing valuable insights that inform our product roadmap.
Strategic Partnering: Support the Sales team in pre-sales efforts where in-depth infrastructure work is essential for proving value.
Your Profile:
Production K8s Mastery: 5+ years of experience in deploying and managing Kubernetes in production environments, preferably on bare metal or in complex setups.
GPU Fluency: Hands-on experience with the NVIDIA GPU Operator, CUDA tooling, and systems-level configuration for GPU infrastructure.
