AI Infrastructure Specialist

vCluster, Remote (US), Full-time

Qualifications

To thrive in this role, you should possess:

  • Proven Expertise: Extensive experience with Kubernetes and GPU infrastructure.

  • Technical Acumen: Strong troubleshooting skills and a solid understanding of networking and storage solutions.

  • Collaborative Spirit: Excellent communication skills to work effectively with customer teams and internal stakeholders.

About the job

Join vCluster as an AI Infrastructure Specialist, where you will engage directly with clients at a pivotal stage in their journey, from configuring bare metal GPU nodes to deploying production-ready solutions. This role transcends typical professional services; you will operate in a pre-sales capacity, focusing on proof of value engagements that lead to robust production environments. You'll be among the first technical contacts for our neocloud and AI Factory clients, and the playbooks you create will streamline the onboarding process for future hires and clients.

vCluster is rapidly gaining recognition in the GPU AI Cloud sphere, catering to enterprises that are building AI Factories. Our clients require the swift implementation of Kubernetes as a managed service on bare metal GPU infrastructure, and your expertise will be crucial in making this a reality.

Key Responsibilities:

  • Lead Technical Deployments: Oversee comprehensive technical deployments for GPU neocloud and AI Factory customers, starting from bare metal configuration to establishing a validated vCluster environment.

  • Infrastructure Optimization: Set up and troubleshoot bare metal GPU node infrastructure, including CNI configuration, GPU Operator setup, distributed storage backends, and RDMA/InfiniBand.

  • Validation: Implement and validate Kubernetes and vCluster to deliver GPU-powered managed Kubernetes solutions.

  • Knowledge Transfer: Collaborate with customer teams to foster self-sufficiency, ensuring they can independently manage and expand their platform.

  • Scaling through Documentation: Create and document reusable playbooks and deployment architectures, enabling future clients to benefit from your insights.

  • Feedback Loop: Partner with Engineering and Product teams to identify recurring infrastructure challenges, providing valuable insights that inform our product roadmap.

  • Strategic Partnering: Support the Sales team in pre-sales efforts where in-depth infrastructure work is essential for proving value.

Your Profile:

  • Production K8s Mastery: 5+ years of experience in deploying and managing Kubernetes in production environments, preferably on bare metal or in complex setups.

  • GPU Fluency: Hands-on experience with NVIDIA GPU Operators, CUDA tooling, and systems-level configuration for GPU infrastructure.

About vCluster

vCluster is at the forefront of GPU AI Cloud innovation, empowering businesses to establish AI Factories. We specialize in delivering Kubernetes as a managed service on bare metal GPU infrastructure, enabling our clients to accelerate their AI initiatives and drive operational efficiencies.
