About the job
Location: Austin, Texas area / On-site preferred
Project: 7MW Phase I AI Datacenter -> 50MW Campus Expansion
Reports to: Founders / Executive Team
About the Project
We are building a high-density AI datacenter campus just outside Austin, Texas, starting with approximately 7MW of NVIDIA GB300 NVL72 infrastructure and scaling to more than 50MW. Our focus is real-time inference, reasoning, and high-value AI serving workloads: monetizing our infrastructure in live markets rather than simply leasing out space.
This is not a traditional datacenter operations role.
We are looking for a leader who will turn our GPU racks into a profitable inference business.
You will define and execute the strategy that drives revenue, uptime, and utilization: which models we serve, which orchestration stacks we run, how we price, which customer segments we target, and which marketplace partnerships we pursue.
You should understand that the business is not raw compute; it is monetized tokens, latency-adjusted utilization, and gross margin.
The Role
We are looking for a senior operator-builder who can bridge multiple domains:
AI infrastructure
Inference performance engineering
Model serving and routing
Marketplace monetization
Customer and partner integration
Revenue optimization
You will architect and run the inference platform that determines how our GB300 NVL72 racks are monetized in real time. This could include direct enterprise workloads, marketplace distribution, API-based reselling, model hosting, fine-tuned and private deployments, and new inference channels.
You should have a clear understanding of what runs profitably on modern inference hardware and be prepared to answer questions such as:
Which open-weight and commercially viable models should be prioritized on this hardware?
How should workloads be balanced across premium low-latency serving, bulk throughput, reserved capacity, and experimental capacity?
Should we leverage third-party marketplaces for routing?

