GPU & Compute Infrastructure Engineer

Lightning AI
New York, New York, United States; Remote; San Francisco, California, United States; Seattle, Washington, United States
Remote Full-time $180K/yr - $200K/yr

Experience Level

Entry Level

Qualifications

We are looking for candidates with a strong background in infrastructure engineering, specifically in GPU and compute systems. Experience with automation tools, system diagnostics, and validation processes is crucial. The ideal candidate should have a solid understanding of both hardware and software interactions, with a focus on AI/ML and HPC workloads.

About the job

About Us

Lightning AI, the company behind PyTorch Lightning, has been working to reshape the AI landscape since 2019. We provide a single platform for developing, training, and deploying AI systems, built to ease the transition from research to production.

Following our merger with Voltage Park, a cutting-edge neocloud and AI Factory, we unite developer-centric software with cost-effective, large-scale computing solutions. Our tools are tailored for experimentation, training, and production inference, incorporating built-in security, observability, and control.

We serve clients ranging from individual researchers to startups and large enterprises. We operate globally, with offices in cities including New York, San Francisco, Seattle, and London, and we are backed by investors such as Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.

Our Core Values

  • Move Fast: We prioritize speed and accuracy, breaking down complex challenges into manageable tasks.

  • Focus: We aim to achieve one goal at a time, working collaboratively to deliver precise features.

  • Balance: We believe sustained performance comes from adequate rest and recovery, ensuring a healthy work-life balance.

  • Craftsmanship: We strive for excellence in every detail, taking pride in our work and its impact.

  • Minimal: We embrace simplicity to drive innovation, eliminating unnecessary complexity and focusing on what truly matters.

Role Overview

We are looking for a GPU & Compute Infrastructure Engineer to join our Infrastructure Engineering team. In this role, you will manage image systems, diagnostics, and validation across large-scale bare-metal computing infrastructure, particularly GPU-optimized systems. You will work at the intersection of hardware, systems, and software: developing automation, improving reliability, and enabling efficient cluster setups for AI/ML and HPC workloads.

Your responsibilities will include overseeing our image pipeline, running validation environments and test clusters, and supporting GPU hardware qualification. This role is essential for maintaining the integrity of our infrastructure, ensuring consistency, performance, and reliability.

About Lightning AI

Lightning AI is a pioneering technology company specializing in AI systems development, training, and deployment, dedicated to simplifying the research-to-production journey for developers. Our commitment to innovation and excellence drives our mission to empower a diverse range of clients from solo researchers to large enterprises.
