GPU & Compute Infrastructure Engineer

Lightning AI

New York, New York, United States; Remote; San Francisco, California, United States; Seattle, Washington, United States

Remote Full-time $180K/yr - $200K/yr

Experience Level

Entry Level

Qualifications

We are looking for candidates with a strong background in infrastructure engineering, specifically in GPU and compute systems. Experience with automation tools, system diagnostics, and validation processes is crucial. The ideal candidate should have a solid understanding of both hardware and software interactions, with a focus on AI/ML and HPC workloads.

About the job

About Us

Lightning AI, the innovative force behind PyTorch Lightning, has been revolutionizing the AI landscape since 2019. We provide an all-encompassing platform that streamlines the development, training, and deployment of AI systems, smoothing the transition from research to production.

Following our merger with Voltage Park, a cutting-edge neocloud and AI Factory, we unite developer-centric software with cost-effective, large-scale compute. Our tools are built for experimentation, training, and production inference, with built-in security, observability, and control.

We cater to various clients, from individual researchers to startups and large enterprises, operating globally with offices in key cities including New York, San Francisco, Seattle, and London. We're proud to be backed by prestigious investors like Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.

Our Core Values

Move Fast: We prioritize speed and accuracy, breaking down complex challenges into manageable tasks.
Focus: We aim to achieve one goal at a time, working collaboratively to deliver precise features.
Balance: We believe sustained performance comes from adequate rest and recovery, ensuring a healthy work-life balance.
Craftsmanship: We strive for excellence in every detail, taking pride in our work and its impact.
Minimal: We embrace simplicity to drive innovation, eliminating unnecessary complexity and focusing on what truly matters.

Role Overview

We are looking for a GPU & Compute Infrastructure Engineer to join our Infrastructure Engineering team. In this role, you will manage image systems, diagnostics, and validation across large-scale bare-metal compute infrastructure, particularly GPU-optimized systems. You will work at the intersection of hardware, systems, and software, building automation, improving reliability, and enabling efficient cluster setups for AI/ML and HPC workloads.

Your responsibilities will include overseeing our image pipeline, running validation environments and test clusters, and supporting GPU hardware qualification. This role is essential for maintaining the integrity of our infrastructure, ensuring consistency, performance, and reliability.

About Lightning AI

Lightning AI is a pioneering technology company specializing in AI systems development, training, and deployment, dedicated to simplifying the research-to-production journey for developers. Our commitment to innovation and excellence drives our mission to empower a diverse range of clients from solo researchers to large enterprises.

Product Manager, GPU Infrastructure NPI

Fluidstack

Full-time|$150K/yr - $250K/yr|On-site|San Francisco, CA

About Fluidstack

Fluidstack is at the forefront of building groundbreaking infrastructure designed for the future of intelligence. We collaborate with premier AI research labs, government entities, and leading enterprises like Mistral, Poolside, Black Forest Labs, and Meta to deliver compute solutions at unparalleled speeds.

Our mission is to expedite the real…

Mar 3, 2026

GPU & Compute Infrastructure Engineer

Lightning AI

Full-time|$180K/yr - $200K/yr|Remote|New York, New York, United States; Remote; San Francisco, California, United States; Seattle, Washington, United States

May 1, 2026

Senior HPC & GPU Infrastructure Engineer

Sciforium

Full-time|On-site|San Francisco

At Sciforium, we are at the forefront of AI infrastructure, pioneering advanced multimodal AI models and an innovative, high-efficiency serving platform. With substantial backing from AMD and a dedicated team of engineers, we are rapidly expanding our capabilities to support the next generation of frontier AI models and real-time applications.

About the Role

We are looking for a highly skilled Senior HPC & GPU Infrastructure Engineer who will be responsible for ensuring the health, reliability, and performance of our GPU compute cluster. As the primary custodian of our high-density accelerator environment, you will serve as the crucial link between hardware operations, distributed systems, and machine learning workflows. This position encompasses a range of responsibilities, from hands-on Linux systems engineering and GPU driver setup to maintaining the ML software stack (CUDA/ROCm, PyTorch, JAX, vLLM). If you are passionate about optimizing hardware performance, enjoy troubleshooting GPUs at scale, and aspire to create world-class AI infrastructure, we would love to hear from you.

Your Responsibilities

1. System Health & Reliability (SRE)
On-Call Response: Be the primary responder for system outages, GPU failures, node crashes, and other cluster-wide incidents, ensuring rapid issue resolution to minimize downtime.
Cluster Monitoring: Develop and maintain monitoring protocols for GPU health, thermal behavior, PCIe/NVLink topology issues, memory errors, and general system load.
Vendor Liaison: Collaborate with data center personnel, hardware vendors, and on-site technicians for repairs, RMA processing, and physical maintenance of the cluster.

2. Linux & Network Administration
OS Management: Oversee the installation, patching, and maintenance of Linux distributions (Ubuntu / CentOS / RHEL), ensuring consistent configuration, kernel tuning, and automation for large node fleets.
Security & Access Controls: Set up VPNs, iptables/firewalls, SSH hardening, and network routing to secure our computing infrastructure.
Identity & Storage Management: Manage LDAP/FreeIPA/AD for user identity and administer distributed file systems like NFS, GPFS, or Lustre.

3. GPU & ML Stack Engineering
Deployment & Bring-Up: Spearhead the deployment of new GPU nodes, including BIOS configuration and software integration to ensure optimal performance.

Jan 7, 2026

Software Engineer, GPU Infrastructure - HPC

OpenAI

Full-time|On-site|San Francisco

About Our Team

Join the Fleet team at OpenAI, where we empower groundbreaking research and product innovation through our advanced computing infrastructure. We manage extensive systems across data centers, GPUs, and networking, ensuring optimal performance, high availability, and efficiency. Our work is crucial in enabling OpenAI’s models to function seamlessly at scale, supporting both our internal research endeavors and external products like ChatGPT. We are committed to prioritizing safety, reliability, and the ethical deployment of AI technology.

About the Role

As a Software Engineer on the Fleet High Performance Computing (HPC) team, you will play a vital role in ensuring the reliability and uptime of OpenAI’s compute fleet. Minimizing hardware failures is essential for keeping research training runs on track and services uninterrupted, as even minor hardware issues can lead to significant setbacks. With the rise of large supercomputers, the stakes for efficiency and stability have never been higher.

At the cutting edge of technology, we often lead the charge in troubleshooting complex, state-of-the-art systems at scale. This is a unique opportunity for you to engage with groundbreaking technologies and create innovative solutions that enhance the health and efficiency of our supercomputing infrastructure.

Our team fosters a culture of autonomy and ownership, enabling skilled engineers to drive meaningful change. In this role, you will focus on comprehensive system investigations and develop automated solutions to enhance our operations. We seek individuals who dive deep into challenges, conduct thorough investigations, and create scalable automation for detection and remediation.

Key Responsibilities:
Develop and maintain automation systems for provisioning and managing server fleets.
Create tools to monitor server health, performance metrics, and lifecycle events.
Collaborate effectively with teams across clusters, networking, and infrastructure.
Work closely with external operators to maintain a high level of service quality.
Identify and resolve performance bottlenecks and inefficiencies in the system.
Continuously enhance automation processes to minimize manual intervention.

You Will Excel in This Role if You Have:
Experience in managing large-scale server environments.
A blend of technical skills in systems programming and infrastructure management.
Strong problem-solving abilities and a methodical approach to troubleshooting.
Familiarity with high-performance computing technologies and tools.

Feb 5, 2026

Global GPU Commodity Manager

Andromeda Cluster

Full-time|Remote|Global Remote / San Francisco, CA

Location: North America Remote / San Francisco · Full-Time

About Andromeda

Founded by Nat Friedman and Daniel Gross, Andromeda Cluster provides early-stage startups with access to scaled AI infrastructure once exclusive to hyperscalers. Our journey began with a single managed cluster that quickly drew strong demand, leading us to develop a robust system, network, and orchestration layer to democratize AI infrastructure.

Today, we partner with leading AI labs, data centers, and cloud providers to efficiently deliver compute resources wherever they are needed. Our platform routes training and inference jobs across global supply chains, promoting flexibility and efficiency in one of the fastest-growing markets in the world.

Our vision is to create a liquidity layer for global AI compute, and we are looking for bright minds in AI infrastructure, research, and engineering to join our expanding team.

The Opportunity

We are seeking a dedicated Global GPU Commodity Manager to improve supply and demand matching on our platform. This is an Individual Contributor role reporting to the Head of Infrastructure. The Infrastructure team is pivotal to our operations, responsible for acquiring and facilitating compute resources across the organization while collaborating closely with compute providers, sales, and technical teams to align supply with demand.

With a solid foundation established with our providers, we are now scaling to expand our network and liquidity, broaden our service offerings, and accelerate our growth trajectory.

What You'll Do
Match incoming leads from the sales team to internal and external market capacity.
Maximize utilization of compute resources.
Source and onboard new compute suppliers globally.
Identify capacity based on customer requirements and market trends.
Resolve customer and supplier challenges in a fast-paced environment.
Analyze technical and commercial differences between suppliers to optimize our capacity funnel.
Develop a proactive compute strategy driven by market intelligence.
Negotiate costs with suppliers and other vendors.
Create and implement processes around capacity planning.

Mar 25, 2026

Go-to-Market Champion for GPU & AI Infrastructure

Impossible Cloud

Full-time|Hybrid|On-site / Hybrid / Remote

Group: Impossible Cloud / Impossible Cloud Network (ICN)
Focus: Integrating Enterprise Storage with Decentralized GPU Orchestration

Our Mission

At Impossible Cloud, we are transforming enterprise storage through our patented decentralized object storage technology, delivering high-performance, cost-effective infrastructure. We aim to expand this foundation by creating a next-generation AI-first platform that integrates storage, compute, and GPU functionality.

We are looking for a dynamic and hands-on Go-to-Market Champion specializing in AI and GPU Infrastructure to accelerate Impossible Cloud's position in the market for agentic AI infrastructure. This is an exceptional opportunity to join a rapidly growing AI infrastructure company during a critical phase, owning the GTM strategy from development to scaling a successful sales organization.

In this role, you will collaborate closely with founders, Product, Marketing, and Customer Success teams to transform our viral product into a reliable, scalable revenue machine for enterprises. Our culture thrives on relentless innovation, accountability, and ownership, where each team member is dedicated to excellence and urgency in their work.

Key Responsibilities
- Develop and execute Impossible Cloud’s global Go-to-Market (GTM) strategy, focusing on market segmentation, value propositions, pricing, and packaging for GPU cloud and AI infrastructure tailored to enterprises, startups, and research entities.
- Create scalable customer acquisition and retention strategies through direct sales, channels, and partnerships, enhancing commercial enablement and managing the customer journey (both commercial and technical).
- Build and lead a high-performing global GTM team encompassing presales, direct sales, partnerships, solutions engineering, marketing, and customer success, while developing playbooks and performance metrics to instill a culture of customer focus and excellence.
- Work closely with Product and Engineering to align GTM strategies with the product roadmap, integrating direct customer insights, and gathering market intelligence to anticipate trends in AI and cloud technology adoption.
- Identify, negotiate, and lead strategic partnerships with AI firms, ISVs, integrators, and cloud marketplaces, while engaging with Enterprise and AI Native clients as a trusted advisor.

Feb 26, 2026

Technical Intern - GPU Optimization and AI Infrastructure

Wafer

Internship|On-site|San Francisco

About the Role

We invite you to join our innovative team at Wafer as a Technical Intern, where you will have the opportunity to shape the future of inference, GPU optimization, and AI infrastructure. Working closely with our full-time engineers, you will help define our technical direction and develop the core systems that drive our GPU optimization platform.

Your Responsibilities
Design and implement scalable infrastructure for AI model training and inference.
Make pivotal technical decisions and influence architectural choices.

Oct 15, 2025

Director of Commodity Management & NPI

WEKA.io Inc.

Full-time|Remote|San Francisco Bay Area; U.S. Remote

Join WEKA, a trailblazer in AI and accelerated compute workflows, as we redefine data infrastructure with our cutting-edge NeuralMesh™ technology. Unlike conventional storage systems that falter under increasing demands, NeuralMesh™ improves performance and efficiency as it scales, providing a robust foundation for enterprises and AI innovation. Our technology is trusted by over 30% of Fortune 50 companies and leading industry hyperscalers. If you thrive in a customer-focused, collaborative, and innovative environment, we want you on our team!

About the Role:

As the Director of Commodity Management & NPI, you will spearhead WEKA's supplier strategy, overseeing cost structures and hardware readiness across critical component categories such as SSDs, memory, and networking. Your leadership will cultivate long-term supplier partnerships, align roadmaps, and develop KPIs to ensure optimal availability, quality, and competitive edge in a fast-evolving market.

You will also provide strategic oversight of WEKA's NPI pipeline, collaborating with engineering, product, and operations teams to manage program timelines, mitigate risks, and execute effectively. By leveraging your extensive domain expertise and operational acumen, you will enhance WEKA's market agility and drive hardware innovation, ensuring seamless and efficient product launches.

Dec 16, 2025

Principal NPI Program Manager

Axon Enterprise, Inc.

Full-time|$210K/yr - $336K/yr|On-site|San Francisco, California, United States

Become a Force for Good with Axon.

At Axon, we’re dedicated to the mission of Protecting Life. We are pioneers tackling the most pressing safety and justice challenges through our innovative devices and cloud solutions. Our collaborative spirit drives us to connect through honesty and empathy, embracing diverse perspectives from our customers, communities, and each other.

Life at Axon is dynamic, rewarding, and significant. Here, you will take charge and foster genuine change. Continuously evolve while you contribute to a cause that truly matters in an environment that recognizes your contributions.

Your Impact

As the Principal NPI Program Manager, you will be the driving force behind the operational framework that transitions Axon’s products from conception to market launch. You will facilitate the journey from the initial product team SKU request to the point where it becomes available for quotes in our pricing catalog, acting as the key cross-functional coordinator among Product, Finance/FP&A, Commercial Operations, Supply Chain, Operations, Legal, and IT.

Your role involves designing, managing, and continuously refining the NPI process, ensuring efficiency, clarity, and speed in a function that bridges product strategy and commercial execution. You will tackle pivotal issues influencing product design and market success, while also liaising with senior and executive leadership across various departments.

What You’ll Do

Location: San Francisco, CA (flexible for other Axon hubs)
Reports to: Director, Chief of Staff (Controllership)

NPI process ownership
Oversee and manage the complete NPI workflow for all SKU types (hardware, software, services) from initial submission to cross-functional reviews, and through to the Ready-to-Quote and Ready-to-Ship phases.
Enhance and streamline the NPI intake process (ServiceNow) and all related workflows across D365 and Salesforce.
Lead the bundle creation process during the annual pricing cycle, collaborating with product teams to capture product vision and manage new and existing bundle updates.
Establish and enforce governance frameworks for SKU and bundle creation, modification, pricing adjustments, product discontinuation, and SKU replacements.
Clarify roles and priorities during peak periods (pricing weeks, product launches, M&A integrations), ensuring the team is equipped with the necessary tools, training, and support for successful execution.

Mar 27, 2026

Member of Technical Staff - GPU Infrastructure

Prime Intellect

Full-time|On-site|San Francisco

Join Our Mission to Build Open Superintelligence Infrastructure

At Prime Intellect, we are pioneering the development of an open superintelligence stack that encompasses cutting-edge agentic models and the infrastructure that empowers anyone to create, train, and deploy these advanced AI systems. Our approach aggregates and orchestrates global computational resources into a cohesive control plane, complemented by a comprehensive reinforcement learning (RL) post-training toolkit that includes environments, secure sandboxes, verifiable evaluations, and our asynchronous RL trainer. We provide researchers, startups, and enterprises with the capabilities to execute end-to-end reinforcement learning at unparalleled scale, adapting models to real-world tools, workflows, and deployment scenarios.

As a Solutions Architect for GPU Infrastructure, you will be the technical authority responsible for translating customer needs into robust, production-ready systems designed to train the world’s most sophisticated AI models.

Following a recent $15 million funding round (bringing our total to $20 million) led by Founders Fund, with contributions from Menlo Ventures and illustrious angels such as Andrej Karpathy (Tesla, OpenAI), Tri Dao (Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), and Emad Mostaque (Stability AI), we are poised for significant growth and innovation.

Key Technical Responsibilities

This role requires a blend of deep technical knowledge and hands-on implementation skills. Your contributions will be crucial in:

Customer Architecture & Design
Collaborating with clients to understand workload specifications and architect optimal GPU cluster solutions.
Drafting technical proposals and conducting capacity planning for clusters ranging from 100 to over 10,000 GPUs.
Formulating deployment strategies for large language model (LLM) training, inference, and high-performance computing (HPC) tasks.
Delivering architectural recommendations to both technical teams and executive stakeholders.

Infrastructure Deployment & Optimization
Implementing and configuring orchestration frameworks such as SLURM and Kubernetes for distributed workloads.
Establishing high-performance networking through InfiniBand, RoCE, and NVLink interconnects.
Enhancing GPU utilization, memory management, and inter-node communication.
Setting up parallel file systems (Lustre, BeeGFS, GPFS) to maximize I/O efficiency.
Tuning system performance, from kernel parameters to CUDA configurations.

Production Operations & Support
Ensuring the reliability and performance of GPU infrastructure through continuous monitoring and support.
Collaborating with cross-functional teams to troubleshoot and optimize operational workflows.
Documenting processes and creating training materials for team members and clients.

Aug 30, 2025

Technical Internship in AI and GPU Optimization

Wafer

Internship|On-site|San Francisco

About the Role

We're excited to invite you to join Wafer as a Spring Intern, where you will play a crucial role in shaping the future of AI infrastructure and GPU optimization. As part of our innovative team, you will work closely with full-time engineers to define our technical strategies and contribute to the development of the essential systems that drive our GPU optimization platform.

Your Responsibilities
Design and implement scalable infrastructure for AI model training and inference tasks.
Guide the team in making technical decisions and architectural choices.

Qualifications We Seek

Essential Technical Skills
GPU Fundamentals: A strong grasp of GPU architectures, CUDA programming, and parallel computing methodologies.
Deep Learning Frameworks: Skilled in PyTorch, TensorFlow, or JAX, especially for GPU-accelerated applications.
Knowledge of LLM/AI: Solid foundation in large language models, including training, fine-tuning, prompting, and evaluation.
Systems Engineering: Proficient in C++, Python, and potentially Rust/Go for developing tools around CUDA.

Preferred Background
Publications or contributions to open-source projects related to inference, GPU computing, or ML/AI are advantageous.
Hands-on experience in conducting large-scale experiments, benchmarking, and performance optimization.

Oct 15, 2025

Product Manager - Infrastructure

Sierra

Full-time|On-site|San Francisco, CA

Join Sierra as a Product Manager specializing in Infrastructure, where you will lead the development and execution of innovative infrastructure solutions. This role is crucial for enhancing our product offerings and driving strategic initiatives that align with our business goals. You will collaborate with cross-functional teams to deliver high-quality products that meet customer needs and ensure optimal performance.

May 1, 2026

Technical Staff Engineer - GPU Optimization

Wafer

Full-time|On-site|San Francisco

About the Position

At Wafer, we are on a mission to increase intelligence per watt by developing AI systems that can optimize themselves. Our journey begins with GPU kernels, and we aim to revolutionize every aspect of ML systems and AI infrastructure. We are a compact, dynamic team of four, supported by renowned investors including Fifty Years, Y Combinator, Jeff Dean, and Woj Zaremba, co-founder of OpenAI. We are seeking passionate engineers eager to innovate at the convergence of AI agents and systems programming.

In this role, you will collaborate closely with our founding team to build the systems that power our GPU optimization platform. Your projects will range from the agent framework that refines kernels to the profiling infrastructure that interfaces with NCU and ROCprofiler, as well as the compiler tools that analyze PTX and SASS.

Feb 4, 2026

Software Engineer - GPU Inference

Baseten

Full-time|On-site|San Francisco

Baseten develops infrastructure and tools that help AI companies deploy and scale inference. Teams at organizations like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer rely on Baseten to bring advanced machine learning models into production. The company recently secured a $300M Series E from investors including BOND, IVP, Spark Capital, Greylock, and Conviction.

Role overview

This Software Engineer - GPU Inference position joins the founding team for Baseten Voice AI in San Francisco. The team focuses on building production-ready Voice AI systems, bringing open-source voice models into real-world use for clients in productivity, customer service, healthcare conversations, and education. The work shapes how people interact with technology through voice, creating broad impact across industries.

In this role, the engineer leads the internal inference stack that powers Voice AI models. Responsibilities include guiding the product roadmap and driving engineering execution. Collaboration is a key part of the job: the engineer works closely with Forward Deployed Engineers, Model Performance Engineers, and other technical groups to advance Voice AI capabilities.

Sample projects and initiatives
The world's fastest Whisper, with streaming and diarization
Canopy Labs selects Baseten for Orpheus TTS inference
Partnering with the Core Product team to build an orchestration framework for a multi-model voice agent
Working with the Training Platform team to support continuous training of voice models
Designing a developer-friendly API and SDK for self-service adoption of Baseten Voice AI products

Apr 26, 2026

Product Manager, API Infrastructure

OpenAI

Full-time|Remote|San Francisco

Role Overview

OpenAI is looking for a Product Manager focused on API Infrastructure to help guide the direction of its API products. This position is based in San Francisco and works at the intersection of engineering, design, and business functions.

What You Will Do
Work with engineering, design, and business teams to set and execute the product roadmap for API infrastructure.
Gather requirements and define user stories that reflect the needs of developers and businesses using OpenAI's APIs.
Prioritize features by considering user feedback and business impact.
Lead cross-functional teams and keep communication clear between all stakeholders to help products succeed.

Apr 17, 2026

GPU Performance Engineer

Genmo

Full-time|On-site|San Francisco HQ

At Genmo, we are at the forefront of advancing artificial intelligence through innovative research in video generation. Our mission is to construct open, cutting-edge models that will ultimately contribute to the realization of Artificial General Intelligence (AGI). As part of our dynamic team, you will play a pivotal role in redefining the future of AI and expanding the horizons of video creation.

We are looking for a skilled GPU Performance Engineer who can extract maximum performance from our H100 infrastructure and fine-tune our model serving stack to achieve unparalleled efficiency. If you are passionate about optimizing performance, particularly at the microsecond level, and thrive on pushing hardware to its limits, this is the perfect opportunity for you.

Key Responsibilities
Utilize advanced profiling tools such as Nsight Systems and nvprof to analyze and enhance GPU workloads.
Develop high-performance CUDA and Triton kernels to optimize essential model functions.
Reduce cold start latency from seconds to mere milliseconds in our serving infrastructure.
Optimize memory access patterns, implement kernel fusion, and maximize GPU utilization.
Collaborate closely with machine learning engineers to optimize model implementations.
Diagnose and resolve performance issues throughout the application and hardware stack.
Implement custom memory pooling and allocation strategies to enhance performance.
Promote performance optimization techniques and foster a culture of excellence across teams.

Jul 17, 2025

GPU Kernel Engineer

Baseten

Full-time|On-site|San Francisco

ABOUT BASETEN

At Baseten, we empower the world's leading AI firms, such as Cursor, Notion, and OpenEvidence, by delivering mission-critical inference solutions. Our unique blend of applied AI research, robust infrastructure, and user-friendly developer tools enables AI pioneers to effectively deploy groundbreaking models. With our recent achievement of a $300M Series E funding round supported by esteemed investors like BOND and IVP, we're on an exciting growth trajectory. Join our dynamic team and contribute to the platform that drives the next generation of AI products.

THE ROLE

We are looking for an experienced Senior GPU Kernel Engineer to join our innovative team at the forefront of AI acceleration. In this role, your programming expertise will directly enhance the performance of cutting-edge machine learning models. You'll be responsible for developing highly efficient GPU kernels that optimize computational processes, allowing for transformative AI applications.

You'll thrive in a fast-paced, intellectually challenging environment where your technical skills are pivotal. Your contributions will directly affect production systems that serve millions of users across various platforms. This position offers exceptional opportunities for career advancement for engineers enthusiastic about low-level optimization and impactful systems engineering.

EXAMPLE INITIATIVES

As part of our Model Performance team, you will engage in projects like:
Baseten Embeddings Inference: The quickest embeddings solution available
The Baseten Inference Stack
Enhancing model performance optimization

RESPONSIBILITIES

Core Engineering Responsibilities
Design and develop high-performance GPU kernels for essential machine learning operations, including matrix multiplications and attention mechanisms.
Collaborate with cross-functional teams to drive performance improvements and implement optimizations.
Debug and refine kernel code to achieve maximal efficiency and reliability.
Stay abreast of the latest advancements in GPU technology and machine learning frameworks.

Jul 17, 2025

Director of Product Management - Cloud Infrastructure

Okta, Inc.

Full-time | $230K/yr - $355K/yr | On-site | San Francisco, California

Secure Every Identity, from AI to Human

At Okta, we believe that identity is the cornerstone of unlocking AI's potential. Our mission is to build a trusted, neutral infrastructure that empowers organizations to safely navigate this new era. This mission demands relentless problem-solving for complex challenges that have real-world implications. We seek exceptional builders and owners who act with speed and urgency, executing with unwavering excellence.

This is your chance to engage in career-defining work. If you share our commitment to this mission, we want to hear from you.

We are currently seeking a Director of Product Management to spearhead our strategy concerning Okta’s trust motion. This includes overseeing service performance, deployments, resiliency, scalability, operational efficiency, geographic availability, compliance, and more. This pivotal leader will manage our international and U.S. public sector offerings, including our U.S. Federal business, as well as our cloud service provider strategy. Additionally, this role encompasses the stewardship of Okta’s Data Platform, which delivers analytics, in-product reporting frameworks, machine learning, and messaging capabilities for both internal and external customers.

If you are enthusiastic about the intersection of infrastructure and large-scale enterprise products, we want you on our team.

Mar 25, 2026

GPU Kernel Engineer

Sciforium

Full-time | On-site | San Francisco

At Sciforium, we are at the forefront of AI infrastructure, developing next-generation multimodal AI models and a proprietary high-efficiency serving platform. With substantial funding and direct collaboration from AMD, including support from AMD engineers, our team is rapidly expanding to build the complete stack that powers cutting-edge AI models and real-time applications.

About the Role

We are on the lookout for a talented GPU Kernel Engineer who is eager to explore and maximize performance on modern accelerators. In this role, you will be responsible for designing and optimizing custom GPU kernels that drive our advanced large-scale AI systems. You will work across the hardware-software stack, from low-level kernel development to integrating optimized operations into high-level machine learning frameworks for large-scale training and inference.

This position is perfect for someone who excels at the intersection of GPU programming, systems engineering, and state-of-the-art AI workloads, and who wants to contribute significantly to the efficiency and scalability of our machine learning platform.

Key Responsibilities

Develop, implement, and enhance custom GPU kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas.
Profile and fine-tune the end-to-end performance of machine learning operations, particularly for large-scale LLM training and inference.
Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and our proprietary internal runtimes.
Create performance models, pinpoint bottlenecks, and deliver kernel-level enhancements that significantly speed up AI workloads.
Collaborate with machine learning researchers, distributed systems engineers, and model-serving teams to optimize computational performance across the entire stack.
Work closely with hardware vendors (NVIDIA/AMD) and stay current on the latest GPU architecture and compiler/toolchain advancements.
Contribute to the development of tools, documentation, benchmarking suites, and testing frameworks that ensure correctness and reproducible performance.

Must-Haves

5+ years of industry or research experience in GPU kernel development or high-performance computing.
Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related discipline.
Strong programming proficiency in C++ and Python, and familiarity with machine learning frameworks.

Dec 6, 2025

Technical Staff Member - Supercomputing Platform & Infrastructure

magic.dev

Full-time | On-site | San Francisco

At Magic, our mission is to create safe AGI that propels humanity forward in addressing the world’s most critical challenges. We believe that the key to achieving safe AGI lies in automating research and code generation to enhance models and resolve alignment issues more effectively than humans alone. Our unique approach integrates frontier-scale pre-training, domain-specific reinforcement learning, ultra-long context, and inference-time computation to realize this vision.

Role Overview

As a vital member of our Supercomputing Platform & Infrastructure team, you will be instrumental in designing, building, and managing the extensive GPU infrastructure that underpins Magic’s model training and inference.

A key aspect of your role will involve using Terraform-driven infrastructure-as-code (IaC) methods to build and maintain our infrastructure, ensuring reproducibility, reliability, and operational clarity across clusters comprising thousands of GPUs.

Magic’s long-context models place continuous demands on compute, networking, and storage systems. The infrastructure must support long-running distributed jobs, high-throughput data movement, and stringent availability requirements, which calls for designs that are automated, observable, and resilient. You will take ownership of the systems and IaC foundations that enable these capabilities.

This position has the potential to expand into broader responsibilities encompassing supercomputing platform architecture, influencing how Magic scales GPU clusters and improves infrastructure reliability as model workloads grow.

Key Responsibilities

Design and manage large-scale GPU clusters for model training and inference.
Build and maintain infrastructure using Terraform across both cloud and hybrid environments.
Develop modular, scalable IaC frameworks for provisioning compute, networking, and storage resources.
Improve deployment reproducibility, maintain environment consistency, and ensure operational safety.
Optimize networking and storage architectures for high-throughput AI workloads.
Automate fault detection and recovery across distributed clusters.
Diagnose complex cross-layer issues involving hardware, drivers, networking, storage, operating systems, and cloud environments.
Improve the observability, monitoring, and reliability of critical platform systems.

Qualifications

Strong foundation in systems engineering principles.
Extensive hands-on experience with Terraform, including module design, state management, environment isolation, and large-scale deployments.

Jan 25, 2024
