GPU Kernel Engineer

Sciforium, San Francisco
On-site, Full-time

About the job

At Sciforium, we are at the forefront of AI infrastructure, building next-generation multimodal AI models and a proprietary high-efficiency serving platform. Backed by substantial funding and working in direct collaboration with AMD and its engineers, our team is rapidly expanding to build the complete stack that powers cutting-edge AI models and real-time applications.

About the Role

We are looking for a talented GPU Kernel Engineer who is eager to explore and maximize performance on modern accelerators. In this role, you will design and optimize the custom GPU kernels that power our large-scale AI systems. You will work across the hardware-software stack, from low-level kernel development to integrating optimized operations into high-level machine learning frameworks for large-scale training and inference.

This position suits someone who thrives at the intersection of GPU programming, systems engineering, and state-of-the-art AI workloads, and who wants to contribute meaningfully to the efficiency and scalability of our machine learning platform.

Key Responsibilities

  • Develop, implement, and optimize custom GPU kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas.

  • Profile and fine-tune the end-to-end performance of machine learning operations, particularly for large-scale LLM training and inference.

  • Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and our proprietary internal runtimes.

  • Build performance models, pinpoint bottlenecks, and deliver kernel-level improvements that measurably speed up AI workloads.

  • Collaborate with machine learning researchers, distributed systems engineers, and model-serving teams to optimize computational performance across the entire stack.

  • Work closely with hardware vendors (NVIDIA/AMD) and stay current with the latest GPU architectures and compiler/toolchain advances.

  • Contribute to tools, documentation, benchmarking suites, and testing frameworks that ensure correctness and reproducible performance.

Must-Haves

  • 5+ years of industry or research experience in GPU kernel development or high-performance computing.

  • Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related discipline.

  • Strong programming proficiency in C++ and Python, and familiarity with machine learning frameworks.

About Sciforium

Sciforium is an innovative AI infrastructure company dedicated to developing advanced multimodal AI models and a proprietary high-efficiency serving platform, backed by significant funding and collaboration with AMD engineers.
