About the job

At Sciforium, we are at the forefront of AI infrastructure, building next-generation multimodal AI models and a proprietary high-efficiency serving platform. With substantial funding and direct collaboration with AMD, including support from AMD engineers, our team is expanding rapidly to develop the complete stack that powers cutting-edge AI models and real-time applications.

About the Role

We are on the lookout for a talented GPU Kernel Engineer who is eager to explore and maximize performance on modern accelerators. In this role, you will be responsible for designing and optimizing custom GPU kernels that drive our advanced large-scale AI systems. You will navigate the hardware-software stack, engaging in low-level kernel development and integrating optimized operations into high-level machine learning frameworks for large-scale training and inference.

This position is perfect for someone who excels at the intersection of GPU programming, systems engineering, and state-of-the-art AI workloads, and aims to contribute significantly to the efficiency and scalability of our machine learning platform.

Key Responsibilities

Develop, implement, and enhance custom GPU kernels utilizing C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas.
Profile and fine-tune the end-to-end performance of machine learning operations, particularly for large-scale LLM training and inference.
Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and our proprietary internal runtimes.
Create performance models, pinpoint bottlenecks, and deliver kernel-level enhancements that significantly boost AI workloads.
Collaborate with machine learning researchers, distributed systems engineers, and model-serving teams to optimize computational performance across the entire stack.
Engage closely with hardware vendors (NVIDIA/AMD) and stay updated on the latest GPU architecture and compiler/toolchain advancements.
Contribute to the development of tools, documentation, benchmarking suites, and testing frameworks ensuring correctness and performance reproducibility.

Must-Haves

5+ years of industry or research experience in GPU kernel development or high-performance computing.
Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related discipline.
Strong programming proficiency in C++ and Python, plus familiarity with machine learning frameworks.
