
Backend Engineer – Inference Optimization

Vercept · Seattle
On-site · Full-time · $150K/yr - $250K/yr




Experience Level

Mid to Senior


About the job

About Us

At Vercept, we are an energetic and mission-focused team with a proven history of academic excellence. Our talented researchers have made significant contributions to the field of artificial intelligence, receiving accolades such as best paper awards at leading AI conferences and achieving remarkable citation rankings. We are committed to pioneering transformative research that sets new standards in the industry and aim to revolutionize the world—one innovative breakthrough at a time.

What We Seek & Why You Should Join Us

We are in search of a Backend Engineer specializing in Inference Optimization who is passionate about tackling some of the most challenging systems issues in AI. In this role, you will focus on enhancing the performance of foundation model inference, operating at the cutting edge of machine learning and high-performance systems engineering. This is an exciting opportunity to establish new standards for latency, throughput, and efficiency on a large scale.

Role Overview

As a Backend Engineer, you will take ownership of the design and optimization of inference pipelines for large-scale models. Collaborating closely with researchers and infrastructure engineers, you will identify bottlenecks and implement advanced techniques such as quantization and KV caching, ensuring the deployment of high-performance serving systems in production. Your contributions will directly influence how swiftly and cost-effectively users engage with next-generation AI.
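The KV caching mentioned above is the core optimization behind fast autoregressive decoding: past keys and values are stored so each step only computes attention inputs for the newest token. A minimal, illustrative Python sketch (all names here are hypothetical; production systems such as vLLM manage paged caches on GPU):

```python
# Toy illustration of KV caching in autoregressive decoding.
# Real models compute k/v with learned projection matrices on GPU;
# here simple arithmetic stands in for those projections.

class KVCache:
    """Stores key/value entries so each decode step only computes
    the newest token's k/v instead of recomputing the full prefix."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

def decode_step(token, cache):
    k = token * 2   # stand-in for the key projection
    v = token + 1   # stand-in for the value projection
    cache.append(k, v)
    # Attention reads the whole cache (all past keys/values),
    # but only the new token's k/v had to be computed this step.
    return sum(cache.values) / len(cache)

cache = KVCache()
outputs = [decode_step(t, cache) for t in [1, 2, 3]]
# Without the cache, step n would recompute k/v for all n-1 prior
# tokens: O(n^2) total work across a sequence instead of O(n).
```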

What We Expect From You

Essential Qualifications:

  • Extensive experience in optimizing model inference pipelines, including model quantization and KV caching.

  • Strong proficiency in backend systems and high-performance programming languages (Python, C++, or Rust).

  • Familiarity with distributed serving, GPU acceleration, and large-scale system architectures.

  • Proven ability to debug complex performance issues across model, runtime, and hardware layers.

  • Adaptability to work in fast-paced environments with ambitious technical objectives.

Preferred Qualifications:

  • Practical experience with vLLM or similar inference frameworks.

  • Background in GPU kernel optimization (CUDA, Triton, ROCm).

  • Experience in scaling inference across multi-node or heterogeneous clusters.

  • Prior involvement in model compilation (e.g., TensorRT, TVM, ONNX Runtime).

  • Hands-on experience with model quantization strategies.
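As a concrete (and deliberately simplified) example of one quantization strategy listed above, here is a sketch of symmetric int8 weight quantization in plain Python. Production pipelines use calibrated per-channel or per-group schemes (e.g. via TensorRT or ONNX Runtime); this only shows the basic scale-and-round idea:

```python
# Symmetric per-tensor int8 quantization: one scale maps the float
# range [-max|w|, +max|w|] onto the int8 range [-127, 127].

def quantize_int8(weights):
    """Quantize a list of float weights to int8 with a single scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.02]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Per-weight reconstruction error is bounded by scale / 2.
```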

About Vercept

Vercept is a dynamic organization at the forefront of AI research and development. Our team comprises leading experts who have made remarkable contributions to the AI field, driving innovation and excellence.
