Software Engineer, Model Performance Tooling

Baseten, Vancouver, BC
On-site, Full-time

Experience Level

Entry Level

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field.

  • Familiarity with high-performance computing and large-scale machine learning frameworks.

  • Experience with programming languages such as Python, C++, or Java.

  • Understanding of GPU architecture and performance metrics.

  • Strong analytical skills and a problem-solving mindset.

  • Ability to work collaboratively in a fast-paced, dynamic environment.

About the job

ABOUT BASETEN

At Baseten, we empower AI innovators by providing mission-critical inference solutions for some of the most dynamic companies in the field, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our unique blend of applied AI research, adaptable infrastructure, and intuitive developer tools allows organizations at the forefront of AI to deploy state-of-the-art models efficiently. With rapid growth and a recent $300M Series E funding round led by prominent investors like BOND, IVP, Spark Capital, Greylock, and Conviction, we are on an exciting journey. Join us in shaping the platform that engineers rely on to launch AI products successfully.

THE OPPORTUNITY

We are actively seeking early-career Software Engineers to join our team in Vancouver, BC. This specialized role combines high-performance computing (HPC) with Large Language Model (LLM) engineering. You'll own the development of an automated tool suite for diagnosing and improving the performance of our next-generation AI infrastructure.

In this role, you will dig into model performance, breaking systems down to analyze their efficiency at the hardware level. You will build tools that measure GPU FLOPS, stress-test InfiniBand clusters, and establish the benchmarks required for production readiness.

RESPONSIBILITIES

  • Performance Benchmarking: Automate and execute standard LLM quality benchmarks (GSM8K, MMLU) alongside tailored performance suites for specific workloads, including long-context windows and KV cache reuse.

  • Infrastructure Validation: Design and implement automated acceptance tests for new GPU clusters across both x86 and ARM systems, evaluating GPU memory bandwidth, networking throughput, and multi-node networking performance.

  • Model Development Experience: Create and maintain internal GPU-enabled development environments akin to GitHub Codespaces, ensuring the team has access to high-performance "dev machines" optimized for model experimentation.

  • Tool Development: Contribute to and enhance tools such as InferenceMAX and genai-bench to automate model evaluation and optimization processes.

  • Deep Hardware Profiling: Use PyTorch Profiler and NVIDIA Nsight Systems to collect performance profiles, pinpoint bottlenecks, and debug NVIDIA compute and networking issues.

About Baseten

Baseten is at the forefront of AI infrastructure, enabling some of the world's most innovative companies to deploy advanced models efficiently. With a strong focus on research, development, and state-of-the-art tools, we foster an environment where creativity and technical excellence thrive. Our commitment to pushing the boundaries of AI technology is backed by significant investment and a dedicated team.