
Member of Technical Staff - LLM Systems & Performance

Contextual AI · Mountain View, CA
On-site · Full-time · $170K–$200K/yr


Experience Level

Mid to Senior

Qualifications

We are looking for candidates who possess:

  • A solid foundation in AI and machine learning, particularly with LLM systems.
  • Experience optimizing machine learning pipelines, especially for SFT and RL.
  • Proficiency with performance analysis tools and techniques for high-throughput systems.
  • Familiarity with GPU programming, particularly using CUDA or Triton.
  • Strong collaborative skills to work effectively within a team of researchers and engineers.

About the job

About Contextual AI

At Contextual AI, we are at the forefront of transforming how AI agents operate by addressing one of the most significant challenges in the field: context. By providing the right context at the right moment, we enable enterprises to achieve the accuracy and scalability required for effective AI deployment. Our innovative enterprise AI development platform bridges cutting-edge AI research with the practical needs of developers, allowing them to seamlessly ingest, query, and integrate data from various enterprise sources into their workflows.

Founded by the pioneers of Retrieval-Augmented Generation (RAG), our technology forms the backbone of the context layer that connects foundational AI models with relevant real-time information. Supported by visionary venture capital, we are not merely participating in the enterprise AI revolution; we are leading it. Join us as we create a future in which AI not only answers queries but also revolutionizes business operations.

About the Role

As a Member of Technical Staff focused on LLM Systems & Performance, you will join a dedicated, high-impact team responsible for building and optimizing LLM systems end to end. Your work will range from developing Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) pipelines to creating high-throughput inference clusters for production environments. You will collaborate with researchers and engineers to design advanced models and the supporting infrastructure for our context layer.

What You'll Do

  • Enhance and optimize components of our SFT and RL training pipelines (e.g., Verl, SkyRL), focusing on areas such as data loading, training loops, logging, and evaluation.
  • Contribute to the development of LLM inference infrastructure (e.g., vLLM, SGLang), including optimizations for batching, KV-cache management, scheduling, and serving.
  • Utilize profiling tools like Nsight to analyze and improve end-to-end performance metrics (throughput, latency, compute/memory/bandwidth) by identifying and resolving bottlenecks.
  • Engage with distributed training and inference systems using technologies such as NCCL, NVLink, and various parallelism strategies on multi-GPU clusters.
  • Assist in experimenting with and implementing quantization techniques (e.g., INT8, FP8, FP4, mixed-precision) for both training and inference.
  • Write and optimize GPU kernels utilizing CUDA or Triton, employing techniques such as FlashAttention and Tensor Cores as appropriate.
  • Collaborate with researchers to advance ideas from concept to prototype, through scaled experiments, and into production.
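
To give a flavor of the quantization work mentioned above, here is a minimal, illustrative sketch of absmax (symmetric) INT8 quantization in plain Python. This is not Contextual AI's implementation; production systems use per-channel or per-block scales, calibration, and fused GPU kernels, but the core scale-and-round idea is the same.

```python
def quantize_int8(values):
    """Map floats to int8 using a single absmax scale; returns (ints, scale)."""
    # Largest magnitude maps to 127; guard against an all-zero input.
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from int8 values and the stored scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(weights)
recovered = dequantize_int8(q, s)
```

Here the absmax weight (-1.27) maps exactly to -127, and every other weight is rounded to the nearest multiple of the scale, which is where quantization error comes from.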
