
AI Inference Engineer at Perplexity | London

On-site, Full-time



Qualifications

Required Qualifications

  • Proficient in ML systems and deep learning frameworks such as PyTorch, TensorFlow, and ONNX.

  • Familiar with prevalent LLM architectures and inference optimization techniques, including continuous batching and quantization.

  • Knowledgeable about GPU architectures, with experience in GPU kernel programming using CUDA.

About the job

Join our innovative team at Perplexity as an AI Inference Engineer, where you'll be at the forefront of deploying machine learning models for real-time inference. Our technology stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. This is a fantastic opportunity to contribute to large-scale ML applications.

Key Responsibilities

  • Develop robust APIs for AI inference catering to both internal and external clients.

  • Conduct benchmarking and resolve performance bottlenecks in our inference stack.

  • Enhance system reliability and observability, responding effectively to outages.

  • Investigate cutting-edge research and implement optimizations for LLM inference.

About Perplexity

Perplexity is a pioneering technology company dedicated to advancing artificial intelligence. Our team thrives on innovation, collaboration, and the pursuit of cutting-edge solutions in AI. We empower our engineers to take ownership of their work and drive impactful results.
