About the job
Join our team at Perplexity as an AI Inference Engineer and work at the forefront of deploying cutting-edge machine learning models for real-time inference. Our tech stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes, giving you the chance to work on large-scale applications that make a real impact.
Key Responsibilities
Design and develop APIs for AI inference that cater to both internal and external stakeholders.
Conduct benchmarking and identify bottlenecks within our inference stack to enhance performance.
Ensure the reliability and observability of our systems while promptly addressing any outages.
Explore new research and implement optimizations for LLM inference.

