About the job
Join our team at Perplexity as an AI Inference Engineer, where you'll deploy machine learning models for real-time inference at scale. Our technology stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. This is an opportunity to shape the serving infrastructure behind large-scale ML applications.
Key Responsibilities
Develop robust APIs for AI inference catering to both internal and external clients.
Benchmark our inference stack and identify and resolve performance bottlenecks.
Enhance system reliability and observability, responding effectively to outages.
Investigate cutting-edge research and implement optimizations for LLM inference.
