About the Role
We are seeking a talented Inference Engineering Manager to spearhead our AI Inference team at Perplexity. This is a remarkable opportunity to design and expand the infrastructure that drives Perplexity's innovative products and APIs, catering to millions of users with cutting-edge AI capabilities.
You will own the technical direction and implementation of our inference systems while building and leading a high-caliber team of inference engineers. Our technology stack includes Python, PyTorch, Rust, C++, and Kubernetes. You will play a crucial role in architecting and scaling the deployment of machine learning models for Perplexity's Comet, Sonar, Search, and Deep Research products.
Why Perplexity?
Build state-of-the-art inference systems that are among the fastest in the industry.
Engage in high-impact work within a smaller team, enjoying considerable ownership and autonomy.
Build infrastructure from the ground up instead of maintaining legacy systems.
Work across the full spectrum of inference: reducing cost, scaling traffic, and advancing capabilities.
Make a significant impact on the technical roadmap and team culture at a rapidly expanding company.
Responsibilities
Lead and nurture a high-performing team of AI inference engineers.
Develop AI inference APIs used by both internal and external clients.
Design and scale our inference infrastructure for enhanced reliability and efficiency.
Benchmark our inference stack and resolve the bottlenecks found across it.
Drive large sparse/MoE model inference at rack scale, including sharding strategies for very large models.
Develop inference systems that support sparse attention and disaggregated prefill/decode serving.
Enhance the reliability and observability of our systems and lead incident response efforts.
Make technical decisions on the trade-offs among batching, throughput, latency, and GPU utilization.
Collaborate with ML research teams on model optimization and deployment.
Recruit, mentor, and develop engineering talent.
Establish team processes, engineering standards, and operational excellence.
Qualifications
5+ years of engineering experience, with at least 2 years in a technical leadership or management capacity.
Proficiency in programming languages and tools such as Python, PyTorch, Rust, and C++.
Experience with Kubernetes and cloud infrastructure.
Strong understanding of machine learning model deployment and optimization.
Exceptional problem-solving and communication skills.

