Engineering Manager - AI Inference at Perplexity | San Francisco

Perplexity, San Francisco
On-site, Full-time

Experience Level

Manager

Qualifications

5+ years of engineering experience, with a minimum of 2 years in a technical leadership or management role. Expertise in programming languages and frameworks such as Python, PyTorch, Rust, and C++. Experience with Kubernetes and cloud-based infrastructure. Strong knowledge of machine learning model deployment and optimization techniques. Excellent problem-solving abilities and effective communication skills.

About the job

About the Role

We are seeking a talented Inference Engineering Manager to spearhead our AI Inference team at Perplexity. This is a remarkable opportunity to design and expand the infrastructure that drives Perplexity's innovative products and APIs, catering to millions of users with cutting-edge AI capabilities.

You will own the technical direction and implementation of our inference systems while building and leading a high-caliber team of inference engineers. Our technology stack encompasses Python, PyTorch, Rust, C++, and Kubernetes. You will play a crucial role in architecting and scaling the deployment of machine learning models for Perplexity's Comet, Sonar, Search, and Deep Research products.

Why Perplexity?

  • Develop state-of-the-art systems that are among the fastest in the industry using leading-edge technology.

  • Engage in high-impact work within a smaller team, enjoying considerable ownership and autonomy.

  • Seize the chance to create infrastructure from the ground up instead of maintaining outdated systems.

  • Work across the entire spectrum: minimizing costs, scaling traffic, and advancing the capabilities of inference.

  • Make a significant impact on the technical roadmap and team culture at a rapidly expanding company.

Responsibilities

  • Lead and nurture a high-performing team of AI inference engineers.

  • Develop APIs for AI inference utilized by both internal and external clients.

  • Design and scale our inference infrastructure for enhanced reliability and efficiency.

  • Benchmark and resolve bottlenecks across our inference stack.

  • Drive inference for large sparse/MoE models at rack scale, including sharding strategies for very large models.

  • Develop inference systems that support sparse attention and disaggregated prefill/decode serving.

  • Enhance the reliability and observability of our systems and lead incident response efforts.

  • Make technical decisions regarding batching, throughput, latency, and GPU utilization.

  • Collaborate with ML research teams on model optimization and deployment.

  • Recruit, mentor, and develop engineering talent.

  • Establish team processes, engineering standards, and operational excellence.

Qualifications

  • 5+ years of engineering experience, with at least 2 years in a technical leadership or management capacity.

  • Proficiency in programming languages and tools such as Python, PyTorch, Rust, and C++.

  • Experience with Kubernetes and cloud infrastructure.

  • Strong understanding of machine learning model deployment and optimization.

  • Exceptional problem-solving and communication skills.

About Perplexity

Perplexity is at the forefront of AI technology, committed to delivering state-of-the-art solutions that empower users. With a dynamic and innovative work environment, we prioritize growth, collaboration, and the development of cutting-edge products that drive our success.
