Senior Software Engineer - Platform at Trunkio | San Francisco
Experience Level: Senior
About Trunkio
At Trunkio, we are on a mission to revolutionize software delivery. Our innovative platform is designed to eliminate bottlenecks in code deployment, ensuring that engineering teams can focus on creating exceptional products. With a strong foundation built by experts from leading tech companies, we aim to foster a work environment that promotes productivity and job satisfaction.
Similar jobs
Software Engineer, Inference Platform at Fluidstack | San Francisco
Join the Fluidstack Team

At Fluidstack, we're pioneering the infrastructure for advanced intelligence. We collaborate with leading AI laboratories, governmental entities, and major corporations, including Mistral, Poolside, Black Forest Labs, and Meta, to deliver computing solutions at unprecedented speeds. Our mission is to make the vision of Artificial General Intelligence (AGI) a reality. Driven by that purpose, our dedicated team is committed to building state-of-the-art infrastructure that puts our customers' success first. If you share our passion for excellence and are eager to contribute to the future of intelligence, we invite you to be part of our journey.

Role Overview

The Inference Platform team at Fluidstack is at the forefront of addressing the cost and latency challenges of frontier AI. You will play a crucial role in managing the serving layer that connects our global accelerator supply with our clients' production workloads, spanning LLM serving frameworks, KV cache infrastructure, and Kubernetes orchestration across multiple data centers.

This hands-on individual contributor role combines distributed systems, model optimization, and serving infrastructure. You will oversee the entire lifecycle of inference deployments for leading AI labs, driving improvements in throughput, cost-efficiency, and response times, while influencing the architectural decisions that guide Fluidstack's deployment strategies.
About the Role

Fluidstack is looking for a Director of Infrastructure to own the hardware that supports some of the largest AI clusters in the world. You will lead a multidisciplinary team of Networking Engineers, Compute Systems Engineers, Storage Engineers, and the ICT team, working closely with Procurement, Data Center Operations, Software Engineering, Site Reliability Engineering, Finance, Security, and Sales to ensure Fluidstack delivers and operates clusters faster and more reliably than any competitor.

You have successfully deployed a GPU cluster of more than 10,000 GPUs on cutting-edge hardware. You know how to shorten deployment timelines from months to weeks, and you have built the tools, runbooks, and culture that make that success repeatable.
Role Overview

We are seeking a Product Manager to lead our AI platform roadmap, encompassing managed inference and agent platforms. You will define how Fluidstack empowers customers to deploy, scale, and optimize large language model (LLM) inference workloads, from model serving and routing to agent orchestration and complex AI systems. The role balances customer demands for low latency and high throughput against the practical constraints of GPU utilization, cost-effectiveness, and platform reliability.
You will collaborate with engineering, machine learning research, and go-to-market teams to position Fluidstack against inference-focused competitors such as Together AI, Fireworks, Baseten, Modal, and Replicate.

Key Responsibilities
- Lead the product strategy and roadmap for managed inference services, focusing on model deployment, autoscaling, multi-LoRA serving, and inference optimization.
- Define requirements for agent platform functionality, including structured outputs, function calling, memory primitives, tool integration, and multi-step reasoning workflows.
- Prioritize inference optimizations such as speculative decoding, continuous batching, KV cache management, quantization support, and custom kernel integration.
- Collaborate with ML infrastructure engineers to create APIs, SDKs, and deployment workflows for model fine-tuning, version management, and A/B testing.
- Partner with datacenter teams to refine GPU allocation strategies: dedicated versus serverless deployments, cold-start latency, and cost-per-token economics.
- Conduct competitive analysis of offerings from Together AI (inference optimization stack), Fireworks (custom inference engine), Baseten (training-to-inference integration), and Modal (serverless architecture).
- Establish pricing models that reflect customer usage patterns (tokens, requests, GPU-hours) while keeping the platform sustainable.
About the People Team

The People team at Fluidstack is dedicated to fostering an environment where individuals can do their best work. We create and maintain the systems, environments, and partnerships that empower talented people to tackle significant challenges. Our work includes managing the infrastructure that supports the organization, improving the employee experience, and equipping managers and leaders to excel in their roles.

Why This Role Exists

As our office presence expands, daily operations must be seamless and consistent. This role ensures that each Fluidstack office runs efficiently and meets high standards for employee satisfaction.

About the Role

You will have significant influence over the workplace experience at Fluidstack. You will manage the daily operations of your office, including the aesthetic and functional aspects of the space, and cultivate an inviting atmosphere for employees and visitors alike.
You will report directly to the Head of Workplace and work closely with the People Ops team to address any issues that arise, whether it's fixing a broken monitor or enhancing lunch options, with equal diligence. This position demands a proactive approach and a strong sense of ownership: you will not wait for direction; you will identify needs, take initiative, and implement solutions.

What You Will Do

First 30 Days
- Complete onboarding to familiarize yourself with Fluidstack's workplace standards, vendor landscape, and specific office dynamics.
- Shadow the current office setup, or audit operations if opening a new location.
About the Role

As the Systems Controls Lead, you will own the design, implementation, and continuous refinement of Fluidstack's General IT Controls (GITC) framework. You will work at the intersection of infrastructure, compliance, and security, ensuring that the systems driving the future of AI are backed by a robust, auditable control environment. This is a critical, high-impact role on a lean team, collaborating closely with Engineering, Security, Legal, and Finance to scale our controls program in line with business growth.
About the Role

As Fluidstack expands rapidly to serve the foremost AI organizations globally, we are seeking a Senior Manager of Business Process Controls to ensure our operational excellence matches our ambitious technical goals. You will design, implement, and continuously improve the vital processes that drive our business, including customer onboarding, vendor operations, internal workflows, and cross-functional collaboration. Working closely with Engineering, Finance, Sales, and Operations teams, you will eliminate inefficiencies and build scalable systems that support Fluidstack's growth without disruption. This is a high-impact, highly visible role, ideal for someone who excels at the intersection of strategic planning and execution.
About the Role

As a Senior / Staff Site Reliability Engineer (SRE) at Fluidstack, you will be central to our infrastructure, working across software, hardware, and operations to ensure the reliability and performance of our global GPU cloud. You will collaborate closely with teams in networking, platform engineering, and data center operations to build systems that scale to the increasing demands of AI workloads. SREs at Fluidstack are hands-on experts with deep systems knowledge and excellent communication skills.
Your responsibilities include addressing complex production challenges, deploying robust infrastructure, and continuously improving the stability and observability of our platform as we expand.

A typical day might involve:
- Deploying clusters of over 1,000 GPUs using custom playbooks, and adjusting these tools to deliver optimal solutions for our clients.
- Validating the correctness and performance of our compute, storage, and networking infrastructure, and collaborating with providers to improve these subsystems.
- Migrating petabytes of data from public cloud platforms to local storage, efficiently and cost-effectively.
- Troubleshooting issues across the stack, from hardware problems like obstructed server fans to optimizing S3 data loaders across regions.
- Creating internal tools to reduce deployment times and improve cluster reliability, including automation where the customer benefit clearly outweighs the implementation cost.

This role requires participation in an on-call rotation of up to one week per month.
Baseten develops infrastructure and tools that help AI companies deploy and scale inference. Teams at organizations like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer rely on Baseten to bring advanced machine learning models into production. The company recently secured a $300M Series E from investors including BOND, IVP, Spark Capital, Greylock, and Conviction.

Role overview

This Software Engineer - GPU Inference position joins the founding team for Baseten Voice AI in San Francisco. The team builds production-ready Voice AI systems, bringing open-source voice models into real-world use for clients in productivity, customer service, healthcare conversations, and education. The work shapes how people interact with technology through voice, creating broad impact across industries.

In this role, the engineer leads the internal inference stack that powers Voice AI models. Responsibilities include guiding the product roadmap and driving engineering execution. Collaboration is a key part of the job: you will work closely with Forward Deployed Engineers, Model Performance Engineers, and other technical groups to advance Voice AI capabilities.

Sample projects and initiatives
- The world's fastest Whisper, with streaming and diarization
- Canopy Labs selects Baseten for Orpheus TTS inference
- Partnering with the Core Product team to build an orchestration framework for a multi-model voice agent
- Working with the Training Platform team to support continuous training of voice models
- Designing a developer-friendly API and SDK for self-service adoption of Baseten Voice AI products
About This Role

Join Databricks as a Software Engineer focused on GenAI inference, where you will play a pivotal role in designing, developing, and enhancing the inference engine that drives our Foundation Model API. Working at the intersection of research and production, you will ensure our large language model (LLM) serving systems are optimized for speed, scalability, and efficiency. Your contributions will span the entire GenAI inference stack, from kernels and runtimes to orchestration and memory management.

What You Will Do
- Participate in the design and implementation of the inference engine, collaborating on a model-serving stack tailored for large-scale LLM inference.
- Work closely with researchers to integrate new model architectures and features, such as sparsity, activation compression, and mixture-of-experts, into the engine.
- Optimize latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators.
- Build and maintain tools for instrumentation, profiling, and tracing to identify bottlenecks and inform optimization efforts.
- Develop scalable routing, batching, scheduling, memory management, and dynamic loading mechanisms for inference workloads.
- Ensure reliability, reproducibility, and fault tolerance in inference pipelines, including A/B launches, rollback, and model versioning.
- Integrate with federated and distributed inference infrastructure, orchestrating across nodes, balancing load, and managing communication overhead.
- Collaborate cross-functionally with platform engineers, cloud infrastructure, and security/compliance teams.
- Document and share insights, contributing to internal best practices and open-source initiatives as appropriate.
Join Condor Software as a Full-Stack Platform Engineer

At Condor, we are revolutionizing the financial infrastructure that supports clinical development. With billions invested annually in discovering and developing new therapies, we connect clinical operations and finance into a cohesive system. By integrating real-time financial intelligence, we empower R&D and finance leaders with the tools they need to make informed, high-stakes decisions. We are an AI-driven, pharma-native infrastructure provider, scaling industry standards in collaboration with top-tier partners. Our platform enables prediction, control, and execution in the most complex R&D environments worldwide.

The Importance of Your Role

Having established ourselves as a trusted partner for enterprise teams, we are now focused on the challenging task of scaling our platform to meet increasing demand. As a rapidly growing company backed by prominent investors like Felicis and 645 Ventures, this is a unique opportunity to help build the foundational infrastructure that will redefine how therapies reach patients.

Your Responsibilities

As a Full-Stack Platform Engineer, you will build and scale the core platform behind the financial intelligence infrastructure relied upon by leading biopharma companies. The role spans critical engineering work at the intersection of backend systems, cloud infrastructure, and intelligent automation, with a strong emphasis on reliability and scalability. Your primary focus will be backend architecture: designing and implementing services that drive complex financial and operational workflows, shaping data flow and workflow orchestration, and enabling emerging AI-driven capabilities.
This role goes beyond simple integration; you'll be crafting robust primitives that support other teams as our product and customer base expand.Working as a core member of a cross-functional product team, you will closely collaborate with product managers, designers, quality engineers, and data specialists to transition features from concept to production. While backend expertise is crucial, you will also engage across the stack to ensure the platform's capabilities are effectively leveraged.
Join Cartesia as an Inference Engineer

At Cartesia, our vision is to create the next evolution of AI: an interactive, omnipresent intelligence that operates seamlessly across all environments. Today, even the most advanced models struggle to continuously analyze a year's worth of audio, video, and text data (roughly 1 billion text tokens, 10 billion audio tokens, and 1 trillion video tokens), much less do so on-device.

We are building the model architectures that will make this a reality. Our founding team, who met as PhD candidates at the Stanford AI Lab, pioneered State Space Models (SSMs), a framework for training efficient, large-scale foundation models. Our team combines deep expertise in model innovation and systems engineering with a design-focused product engineering approach, enabling us to build and launch state-of-the-art models and user experiences. We are backed by leading investors such as Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks, and others, and are guided by numerous exceptional advisors and over 90 angel investors from diverse industries, including some of the world's foremost experts in AI.

About the Role

We are seeking an Inference Engineer to advance our mission of creating real-time multimodal intelligence.

Your Impact
- Develop and implement a low-latency, scalable, and dependable model inference and serving stack for our foundation models built on Transformers, SSMs, and hybrid architectures.
- Collaborate closely with our research team and product engineers to deliver our product suite quickly, cost-effectively, and reliably.
- Build robust inference infrastructure and monitoring systems for our product offerings.
- Enjoy substantial autonomy in shaping our products and directly influencing how cutting-edge AI is integrated across diverse devices and applications.

What You Bring

At Cartesia, we prioritize strong engineering skills because of the complexity and scale of the challenges we tackle.
- Proficient engineering skills, comfort navigating intricate codebases, and a commitment to clean, maintainable code.
- Experience developing large-scale distributed systems with strict performance, reliability, and observability requirements.
- Proven technical leadership, with the ability to execute and deliver from zero to one amid uncertainty.
- A background in or experience with inference pipelines, machine learning, and generative models.
Overview

At Pulse, we are revolutionizing data infrastructure by solving the critical challenge of accurately extracting structured information from intricate documents at scale. Our document understanding technique merges intelligent schema mapping with advanced extraction models, outperforming traditional OCR and parsing methods.

Located in the heart of San Francisco, we are a dynamic team of engineers serving Fortune 100 enterprises, YC startups, public investment firms, and growth-stage companies. Backed by top-tier investors, we are rapidly expanding our footprint in the industry.

What sets our technology apart is our multi-stage architecture, which includes:
- Specialized models for layout understanding and component detection
- Low-latency OCR models designed for precise extraction
- Advanced reading-order algorithms for complex document structures
- Proprietary methods for table structure recognition and parsing
- Fine-tuned vision-language models for interpreting charts, tables, and figures

If you are passionate about the convergence of computer vision, natural language processing, and data infrastructure, your contributions at Pulse will have significant impact for our clients and help shape the future of document intelligence.
About the Role

As a Senior General Ledger Accountant at Fluidstack, you will play a crucial role on our expanding Finance team, ensuring the accuracy and integrity of our financial records through a period of significant growth. You will own the complete general ledger process, assist with month-end and year-end closing activities, and collaborate across teams to guarantee precise and timely financial reporting. This high-impact position is ideal for someone who thrives in a dynamic environment and is eager to build top-tier accounting systems at one of the most innovative companies in the AI sector.
Join our dynamic team at Perplexity as an AI Inference Engineer, where you will be at the forefront of deploying cutting-edge machine learning models for real-time inference. Our tech stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes, giving you the chance to work on large-scale applications that make a real impact.

Key Responsibilities
- Design and develop APIs for AI inference that serve both internal and external stakeholders.
- Benchmark and identify bottlenecks within our inference stack to improve performance.
- Ensure the reliability and observability of our systems, and promptly address any outages.
- Investigate new research and implement optimizations for LLM inference.
Your Role

We are seeking a Software Engineer to develop the technological backbone of Fluidstack's talent acquisition and human resource operations. This position spans the complete talent lifecycle: you will design, develop, and implement internal tools and integrations that make our recruiters more efficient, give hiring managers data-driven insights, and accelerate onboarding for new hires. You will work closely with ATS and HRIS platforms and the broader HR tech ecosystem, and leverage AI tools to create tailored solutions when existing tools fall short.
Given the rapid pace and scale of our hiring, this role is crucial to our growth.

Key Responsibilities
- Lead the talent tooling roadmap: collaborate with Recruitment and People Operations to define scope, design, implement, and deploy solutions.
- Develop and sustain integrations: connect with tools like Ashby and Rippling, streamline data access, and minimize manual tasks.
- Create internal recruiting workflows: design candidate pipeline dashboards, interview scheduling automation, offer letter generation, headcount planning views, and onboarding checklists.
- Enhance talent funnel analytics: build dashboards that provide real-time insight into key metrics such as time-to-fill, offer acceptance rates, funnel conversion, and retention indicators.
- Automate HR processes: simplify onboarding, e-signature routing, and data synchronization across ATS, HRIS, and IT systems.
- Implement AI-driven recruiting features: develop intelligent candidate matching, automated scorecard summaries, job description drafting tools, and sourcing enhancement pipelines.
About Our Team

The Inference team at OpenAI leverages cutting-edge research and technology to deliver exceptional AI products to consumers, enterprises, and developers. Our mission is to empower users to harness the full potential of our advanced AI models, enabling unprecedented capabilities. We prioritize efficient, high-performance model inference while accelerating research advancements.

About the Role

We are seeking a passionate Software Engineer to optimize some of the world's largest and most sophisticated AI models for deployment in high-volume, low-latency, and highly available production and research environments.

Key Responsibilities
- Collaborate with machine learning researchers, engineers, and product managers to bring our latest technologies into production.
- Work closely with researchers to enable advanced research initiatives through innovative engineering solutions.
- Implement new techniques, tools, and architectures that improve the performance, latency, throughput, and effectiveness of our model inference stack.
- Develop tools to identify bottlenecks and sources of instability, then design and implement solutions for the highest-priority issues.
- Optimize our code and Azure VM fleet to make the most of every FLOP and GB of GPU RAM available.

You Will Excel in This Role If You:
- Possess a solid understanding of modern machine learning architectures and an intuitive grasp of performance optimization strategies, especially for inference.
- Take ownership of problems end-to-end, acquiring whatever knowledge is needed to achieve results.
- Bring at least 5 years of professional software engineering experience.
- Have, or can quickly develop, expertise in PyTorch, NVIDIA GPUs, and relevant optimization software stacks (such as NCCL and CUDA), along with HPC technologies like InfiniBand, MPI, and NVLink.
- Have experience architecting, building, monitoring, and debugging production distributed systems; bonus points for performance-critical systems.
- Have rebuilt or significantly refactored production systems multiple times to accommodate rapid scaling.
- Are self-driven and enjoy identifying and addressing the most critical problems.
Join Trunkio, where our mission is to enable teams to develop high-quality software swiftly. We have collaborated with engineering teams at top-tier companies like Google X, Zillow, and Brex to identify build failures, manage flaky tests, and speed up code deployment without compromising reliability. Although AI has accelerated code writing, the delivery process remains a challenge due to merge conflicts, inconsistent code quality, and other productivity-draining issues. Our goal is to help engineering teams focus on the design, implementation, and delivery of exceptional software, resulting in more fulfilling work. We are currently developing a CI Reliability Platform that empowers teams to deliver code efficiently.

Founded in 2021 by industry veterans from Uber, Google, YouTube, and Microsoft, Trunkio has raised a $25M Series A led by Initialized Capital and a16z, with backing from notable investors including Haystack Ventures and the creators of GitHub, Apollo GraphQL, and Algolia.

We are seeking a passionate and skilled Senior Software Engineer to join our Platform/Data Engineering team. In this pivotal role, you will design and optimize data ingestion pipelines that handle large volumes of real-time and batch data from diverse sources. Your expertise will be vital in creating systems that are scalable, reliable, and performant, while ensuring seamless data integration across our ecosystem.
Join Perplexity as a skilled Software Engineer and play a pivotal role in developing our next-generation AI Foundation and Platform. Our mission is to transform how individuals search and engage online. In this position, you will help build Perplexity's comprehensive AI data, evaluation, and personalization infrastructure, which underpins nearly all of our agent products.

Technology Stack: Spark | AWS Data Stack (S3, RDS, DynamoDB, Docker, EKS, Kinesis) | PyTorch | Databricks | Snowflake | LLM APIs

As we continue to expand our user base and diverse use cases, our data stack ensures that millions of people around the globe receive fast, personalized answers.
About the Role
We are seeking a talented Inference Engineering Manager to lead our AI Inference team at Perplexity. This is a remarkable opportunity to design and scale the infrastructure behind Perplexity's products and APIs, serving millions of users with cutting-edge AI capabilities.

You will own the technical direction and implementation of our inference systems while building and leading a high-caliber team of inference engineers. Our technology stack spans Python, PyTorch, Rust, C++, and Kubernetes. You will play a crucial role in architecting and scaling the large-scale deployment of machine learning models for Perplexity's Comet, Sonar, Search, and Deep Research products.

Why Perplexity?
- Develop state-of-the-art systems that are among the fastest in the industry, using leading-edge technology.
- Do high-impact work on a smaller team, with considerable ownership and autonomy.
- Build infrastructure from the ground up instead of maintaining legacy systems.
- Work across the entire spectrum: minimizing costs, scaling traffic, and advancing inference capabilities.
- Shape the technical roadmap and team culture at a rapidly expanding company.

Responsibilities
- Lead and nurture a high-performing team of AI inference engineers.
- Develop APIs for AI inference used by both internal and external clients.
- Design and scale our inference infrastructure for greater reliability and efficiency.
- Benchmark and resolve bottlenecks across our inference stack.
- Drive large sparse/MoE model inference at rack scale, including sharding strategies for very large models.
- Build inference systems that support sparse attention and disaggregated prefill/decode serving.
- Improve the reliability and observability of our systems and lead incident response.
- Make technical decisions on batching, throughput, latency, and GPU utilization.
- Collaborate with ML research teams on model optimization and deployment.
- Recruit, mentor, and develop engineering talent.
- Establish team processes, engineering standards, and operational excellence.

Qualifications
- 5+ years of engineering experience, with at least 2 years in a technical leadership or management capacity.
- Proficiency in Python, PyTorch, Rust, and C++.
- Experience with Kubernetes and cloud infrastructure.
- Strong understanding of machine learning model deployment and optimization.
- Exceptional problem-solving and communication skills.
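The sharding strategies the role describes start from a simple primitive: splitting one weight dimension of a layer as evenly as possible across GPUs, so each device holds and computes only its slice. A minimal sketch of that partitioning arithmetic (the dimensions in the example are hypothetical, and this is not Perplexity's actual sharding scheme):

```python
def shard_dim(dim, num_gpus):
    """Split one weight dimension across GPUs as evenly as possible.

    The first (dim % num_gpus) shards get one extra row/column, so
    shard sizes differ by at most 1 and always sum to dim.
    """
    base, rem = divmod(dim, num_gpus)
    return [base + (1 if i < rem else 0) for i in range(num_gpus)]

# e.g. a hypothetical FFN hidden size of 22016 split across 8 GPUs:
shard_dim(22016, 8)  # → [2752, 2752, 2752, 2752, 2752, 2752, 2752, 2752]
```

Real tensor-parallel and expert-parallel layouts add constraints on top of this (alignment to attention heads or experts, matching all-reduce boundaries), but uneven remainder handling like the above is the base case every scheme must get right.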
About Our Team
The Inference team at OpenAI is dedicated to translating our cutting-edge research into accessible, transformative technology for consumers, enterprises, and developers. By leveraging our advanced AI models, we enable users to achieve unprecedented levels of innovation and productivity. Our primary focus is enhancing model inference efficiency and accelerating research through optimized inference capabilities.

About the Role
We are seeking talented engineers to expand and optimize OpenAI's inference infrastructure, with a focus on emerging GPU platforms. The role spans low-level kernel optimization to high-level distributed execution. You will collaborate closely with our research, infrastructure, and performance teams to ensure our largest models run seamlessly on cutting-edge hardware.

This position offers a unique opportunity to advance OpenAI's multi-platform inference capabilities, with a strong emphasis on optimizing performance for AMD accelerators.

Your Responsibilities Include:
- Overseeing the deployment, accuracy, and performance of the OpenAI inference stack on AMD hardware.
- Integrating our internal model-serving infrastructure (e.g., vLLM, Triton) into diverse GPU-backed systems.
- Debugging and optimizing distributed inference workloads across the memory, network, and compute layers.
- Validating the correctness, performance, and scalability of model execution on large GPU clusters.
- Collaborating with partner teams to design and optimize high-performance GPU kernels for accelerators, using HIP, Triton, or other performance-centric frameworks.
- Working with partner teams to develop, integrate, and tune collective communication libraries (e.g., RCCL) to parallelize model execution across multiple GPUs.

Ideal Candidates Will:
- Have experience writing or porting GPU kernels in HIP, CUDA, or Triton, with a strong focus on low-level performance.
- Be familiar with communication libraries like NCCL/RCCL and understand their role in high-throughput model serving.
- Have experience with distributed inference systems and scaling models across multiple accelerators.
- Enjoy tackling end-to-end performance challenges across hardware, system libraries, and orchestration layers.
- Be eager to join a dynamic, agile team building innovative infrastructure from the ground up.
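For a sense of what NCCL/RCCL familiarity entails, here is a toy pure-Python simulation of the classic ring all-reduce, the bandwidth-optimal collective both libraries implement for summing gradients or activations across GPUs. The chunk-routing indices below are one common textbook convention, not RCCL's actual implementation.

```python
def ring_all_reduce(chunks_per_gpu):
    """Toy simulation of a ring all-reduce.

    chunks_per_gpu[g][c] is GPU g's value for chunk c; the number of
    chunks must equal the number of GPUs. Returns the state after the
    reduce-scatter and all-gather phases, where every GPU holds the
    elementwise sum across all GPUs.
    """
    n = len(chunks_per_gpu)
    data = [list(row) for row in chunks_per_gpu]
    # Phase 1, reduce-scatter: in step s, GPU g sends chunk (g - s) % n
    # to its ring neighbor, which accumulates it. Snapshot the sends so
    # all transfers in a step happen "simultaneously".
    for s in range(n - 1):
        sends = [(g, (g - s) % n, data[g][(g - s) % n]) for g in range(n)]
        for g, c, val in sends:
            data[(g + 1) % n][c] += val
    # Phase 2, all-gather: each GPU now owns one fully reduced chunk and
    # circulates it around the ring, overwriting stale partial copies.
    for s in range(n - 1):
        sends = [(g, (g + 1 - s) % n, data[g][(g + 1 - s) % n]) for g in range(n)]
        for g, c, val in sends:
            data[(g + 1) % n][c] = val
    return data

# Three "GPUs", three chunks each; every GPU ends with the column sums.
data = ring_all_reduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# every row of data is [12, 15, 18]
```

Each GPU sends 2(N-1) chunks in total, i.e. about 2(N-1)/N of the tensor regardless of ring size, which is why this algorithm scales and why its throughput is a first-order concern in model serving.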