Software Engineer for Innovative Product Development

BasetenSan Francisco

On-site Full-time

Clicking Apply Now takes you to AutoApply where you can tailor your resume and apply.

Unlock Your Potential

Generate Job-Optimized Resume

One Click And Our AI Optimizes Your Resume to Match The Job Description.

Is Your Resume Optimized For This Role?

Find Out If You're Highlighting The Right Skills And Fix What's Missing

Experience Level

Entry Level

Qualifications

Proficiency in programming languages such as Python, Java, or JavaScript. Strong understanding of software development principles and methodologies. Experience with version control systems like Git. Ability to work collaboratively in a team environment. Excellent problem-solving skills and attention to detail.

About the job

Join Baseten as a Software Engineer focused on developing cutting-edge products that push the boundaries of technology. In this role, you will collaborate with a dynamic team to design, implement, and maintain innovative software solutions that meet the needs of our users. You will have the opportunity to work on exciting projects that utilize the latest technologies and methodologies.

About Baseten

Baseten is a forward-thinking technology company based in San Francisco, dedicated to creating innovative software solutions that transform industries. Our mission is to empower businesses with tools that enhance productivity and drive success. We value creativity, collaboration, and a commitment to excellence in everything we do.

Similar jobs

1 - 20 of 11,133 Jobs

Search for Software Engineer Gpu Inference At Baseten San Francisco

11,133 results

Select all on this page (20)

Apply

Software Engineer - GPU Inference at Baseten | San Francisco

Baseten

Full-time|On-site|San Francisco

Baseten develops infrastructure and tools that help AI companies deploy and scale inference. Teams at organizations like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer rely on Baseten to bring advanced machine learning models into production. The company recently secured a $300M Series E from investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Role overview This Software Engineer - GPU Inference position joins the founding team for Baseten Voice AI in San Francisco. The team focuses on building production-ready Voice AI systems, bringing open-source voice models into real-world use for clients in productivity, customer service, healthcare conversations, and education. The work shapes how people interact with technology through voice, creating broad impact across industries. In this role, the engineer leads the internal inference stack that powers Voice AI models. Responsibilities include guiding the product roadmap and driving engineering execution. Collaboration is a key part of the job, working closely with Forward Deployed Engineers, Model Performance Engineers, and other technical groups to advance Voice AI capabilities. Sample projects and initiatives The world's fastest Whisper, with streaming and diarization Canopy Labs selects Baseten for Orpheus TTS inference Partnering with the Core Product team to build an orchestration framework for a multi-model voice agent Working with the Training Platform team to support continuous training of voice models Designing a developer-friendly API and SDK for self-service adoption of Baseten Voice AI products

Apr 26, 2026

Apply

Software Engineer - Realtime Systems at Baseten | San Francisco

Baseten

Full-time|On-site|San Francisco

Baseten supports companies like Cursor, Notion, and Writer in running AI inference at scale. The team blends AI research, adaptive infrastructure, and developer tools to help organizations deploy advanced AI models efficiently. Backed by investors such as BOND, IVP, and Greylock, Baseten recently raised a $300M Series E. The company aims to be the trusted platform for engineers launching AI products. Role overview The Software Engineer - Realtime Systems (Voice AI) role focuses on building and deploying production-ready Voice AI systems. Baseten’s Voice AI team works with open-source models to power applications in productivity, customer support, clinical conversations, creative tools, and education. Engineers in this group influence how people use voice to interact with technology, shaping products that impact multiple industries. This position involves leading Voice AI projects, setting both product direction and technical strategy. Collaboration is a key part of the work: expect to partner with Forward Deployed Engineers, Model Performance Engineers, and other teams to advance Baseten’s Voice AI capabilities. Sample projects The world's fastest Whisper, with streaming and diarization Orpheus TTS inference partnership with Canopy Labs Collaborate with the Core Product team to build a multi-model voice agent using Baseten’s orchestration framework Work alongside the Training Platform team to support ongoing training of voice models Design APIs and SDKs that make Baseten Voice AI products accessible for developers Location This role is based in San Francisco.

Apr 26, 2026

Apply

Software Engineer in GPU Networking & Distributed Systems

Baseten

Full-time|On-site|San Francisco

Join Baseten as a Software Engineer focusing on GPU Networking and Distributed Systems. In this pivotal role, you'll collaborate with talented engineers and researchers to develop cutting-edge solutions that leverage GPU technology for high-performance networking operations. Your contributions will be instrumental in shaping the future of distributed systems, enhancing performance, scalability, and reliability.

Feb 23, 2026

Apply

Data Engineer at baseten | San Francisco

baseten

Full-time|Remote|San Francisco

Join baseten as a Data Engineer and be at the forefront of data-driven innovation. In this role, you will design and implement robust data pipelines, ensuring the efficient processing and analysis of data to empower our products and decision-making processes. Collaborate with cross-functional teams to understand their data needs, while striving for optimization and scalability in data architectures.

Mar 18, 2026

Apply

Software Engineer, Inference – AMD GPU Enablement

OpenAI

Full-time|On-site|San Francisco

About Our TeamThe Inference team at OpenAI is dedicated to translating our cutting-edge research into accessible, transformative technology for consumers, enterprises, and developers. By leveraging our advanced AI models, we enable users to achieve unprecedented levels of innovation and productivity. Our primary focus lies in enhancing model inference efficiency and accelerating progress in research through optimized inference capabilities.About the RoleWe are seeking talented engineers to expand and optimize OpenAI's inference infrastructure, specifically targeting emerging GPU platforms. This role encompasses a wide range of responsibilities from low-level kernel optimization to high-level distributed execution. You will collaborate closely with our research, infrastructure, and performance teams to ensure seamless operation of our largest models on cutting-edge hardware.This position offers a unique opportunity to influence and advance OpenAI’s multi-platform inference capabilities, with a strong emphasis on optimizing performance for AMD accelerators.Your Responsibilities Include:Overseeing the deployment, accuracy, and performance of the OpenAI inference stack on AMD hardware.Integrating our internal model-serving infrastructure (e.g., vLLM, Triton) into diverse GPU-backed systems.Debugging and optimizing distributed inference workloads across memory, network, and compute layers.Validating the correctness, performance, and scalability of model execution on extensive GPU clusters.Collaborating with partner teams to design and optimize high-performance GPU kernels for accelerators utilizing HIP, Triton, or other performance-centric frameworks.Working with partner teams to develop, integrate, and fine-tune collective communication libraries (e.g., RCCL) to parallelize model execution across multiple GPUs.Ideal Candidates Will:Possess experience in writing or porting GPU kernels using HIP, CUDA, or Triton, with a strong focus on low-level performance.Be familiar with communication libraries like NCCL/RCCL, understanding their importance in high-throughput model serving.Have experience with distributed inference systems and be adept at scaling models across multiple accelerators.Enjoy tackling end-to-end performance challenges across hardware, system libraries, and orchestration layers.Be eager to join a dynamic, agile team focused on building innovative infrastructure from the ground up.

Oct 8, 2025

Apply

GPU Kernel Engineer

Baseten

Full-time|On-site|San Francisco

ABOUT BASETENAt Baseten, we empower the world's leading AI firms—such as Cursor, Notion, and OpenEvidence—by delivering mission-critical inference solutions. Our unique blend of applied AI research, robust infrastructure, and user-friendly developer tools enables AI pioneers to effectively deploy groundbreaking models. With our recent achievement of a $300M Series E funding round supported by esteemed investors like BOND and IVP, we're on an exciting growth trajectory. Join our dynamic team and contribute to the platform that drives the next generation of AI products.THE ROLEWe are looking for an experienced Senior GPU Kernel Engineer to join our innovative team at the forefront of AI acceleration. In this role, your programming expertise will directly enhance the performance of cutting-edge machine learning models. You'll be responsible for developing highly efficient GPU kernels that optimize computational processes, allowing for transformative AI applications.You'll thrive in a fast-paced, intellectually challenging environment where your technical skills are pivotal. Your contributions will directly affect production systems that serve millions of users across various platforms. This position offers exceptional opportunities for career advancement for engineers enthusiastic about low-level optimization and impactful systems engineering.EXAMPLE INITIATIVESAs part of our Model Performance team, you will engage in projects like:Baseten Embeddings Inference: The quickest embeddings solution availableThe Baseten Inference StackEnhancing model performance optimizationRESPONSIBILITIESCore Engineering ResponsibilitiesDesign and develop high-performance GPU kernels for essential machine learning operations, including matrix multiplications and attention mechanisms.Collaborate with cross-functional teams to drive performance improvements and implement optimizations.Debug and refine kernel code to achieve maximal efficiency and reliability.Stay abreast of the latest advancements in GPU technology and machine learning frameworks.

Jul 17, 2025

Apply

Onboarding Program Manager at baseten | San Francisco

Baseten

Full-time|On-site|San Francisco

Join Baseten as an Onboarding Program Manager where you will play a vital role in shaping the onboarding experience for our new team members. You will be responsible for developing and implementing effective onboarding programs that enhance employee engagement and retention.

Mar 4, 2026

Apply

Integrated Marketing Manager at Baseten | San Francisco

Baseten

Full-time|On-site|San Francisco

About Baseten Baseten supports leading AI companies, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer, by delivering essential inference capabilities. The platform brings together advanced AI research, flexible infrastructure, and developer-friendly tools, helping teams move models from the lab into production. Backed by a recent $300M Series E funding round and investors such as BOND, IVP, Spark Capital, Greylock, and Conviction, Baseten is growing quickly in its mission to become the platform engineers trust for building and shipping AI products. Role Overview The Integrated Marketing Manager will shape and run multi-channel marketing campaigns to drive a qualified pipeline and strengthen Baseten’s go-to-market approach. This role calls for a strategic marketer with hands-on experience in AI, comfortable guiding campaigns from initial idea through launch and measurement, and collaborating across teams and channels. What You Will Do Develop and execute full-funnel campaign programs that include content, paid media, email outreach, events, and web initiatives Increase awareness, engagement, and pipeline growth as Baseten scales through FY’27 Work closely with cross-functional teams to ensure campaigns align with business goals and market needs Analyze campaign performance and apply insights to improve future efforts Location This position is based in San Francisco.

Apr 17, 2026

Apply

Applied AI Inference Engineer

Baseten

Full-time|$300K/yr - $300K/yr|On-site|San Francisco

ABOUT BASETENAt Baseten, we empower the leading AI companies of today, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer, by providing essential inference capabilities. Our unique blend of applied AI research, adaptable infrastructure, and intuitive developer tools enables innovators at the cutting edge of AI to seamlessly transition advanced models into production. With our recent success in securing a $300M Series E funding round, backed by notable investors such as BOND, IVP, Spark Capital, Greylock, and Conviction, we're on an exciting growth trajectory. Join our team and contribute to the platform that engineers rely on to launch AI-driven products.THE ROLEAs an Applied AI Inference Engineer at Baseten, you'll collaborate closely with clients to design, develop, and implement high-performance AI applications using our platform. You will guide customers through the entire process, from initial concept to deployment, transforming vague business objectives into dependable, observable solutions that meet defined quality, latency, and cost metrics.This position is ideal for innovative engineers eager to gain insight into how modern organizations scale AI adoption. You will thrive if you enjoy a multifaceted role that intersects product development, software engineering, performance optimization, and direct customer engagement.It’s essential to note that this position requires hands-on coding and software development, while also encompassing elements of product management, technical customer success, and pre-sales engineering.EXAMPLE INITIATIVESExplore insights from our Forward Deployed Engineering team through these blog posts: Forward Deployed Engineering on the frontier of AIThe fastest, most accurate Whisper transcriptionDeploy production-ready model servers from Docker imagesDeploy custom ComfyUI workflows as APIs...

Nov 4, 2025

Apply

Software Engineer - Inference Platform at Fluidstack | San Francisco

Fluidstack

Full-time|$165K/yr - $500K/yr|On-site|San Francisco, CA

Join the Fluidstack TeamAt Fluidstack, we’re pioneering the infrastructure for advanced intelligence. We collaborate with leading AI laboratories, governmental entities, and major corporations—including Mistral, Poolside, and Meta—to deliver computing solutions at unprecedented speeds.Our mission is to transform the vision of Artificial General Intelligence (AGI) into a reality. Driven by our purpose, our dedicated team is committed to building state-of-the-art infrastructure that prioritizes our customers' success. If you share our passion for excellence and are eager to contribute to the future of intelligence, we invite you to be part of our journey.Role OverviewThe Inference Platform team at Fluidstack is at the forefront of addressing the cost and latency challenges associated with frontier AI. You will play a crucial role in managing the serving layer that connects our global accelerator supply with the production workloads of our clients, which include LLM serving frameworks, KV cache infrastructure, and Kubernetes orchestration across multiple data centers.This hands-on individual contributor role combines elements of distributed systems, model optimization, and serving infrastructure. You will oversee the entire lifecycle of inference deployments for leading AI labs, striving for enhancements in throughput, cost-efficiency, and response times, while also influencing the architectural decisions that guide Fluidstack’s deployment strategies.

Mar 5, 2026

Apply

Account Executive, Industries - Baseten

Baseten

Full-time|On-site|San Francisco

Join Baseten as an Account Executive in the Industries division, where you'll play a pivotal role in driving growth and building strong client relationships. In this position, you will leverage your expertise to engage with prospective customers, understand their needs, and offer tailored solutions that align with their objectives. Ideal candidates will possess exceptional communication skills, a strong sales acumen, and a passion for technology.

Mar 20, 2026

Apply

Software Engineer - GenAI Inference at Databricks | San Francisco

Databricks

Full-time|$142.2K/yr - $204.6K/yr|On-site|San Francisco, California

About This Role Join Databricks as a Software Engineer focused on GenAI inference, where you will play a pivotal role in designing, developing, and enhancing the inference engine that drives our Foundation Model API. Collaborating at the intersection of research and production, you will ensure our large language model (LLM) serving systems are optimized for speed, scalability, and efficiency. Your contributions will span the entire GenAI inference stack, from kernels and runtimes to orchestration and memory management. What You Will Do Participate in the design and implementation of the inference engine, collaborating on a model-serving stack tailored for large-scale LLM inference. Work closely with researchers to integrate new model architectures or features such as sparsity, activation compression, and mixture-of-experts into the engine. Optimize latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators. Build and maintain tools for instrumentation, profiling, and tracing to identify bottlenecks and inform optimization efforts. Develop scalable routing, batching, scheduling, memory management, and dynamic loading mechanisms for inference workloads. Ensure reliability, reproducibility, and fault tolerance in inference pipelines, including A/B launches, rollback, and model versioning. Integrate with federated and distributed inference infrastructure, orchestrating across nodes, balancing load, and managing communication overhead. Engage in cross-functional collaboration with platform engineers, cloud infrastructure, and security/compliance teams. Document and share insights, contributing to internal best practices and open-source initiatives as appropriate.

Jan 30, 2026

Apply

Software Engineer - Internal Platform

Baseten

Full-time|On-site|San Francisco

ABOUT BASETENAt Baseten, we empower cutting-edge AI companies, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer, to achieve mission-critical inference. By merging advanced AI research with flexible infrastructure and intuitive developer tools, we facilitate the deployment of innovative AI models into production. Having recently secured a $300M Series E funding round from esteemed investors like BOND, IVP, Spark Capital, Greylock, and Conviction, we are on a rapid growth trajectory. Join our team and contribute to building a platform that engineers rely on to launch AI products successfully.THE ROLEAs a key member of Baseten's Platform Team, you will play a crucial role in developing internal infrastructure to support our engineering division. While our product offers infrastructure for AI advancements, your primary focus will be on crafting robust internal systems that enhance productivity, collaboration, and work quality across engineering teams, leveraging exceptional tools, efficient workflows, and resilient development settings.If you have a passion for elegant solutions—such as streamlined monorepos, rapid CI pipelines, and well-designed shared libraries—you will excel at Baseten.RESPONSIBILITIESDevelop a range of tools customized to meet the diverse needs of engineering teams.Enhance monorepo functionality and create project templates to ensure consistency and efficiency.Design and implement shared libraries focused on system observability.Optimize the speed, reliability, and thoroughness of our CI pipelines.Assist in designing and maintaining Terraform modules for effective infrastructure management.Provide innovative solutions to improve visibility within continuous delivery (CD) processes.Proactively support engineering teams, ensuring they have the necessary resources and tools for maximum productivity.REQUIREMENTSProficiency in Go and/or Python programming languages.Experience with Kubernetes and Docker tools (e.g., Helm, Docker, Kubernetes).Demonstrated experience managing and working with large monorepos.Strong problem-solving skills with an emphasis on efficient software delivery.Familiarity with CI/CD methodologies and tools.Excellent communication and collaboration skills.

Mar 26, 2025

Apply

Software Engineer - Voice AI

Baseten

Full-time|On-site|San Francisco

Baseten creates AI inference solutions for clients such as Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. The team blends AI research, infrastructure, and developer tools to help organizations deploy advanced models. Backed by $300M in Series E funding from BOND, IVP, Spark Capital, Greylock, and Conviction, Baseten is expanding quickly and shaping the landscape for engineers building AI products. Role overview The Software Engineer - Voice AI role centers on building and deploying open-source voice models for real-world use. Voice is becoming a key interface across the web, and this position addresses the technical challenges of bringing production-ready Voice AI to market. The work supports applications in productivity, customer service, clinical dialogue, creator tools, education, and more, helping to change how people interact with technology across sectors. This engineer leads Baseten’s Voice AI efforts, guiding the proprietary inference stack that powers Voice AI models. The role balances shaping the product roadmap with hands-on engineering. Collaboration is a core part of the job, working closely with Forward Deployed Engineers, Model Performance Engineers, and other technical teams to advance Voice AI capabilities. Sample projects and initiatives The world's fastest Whisper, with streaming and diarization Canopy Labs selects Baseten for Orpheus TTS inference Partnering with the Core Product team to build an orchestration framework for a multi-model voice agent Working with the Training Platform team to support ongoing training of voice models Designing a developer-friendly API and SDK to encourage self-service adoption of Baseten Voice AI products Location San Francisco

Apr 26, 2026

Apply

Inference Engineer at Cartesia | San Francisco, CA

Cartesia

Full-time|On-site|*HQ - San Francisco, CA

Join Cartesia as an Inference EngineerAt Cartesia, our vision is to create the next evolution of AI: an interactive, omnipresent intelligence that operates seamlessly across all environments. Currently, even the most advanced models struggle to continuously analyze a year's worth of audio, video, and text data—comprising 1 billion text tokens, 10 billion audio tokens, and 1 trillion video tokens—much less perform these tasks on-device.We are at the forefront of developing the model architectures that will make this a reality. Our founding team, who met as PhD candidates at the Stanford AI Lab, pioneered State Space Models (SSMs), a groundbreaking framework for training efficient, large-scale foundation models. Our talented team merges deep expertise in model innovation and systems engineering with a design-focused product engineering approach, enabling us to build and launch state-of-the-art models and user experiences.Supported by leading investors such as Index Ventures and Lightspeed Venture Partners, along with contributions from Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks, and others, we are fortunate to be guided by numerous exceptional advisors and over 90 angel investors from diverse industries, including some of the world’s foremost experts in AI.About the RoleWe are actively seeking an Inference Engineer to propel our mission of creating real-time multimodal intelligence.Your ImpactDevelop and implement a low-latency, scalable, and dependable model inference and serving stack for our innovative foundation models utilizing Transformers, SSMs, and hybrid models.Collaborate closely with our research team and product engineers to efficiently deliver our product suite in a fast, cost-effective, and reliable manner.Construct robust inference infrastructure and monitoring systems for our product offerings.Enjoy substantial autonomy in shaping our products and directly influencing how cutting-edge AI is integrated across diverse devices and applications.What You BringAt Cartesia, we prioritize strong engineering skills due to the complexity and scale of the challenges we tackle.Proficient engineering skills with a comfort level in navigating intricate codebases, and a commitment to producing clean, maintainable code.Experience in developing large-scale distributed systems with strict performance, reliability, and observability requirements.Proven technical leadership, capable of executing and delivering results from zero to one amidst uncertainty.A background in or experience with inference pipelines, machine learning, and generative models.

Dec 12, 2024

Apply

Software Engineer for Innovative Product Development

Baseten

Full-time|On-site|San Francisco

Feb 24, 2026

Apply

Software Engineer - Billing and Internal Tooling

Baseten

Full-time|On-site|San Francisco

Join Baseten as a Software Engineer focused on Billing and Internal Tooling, where you will play a crucial role in developing and enhancing our internal systems. You will collaborate with cross-functional teams to create efficient billing solutions and streamline internal processes. Your contributions will directly impact our operational efficiency and customer satisfaction.

Feb 27, 2026

Apply

Software Engineer, Inference

Pulse

Full-time|On-site|San Francisco

OverviewAt Pulse, we are revolutionizing the way data infrastructure operates by addressing the critical challenge of accurately extracting structured information from intricate documents on a large scale. Our innovative document understanding technique merges intelligent schema mapping with advanced extraction models, outperforming traditional OCR and parsing methods.Located in the heart of San Francisco, we are a dynamic team of engineers dedicated to empowering Fortune 100 enterprises, YC startups, public investment firms, and growth-stage companies. Backed by top-tier investors, we are rapidly expanding our footprint in the industry.What sets our technology apart is our sophisticated multi-stage architecture, which includes:Specialized models for layout understanding and component detectionLow-latency OCR models designed for precise extractionAdvanced algorithms for reading-order in complex document structuresProprietary methods for table structure recognition and parsingFine-tuned vision-language models for interpreting charts, tables, and figuresIf you possess a strong passion for the convergence of computer vision, natural language processing, and data infrastructure, your contributions at Pulse will significantly impact our clients and help shape the future of document intelligence.

Jul 30, 2025

Apply

Software Engineer - Core Product

Baseten

Full-time|$300K/yr - $300K/yr|On-site|San Francisco

ABOUT BASETENAt Baseten, we empower innovative AI companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer to execute mission-critical inference with ease. By merging advanced AI research with flexible infrastructure and robust developer tools, we enable organizations at the forefront of AI to seamlessly deploy cutting-edge models into production. Fueled by rapid growth and a recent $300M Series E investment from industry leaders such as BOND, IVP, Spark Capital, Greylock, and Conviction, we're building the essential platform that engineers trust to launch AI products.THE ROLEAs a Software Engineer on our Core Product team, you will play a pivotal role in developing and enhancing the core Baseten platform, empowering users to effortlessly deploy and derive value from machine learning models. Given our developer-centric approach, you will engage with a vast array of components, including CLI tools, REST APIs, and the web application. The Core Product team leads all new product innovations within Baseten.EXAMPLE INITIATIVESAs part of our Core Product team, you will tackle exciting projects such as:Chains for multi-component workflowsAsynchronous inferenceModel APIs for cutting-edge modelsModel training optimized for production inferenceRESPONSIBILITIESDevelop and implement new features and products for the teamDesign intuitive APIs and abstractions to effectively address customer needsQuickly resolve bugs and customer issues with a proactive approachWork across the technology stack; you'll engage with both React Components and Kubernetes PodsCollaborate closely with product managers and cross-functional teams to drive product success

Jul 9, 2024

Apply

Software Engineer - Model API's

Baseten

Full-time|On-site|San Francisco

ABOUT BASETENAt Baseten, we are at the forefront of AI innovation, providing critical inference solutions for leading AI companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our platform combines advanced AI research, adaptable infrastructure, and intuitive developer tools, empowering organizations to deploy state-of-the-art models effectively. With rapid growth and a recent $300M Series E funding round backed by top-tier investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we invite you to join our mission in building the platform of choice for engineers delivering AI products.THE ROLE:As a member of Baseten’s Model Performance (MP) team, you will play a pivotal role in ensuring our platform’s model APIs are not only fast and reliable but also cost-effective. Your primary focus will be on developing and optimizing the infrastructure that supports our hosted API endpoints for cutting-edge open-source models. This role involves working with distributed systems, model serving, and enhancing the developer experience. You will collaborate with a small, dynamic team at the intersection of product development, model performance, and infrastructure, defining how developers interact with AI models on a large scale.RESPONSIBILITIES:Design, develop, and maintain the Model APIs surface, focusing on advanced inference features such as structured outputs (JSON mode, grammar-constrained generation), tool/function calling, and multi-modal serving.Profile and optimize TensorRT-LLM kernels, analyze CUDA kernel performance, create custom CUDA operators, and enhance memory allocation patterns for maximum efficiency across multi-GPU setups.Implement performance improvements across various runtimes based on a deep understanding of their internals, including speculative decoding, guided generation for structured outputs, and custom scheduling algorithms for high-performance serving.Develop robust benchmarking frameworks to evaluate real-world performance across diverse model architectures, batch sizes, sequence lengths, and hardware configurations.Enhance performance across runtimes (e.g., TensorRT, TensorRT-LLM) through techniques such as speculative decoding, quantization, batching, and KV-cache reuse.Integrate deep observability mechanisms (metrics, traces, logs) and establish repeatable benchmarks to assess speed, reliability, and quality.

Oct 11, 2025

Create account — see all 11,133 results