Software Engineer Specializing in Kernel Performance AI Tooling jobs in San Francisco – Browse 7,643 openings on RoboApply Jobs


Open roles matching “Software Engineer Specializing in Kernel Performance AI Tooling” in San Francisco. 7,643 active listings on RoboApply Jobs.


1 - 20 of 7,643 Jobs
OpenAI
Full-time|Remote|San Francisco

About the Role
OpenAI is looking for a Software Engineer specializing in Kernel Performance and AI Tooling to join the team in San Francisco. This role centers on improving software systems for maximum efficiency and building advanced tools that support AI development.

What You Will Do
- Optimize kernel-level performance across OpenAI's software stack.
- Design and implement tools that accelerate AI research and deployment.
- Work closely with engineers to identify bottlenecks and deliver practical solutions.
- Contribute to technical discussions and share knowledge with teammates.

Team and Collaboration
Work alongside engineers who are committed to advancing AI technology. Collaboration and innovation are central to the team's approach.

Apr 17, 2026
Zyphra
Full-time|On-site|San Francisco

Join Zyphra as a Research Engineer specializing in AI Performance and Kernel Optimization. In this role, you will work at the forefront of AI technologies, developing and optimizing kernel solutions that enhance the performance of our systems. You will collaborate with cross-functional teams, leveraging your expertise to drive innovation and efficiency.

Mar 16, 2026
Databricks
Full-time|$190.9K/yr - $232.8K/yr|On-site|San Francisco, California

P-1285

About This Role
Join Databricks as a Staff Software Engineer specializing in GenAI Performance and Kernels. In this role you will design, implement, and optimize the high-performance GPU kernels that drive our GenAI inference stack. You will develop finely tuned, low-level compute paths that balance hardware efficiency with versatility, mentor fellow engineers in kernel-level performance engineering, and collaborate closely with machine learning researchers, systems engineers, and product teams to advance inference performance at scale.

What You Will Do
- Lead the design, implementation, benchmarking, and maintenance of essential compute kernels (attention, MLP, softmax, layernorm, memory management) for diverse hardware backends (GPUs, accelerators).
- Steer the performance roadmap for kernel-level enhancements: vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, and auto-tuning.
- Integrate kernel optimizations with higher-level machine learning systems.
- Develop and maintain profiling, instrumentation, and verification tools to catch correctness issues, performance regressions, numerical discrepancies, and hardware utilization inefficiencies.
- Conduct performance investigations and root-cause analyses of inference bottlenecks such as memory bandwidth, cache contention, kernel launch overhead, and tensor fragmentation.
- Create coding patterns, abstractions, and frameworks that modularize kernels for reuse, cross-backend compatibility, and maintainability.
- Influence architectural decisions that affect kernel efficiency, including memory layout, dataflow scheduling, and kernel fusion boundaries.
- Mentor engineers focused on low-level performance, conduct code reviews, and establish best practices.
- Collaborate with infrastructure, tooling, and machine learning teams to land kernel-level optimizations in production and assess their impact.
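The attention and softmax kernel work listed above leans on one recurring numerical trick: the online (streaming) softmax, which processes a row in tiles while carrying only a running max and running sum, so a fused kernel never materializes the full score row. A minimal NumPy sketch of the recurrence (function name and tile size are illustrative, not Databricks code):

```python
import numpy as np

def online_softmax(x, tile=4):
    """Streaming softmax over one row, processed in tiles.

    Keeps only a running max (m) and running sum (s), the same
    recurrence fused attention kernels use to avoid materializing
    full score rows in memory.
    """
    m, s = -np.inf, 0.0
    for i in range(0, len(x), tile):
        t = x[i:i + tile]
        m_new = max(m, t.max())
        # rescale the running sum whenever the running max improves
        s = s * np.exp(m - m_new) + np.exp(t - m_new).sum()
        m = m_new
    return np.exp(x - m) / s

x = np.random.randn(10)
ref = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
assert np.allclose(online_softmax(x), ref)
```

The same rescaling step is what lets FlashAttention-style kernels fuse softmax with the surrounding matrix multiplies in a single pass over memory.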

Jan 30, 2026
Inferact
Full-time|$200K/yr - $400K/yr|Remote|San Francisco

At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, significantly increasing the speed and reducing the cost of AI inference. Our founders, the creators of vLLM, have spent years bridging the gap between advanced models and cutting-edge hardware.

About the Role
We are seeking a skilled performance engineer dedicated to maximizing the computational efficiency of modern accelerators. In this role, you'll develop kernels and implement low-level optimizations that position vLLM as the fastest inference engine globally. Your code will execute across a broad spectrum of hardware accelerators, from NVIDIA GPUs to the latest silicon. You'll collaborate closely with hardware vendors to ensure we fully leverage each new generation of chips.

Jan 22, 2026
Sciforium
Full-time|On-site|San Francisco

At Sciforium, we are building next-generation multimodal AI models and a proprietary high-efficiency serving platform. With substantial funding and direct collaboration with AMD, supported by their engineers, our team is rapidly expanding to develop the complete stack that powers cutting-edge AI models and real-time applications.

About the Role
We are looking for a talented GPU Kernel Engineer eager to maximize performance on modern accelerators. You will design and optimize the custom GPU kernels that drive our large-scale AI systems, working across the hardware-software stack: low-level kernel development as well as integrating optimized operations into high-level machine learning frameworks for large-scale training and inference. This position suits someone who excels at the intersection of GPU programming, systems engineering, and state-of-the-art AI workloads, and wants to contribute significantly to the efficiency and scalability of our machine learning platform.

Key Responsibilities
- Develop, implement, and enhance custom GPU kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas.
- Profile and tune the end-to-end performance of machine learning operations, particularly for large-scale LLM training and inference.
- Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and our proprietary internal runtimes.
- Build performance models, pinpoint bottlenecks, and deliver kernel-level improvements that significantly speed up AI workloads.
- Collaborate with machine learning researchers, distributed systems engineers, and model-serving teams to optimize computational performance across the entire stack.
- Engage closely with hardware vendors (NVIDIA/AMD) and stay current on GPU architecture and compiler/toolchain advancements.
- Contribute to tools, documentation, benchmarking suites, and testing frameworks that ensure correctness and performance reproducibility.

Must-Haves
- 5+ years of industry or research experience in GPU kernel development or high-performance computing.
- Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related discipline.
- Strong programming proficiency in C++ and Python, and familiarity with machine learning frameworks.
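Performance modeling of the kind described above often starts from a roofline estimate: compare a kernel's arithmetic intensity (FLOPs per byte moved) against the hardware's compute-to-bandwidth ratio to predict whether it is memory- or compute-bound. A minimal sketch; the peak FLOP/s and bandwidth figures below are illustrative, not any specific GPU's:

```python
def roofline_bound(flops, bytes_moved, peak_flops, peak_bw):
    """Return attainable FLOP/s and the limiting resource for a kernel."""
    intensity = flops / bytes_moved      # FLOPs per byte of memory traffic
    ridge = peak_flops / peak_bw         # intensity where the roofline flattens
    attainable = min(peak_flops, intensity * peak_bw)
    bound = "compute" if intensity >= ridge else "memory"
    return attainable, bound

# Example: fp32 GEMM of two n x n matrices: 2n^3 FLOPs, ~3n^2 * 4 bytes touched
n = 4096
flops = 2 * n**3
bytes_moved = 3 * n * n * 4
# Illustrative hardware: 100 TFLOP/s peak compute, 2 TB/s memory bandwidth
att, bound = roofline_bound(flops, bytes_moved, 100e12, 2e12)
print(bound)  # large dense GEMMs sit well past the ridge point
```

Elementwise ops (one FLOP per several bytes) land far below the ridge point, which is why fusion and memory-reuse work dominates kernel engineering for inference.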

Dec 6, 2025
Baseten
Full-time|On-site|San Francisco

ABOUT BASETEN
At Baseten, we empower the world's leading AI firms, such as Cursor, Notion, and OpenEvidence, by delivering mission-critical inference solutions. Our blend of applied AI research, robust infrastructure, and user-friendly developer tools enables AI pioneers to deploy groundbreaking models effectively. Following our $300M Series E funding round, backed by investors including BOND and IVP, we are on a strong growth trajectory. Join our team and contribute to the platform that drives the next generation of AI products.

THE ROLE
We are looking for an experienced Senior GPU Kernel Engineer to join our team at the forefront of AI acceleration. Your programming expertise will directly improve the performance of cutting-edge machine learning models: you will develop highly efficient GPU kernels that optimize computational processes and enable transformative AI applications. You'll thrive in a fast-paced, intellectually challenging environment where your technical skills are pivotal, and your contributions will directly affect production systems that serve millions of users. This position offers strong opportunities for career advancement for engineers enthusiastic about low-level optimization and impactful systems engineering.

EXAMPLE INITIATIVES
As part of our Model Performance team, you will engage in projects like:
- Baseten Embeddings Inference: the fastest embeddings solution available
- The Baseten Inference Stack
- Model performance optimization

RESPONSIBILITIES
- Design and develop high-performance GPU kernels for essential machine learning operations, including matrix multiplications and attention mechanisms.
- Collaborate with cross-functional teams to drive performance improvements and implement optimizations.
- Debug and refine kernel code to achieve maximal efficiency and reliability.
- Stay abreast of the latest advancements in GPU technology and machine learning frameworks.
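The matrix-multiplication kernels mentioned above are typically organized around loop tiling for data reuse. A pure-Python/NumPy sketch of the blocking structure only (in a real GPU kernel each tile would be staged through shared memory and registers; the function name and tile size are illustrative):

```python
import numpy as np

def matmul_tiled(A, B, tile=32):
    """Blocked matmul: accumulate tile x tile submatrix products.

    The triple loop over tiles is the same structure a GPU kernel uses,
    where each (i, j) tile of C is owned by one thread block and the
    A/B tiles are reused many times from fast on-chip memory.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # NumPy slicing clips ragged edge tiles automatically
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

A = np.random.rand(96, 64)
B = np.random.rand(64, 80)
assert np.allclose(matmul_tiled(A, B), A @ B)
```

Choosing the tile size to fit the fast memory level being targeted is exactly the kind of trade-off the role description calls kernel-level performance engineering.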

Jul 17, 2025
Kernel
Full-time|On-site|San Francisco

About Kernel
Kernel is a developer platform that offers lightning-fast Browsers-as-a-Service for browser automation and web agents. Our API and MCP server let developers launch browsers in the cloud without the hassle of managing infrastructure. Our serverless browser platform takes care of the complex aspects: autoscaling reliable browser infrastructure, observability, and intricate web interactions, so developers can concentrate on their agents' functionality rather than the underlying details. Kernel turns AI into a tangible, practical, and powerful tool, letting developers deploy agents capable of genuine interaction with the digital world.

We are trusted by teams at Cash App, Rye, and many others for deep research, QA automation, and real-time web analysis, and we have secured $22M in funding from investors including Accel, YCombinator, Vercel, Paul Graham, Solomon Hykes (Docker), David Cramer (Sentry), and Charlie Marsh (Astral). With just one line of code, you can deploy any web agent to our cloud; the rest is in your hands. If you are passionate about building essential infrastructure for the next wave of AI applications, we would love to hear from you.

About the Role
As a Product Engineer at Kernel, you will be a full-stack engineer who values product development as much as coding. You can translate strong product instincts into code, from pixel-perfect UI decisions to backend API architecture, and you proactively contribute to the specification process rather than waiting for one to be provided. You will collaborate closely with our co-founders to define product direction, deliver full-stack features end to end, and keep Kernel polished yet powerful.

Your Responsibilities
- Lead the full-stack implementation of user-facing product surfaces: dashboard, onboarding, website, and core product functionality.
- Influence the product roadmap by integrating customer feedback, analyzing usage patterns, and applying your own insight into developer needs.
- Enhance developer experience across our SDK, documentation, CLI, and API, delivering the kind of seamless experience that makes developers say "this just works."
- Prototype and iterate rapidly, bringing features from concept to production with minimal oversight.
- Help shape the standards for building a superior developer product at Kernel.

Your Qualifications
- Comfortable taking ownership of features from frontend to backend, with a holistic understanding of product development.
- A strong passion for seamless user experiences and an ability to translate product vision into functional code.
- Experience working in a fast-paced environment with a focus on agile methodologies.

Feb 27, 2026
OpenAI
Full-time|On-site|San Francisco

About Our Team
At OpenAI, our Scaling team develops and fine-tunes the large-scale infrastructure that powers the next generation of AI workloads. We are passionate about pushing the limits of technology to create impactful AI systems that benefit everyone.

Role Overview
We are seeking a Lead Linux Kernel Developer to join our Scaling team. In this role, you will architect and implement Linux kernel components, bridging the gap between hardware and software to improve performance and scalability for our advanced AI initiatives.

Key Responsibilities
- Spearhead the development of our Linux kernel stack tailored for high-performance systems.
- Design and create kernel drivers, focusing on areas such as DMA, PCIe, NICs, and RDMA.
- Oversee the full development cycle of system-scale networking, including essential kernel and low-level software components.
- Collaborate with technology vendors to integrate their solutions into our systems.
- Conduct kernel bring-up and debugging on new hardware platforms.
- Develop userspace software for integration, testing, diagnostics, and performance validation.

Required Qualifications
- Demonstrated experience leading Linux kernel development projects.
- In-depth knowledge of key subsystems for high-performance systems such as PCIe, dma-buf, RDMA, P2P, SR-IOV, and IOMMU.
- Familiarity with subsystems and frameworks relevant to scalable networking, including ibverbs and ECN/DCQCN.
- Expertise in C, C++, Python, and Linux shell scripting; experience with Rust is highly desirable.
- Proven ability to collaborate with engineering teams to define interfaces and develop tooling.
- A successful history of managing vendor relationships and deliverables.
- Background in embedded systems development, including bootloaders, drivers, and hardware/software integration.
- Ability to navigate ambiguity and build systems from the ground up.

Note: To comply with U.S. export control laws, candidates for this position may need to meet specific legal status requirements.

Aug 27, 2025
Sentry
Full-time|On-site|San Francisco, California

Join our dynamic team at Sentry as a Staff Software Engineer specializing in AI Developer Tooling. In this pivotal role, you will contribute to the development and enhancement of cutting-edge AI tools that empower developers worldwide. Collaborate with cross-functional teams to identify challenges and deliver innovative solutions that optimize the software development lifecycle.

Apr 8, 2026
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, our mission is to enhance human potential through the advancement of collaborative general intelligence. We aspire to a future where everyone can harness the knowledge and tools to apply AI to their own objectives. We are a team of scientists, engineers, and builders who have developed some of the most widely used AI products, including ChatGPT and Character.ai, alongside open-weight models like Mistral, and who have contributed to popular open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
We are looking for a Developer Productivity Engineer to help make our software development process safer, faster, and more enjoyable. Your primary focus will be AI tools and coding agents: collaborating closely with platform, security, and product engineers to craft state-of-the-art tools for AI-assisted software development and significantly speed up our internal processes. The role spans establishing company-wide platforms and working directly with developers to improve their individual workflows.

Note: This is an evergreen role that remains open for ongoing interest. While we receive numerous applications, there may not always be an immediate match for your skills and experience. We encourage you to apply, as we continuously review submissions and reach out as new opportunities arise. You are welcome to reapply after gaining more experience, but please refrain from applying more than once every six months. Occasionally we post specific roles for targeted projects or teams; in those cases, feel free to apply directly alongside this evergreen listing.

What You'll Do
- Empower researchers and engineers to leverage AI for better coding productivity while maintaining high code quality.
- Standardize AI coding tools such as Claude Code, Cursor, and Codex: configure, secure, and maintain them while balancing organization-wide settings with individual developer preferences.
- Create secure, reproducible agent sandboxes for remote development and continuous integration testing.
- Establish golden-path development environments and safeguards for sensitive information.
- Assist individual contributors in crafting their personalized AI-enabled workflows.
- Monitor tool usage, reliability, and cost.

Feb 5, 2026
Gimlet Labs
Full-time|On-site|San Francisco

At Gimlet Labs, we are building the first heterogeneous neocloud tailored for AI workloads. As demand for AI systems grows, traditional infrastructure faces significant limitations in power, capacity, and cost. Our platform addresses these challenges by decoupling AI workloads from the hardware, intelligently partitioning tasks, and directing each component to the most suitable hardware for optimal performance and efficiency. This approach allows heterogeneous systems to span multiple vendors and generations of hardware, including the latest accelerators, achieving substantial improvements in performance and cost-effectiveness.

Building on this foundation, Gimlet is developing a production-grade neocloud designed for agentic workloads. Customers can deploy and manage their workloads with stable, production-ready APIs, without worrying about hardware selection, placement, or low-level performance optimization. We collaborate with foundational labs, hyperscalers, and AI-native companies to run real production workloads capable of scaling to gigawatt-class AI data centers.

We are seeking a Member of Technical Staff specializing in kernels and GPU performance. You will work close to the accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms: analyzing low-level execution behavior, designing and optimizing kernels, and ensuring consistent performance across both established and emerging hardware. This position suits engineers who thrive on deep performance analysis, enjoy exploring hardware trade-offs, and are passionate about turning theoretical peak performance into real-world results.

Mar 10, 2026
Airwallex
Full-time|On-site|US - San Francisco

Join Airwallex as a Senior Software Engineer, AI Tools
At Airwallex, we are the leading unified payments and financial platform for businesses around the globe. Our solutions empower over 200,000 businesses, including Brex, Qantas, and SHEIN, to manage everything from accounts to treasury operations. Founded in Melbourne and now a team of over 2,000 people across 26 offices worldwide, we are valued at $8 billion and backed by investors such as T. Rowe Price and Visa. If you're eager to take on challenging projects and grow your career, we want you on our team.

What We Value
We seek builders with a passion for innovation and a desire to own their work. You should bring deep expertise and a sharp, analytical mindset, driven by our mission and operating principles, with the ability to make quick yet sound decisions while exploring new avenues. As a collaborative team player, you will turn ideas into products and drive projects to completion. Leveraging AI will be key to working efficiently and solving problems swiftly. Join us in addressing complex challenges with exceptional colleagues while advancing your career in the future of global banking.

Your Role
As a Senior Software Engineer focusing on AI tools, your mission will be to maintain the security of over $150 billion in payments across 150,000 companies and thousands of employees. You will collaborate with internal teams to identify impactful projects, such as developing automation tools, managing our extensive log pipelines, and using AI to improve team operations. We value versatility and intelligent problem-solving over specific technologies, so you'll have the freedom to choose the best approaches. You'll also play a crucial role in mentoring junior engineers and contributing to the growth of our engineering team.

Nov 2, 2025
Kernel
Full-time|On-site|San Francisco

Join Our Team at Kernel
At Kernel, we are revolutionizing the way developers interact with the digital world through our platform offering lightning-fast Browsers-as-a-Service for seamless browser automation and advanced web agents. Our API and MCP server let developers launch browsers in the cloud, eliminating the complexities of infrastructure management. Our serverless browser platform takes the hassle out of autoscaling, reliability, and observability, allowing developers to concentrate on their agents' functionality rather than the underlying processes. Kernel transforms AI into a practical and impactful tool, enabling developers to deploy agents that can genuinely engage with online environments.

Trusted by industry leaders such as Cash App and Rye for applications ranging from comprehensive research to QA automation and real-time web analysis, we have raised $22M from prominent investors including Accel, YCombinator, and others. With just one line of code, any web agent can be deployed to our cloud; what happens next is up to you. If you are passionate about creating essential infrastructure for the future of AI applications, we would love to connect.

Dec 4, 2025
Scale AI
Full-time|$216.2K/yr - $270.3K/yr|On-site|San Francisco, CA

Join Scale AI as a Senior Software Engineer, where you will design, develop, and manage secure and scalable infrastructure that empowers our team to excel. You will collaborate with an agile, solution-focused team dedicated to automating processes in identity and access management, endpoint management, and our extensive SaaS ecosystem. You will harness a variety of internal and external platforms to create applications, Slackbots, and dashboards for internal users while streamlining intricate workflows through automation. Your contributions will significantly enhance the operational efficiency of all General and Administrative teams within the company.

Your Impact
- As a full-stack engineer (60% backend / 40% frontend), collaborate with your team to create tools and applications for both internal use and external partners.
- Design, develop, test, and support full-stack applications on cloud-native distributed systems.
- Build real-time integrations with SaaS platforms across the organization.
- Develop a quality framework and unit tests to ensure product performance, quality, and load handling, while debugging and identifying system issues.
- Engage with broader teams, participate in engineering councils, conduct code reviews, and improve our delivery processes.
- Work closely with the IAM team to manage cloud-based identity (Okta) and access controls, ensuring adherence to security protocols and standards for internal applications.
- Collaborate cross-functionally to identify opportunities for technological improvements and implement self-service solutions.

Mar 26, 2026
Kernel
Full-time|On-site|San Francisco

About Kernel
Kernel is a developer platform that delivers lightning-fast Browsers-as-a-Service for browser automation and web agent deployment. Our API and MCP server let developers launch cloud-based browsers without the hassle of infrastructure management. Our serverless browser solution handles the complexities: autoscaling, dependable browser infrastructure, observability, and intricate web interactions, allowing developers to concentrate on their agents' functionality rather than the underlying technology. Kernel brings AI to life, enabling developers to create agents that genuinely engage with the digital landscape.

Our platform is trusted by teams at Cash App, Rye, and many others for tasks including in-depth research, QA automation, and real-time web analysis. We recently secured $22M in funding from investors including Accel, YCombinator, Vercel, Paul Graham, Solomon Hykes (Docker), David Cramer (Sentry), and Charlie Marsh (Astral). With a single line of code, you can deploy any web agent to our cloud infrastructure. If you are passionate about developing essential infrastructure for the future of AI applications, we would love to connect with you.

Dec 4, 2025
Kernel
Full-time|On-site|San Francisco

About Kernel
Kernel is a developer platform offering lightning-fast Browsers-as-a-Service tailored for browser automation and web agent creation. Our API and MCP server enable developers to launch browsers in the cloud without the hassle of infrastructure management. Our serverless browser platform takes care of the complex tasks: autoscaling reliable browser infrastructure, ensuring observability, and managing the intricate details of web interactions, so developers can concentrate on agent functionality rather than the underlying processes. Kernel makes AI practical and powerful, empowering developers to deploy agents that can effectively engage with the digital landscape.

We are trusted by teams at Cash App, Rye, and numerous others for applications like in-depth research, QA automation, and real-time web analysis, and we have secured $22M in funding from investors including Accel, YCombinator, Vercel, Paul Graham, Solomon Hykes (Docker), David Cramer (Sentry), and Charlie Marsh (Astral). With one line of code, you can deploy any web agent to our cloud; the rest is in your hands. If you're passionate about developing critical infrastructure for the next generation of AI applications, we would love to connect.

Dec 4, 2025
Magic
Full-time|On-site|San Francisco

At Magic, our goal is to develop safe AGI that propels humanity forward by addressing some of the most pressing challenges we face. We harness automated research and code generation to improve models and alignment in ways that surpass human capabilities. Our methodology integrates cutting-edge pre-training, domain-specific reinforcement learning, ultra-long context, and advanced inference-time computing.

Role Overview
As a Kernel Engineer, you will design, implement, and maintain high-performance kernels that maximize throughput and minimize latency during both training and inference. Magic's extended context windows present unique kernel optimization challenges, particularly around memory efficiency, data movement, and sustained throughput.

Key Responsibilities
- Design and develop kernels that enable high-performance long-context functionality.
- Own kernel design, implementation, and deployment, and ensure production reliability.
- Emphasize robustness, thorough testing, and functional accuracy while striving for optimal performance.
- Assess the feasibility of porting Magic's compute kernels to other hardware platforms.
- Collaborate with the training, inference, and reinforcement learning teams to co-design kernels.
- Explore our work through Magic-Attention, presented at GTC 2026.

Qualifications
- Experience in low-level programming for AI accelerators, including NVIDIA Blackwell or Google TPUs.
- Proficiency in developing and optimizing GPU kernels with frameworks such as NCCL, MSCCLPP, CUTLASS, CuTeDSL, Triton, Quack, and Flash Attention.
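The long-context memory pressure described above is easy to quantify: a transformer's KV cache grows linearly with sequence length, which is what turns ultra-long context into a data-movement problem. A back-of-the-envelope sketch (the model shape below is illustrative, not Magic's):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Bytes needed to hold the key and value caches for one sequence."""
    # 2 tensors (K and V), each of shape layers x kv_heads x seq_len x head_dim
    return 2 * layers * kv_heads * seq_len * head_dim * dtype_bytes

# Illustrative 70B-class shape: 80 layers, 8 KV heads (GQA), head_dim 128, fp16
gib = kv_cache_bytes(80, 8, 128, seq_len=1_000_000) / 2**30
print(f"{gib:.0f} GiB of KV cache for a 1M-token context")
```

At this assumed shape a single million-token sequence needs roughly 300 GiB of KV cache alone, which is why long-context kernels focus so heavily on memory layout and sustained bandwidth rather than raw FLOPs.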

Jan 24, 2024
Databricks
Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California

P-97

At Databricks, we are dedicated to empowering data teams to tackle some of the world's most challenging problems. We do this by building and operating a leading data and AI infrastructure platform that lets our clients turn deep data insights into business improvements. Our commitment to pushing the limits of data and AI technology is matched by our focus on the resilience, security, and scalability our customers depend on. Databricks operates one of the largest-scale software platforms: millions of virtual machines generating terabytes of logs and processing exabytes of data daily. At that scale we routinely encounter cloud hardware, network, and operating system faults, and our software must protect our customers from these issues.

As a Senior Performance Engineer, you will collaborate with teams throughout the organization to assess product and feature performance, pinpoint bottlenecks, and partner with engineers to resolve performance and scalability challenges. This includes setting performance goals for software releases, guiding teams in developing performance benchmarks, conducting competitive benchmark analyses for Databricks products, and performing in-depth analyses to identify and resolve performance issues.
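The benchmarking work described above rests on disciplined measurement. A minimal sketch of a timing harness with warmup runs and a noise-robust median (illustrative only, not Databricks tooling):

```python
import statistics
import time

def benchmark(fn, *args, warmup=3, reps=20):
    """Median wall-clock time of fn(*args), with warmup to exclude cold-start cost."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    # median is robust to scheduler noise; the spread flags unstable runs
    return statistics.median(samples), max(samples) - min(samples)

med, spread = benchmark(sorted, list(range(100_000, 0, -1)))
print(f"median {med * 1e3:.2f} ms, spread {spread * 1e3:.2f} ms")
```

Reporting the spread alongside the median is a cheap way to catch the run-to-run variance that makes naive single-shot timings misleading.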

Jan 30, 2026
OpenAI
Full-time|On-site|San Francisco

OpenAI is seeking a Software Engineer in San Francisco to focus on improving productivity by optimizing model performance. This position centers on developing solutions that make machine learning models more efficient and effective.

Role Overview
This role involves working closely with teams across different functions to identify and address areas where model performance can be improved, delivering changes with a measurable impact on both systems and workflows.

What You Will Do
- Collaborate with engineers and other specialists to enhance model efficiency.
- Develop and implement solutions that improve the effectiveness of machine learning systems.
- Contribute to projects that streamline processes and drive productivity gains.

Impact
Your work will help shape how models operate and how teams at OpenAI achieve their goals. The changes you deliver will support more effective use of resources and better outcomes for the organization.

Apr 29, 2026
Baseten
Full-time|On-site|San Francisco

ABOUT BASETEN
Baseten is at the forefront of AI technology, empowering companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer to integrate advanced AI models into their operations. Our blend of applied AI research, adaptable infrastructure, and intuitive developer tools enables innovators to bring their most ambitious AI products to life. With our recent $300M Series E funding from investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we are poised for rapid growth. Join us in shaping the platform engineers rely on to deploy transformative AI solutions.

THE ROLE
We are seeking a proactive Software Engineer specializing in ML performance. This position is ideal for backend engineers who thrive in a fast-paced startup environment and want to make substantial contributions to Large Language Model (LLM) inference. If you're enthusiastic about optimizing open-source ML models, we want to hear from you.

EXAMPLE INITIATIVES
As a member of our Model Performance team, you will work on projects including:
- Baseten Embeddings Inference: the fastest embeddings solution available
- The Baseten Inference Stack
- Driving model performance optimization

RESPONSIBILITIES
- Develop, refine, and implement advanced inference techniques (quantization, speculative decoding, KV cache reuse, chunked prefill, and LoRA) for ML models and infrastructure.
- Dig into the codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to troubleshoot and resolve ML performance issues.
- Scale and apply optimization techniques across a diverse array of ML models, with a focus on large language models.
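One of the techniques listed above, quantization, can be illustrated with a minimal symmetric int8 round-trip in NumPy. This is a sketch of the core idea only; production stacks such as TensorRT-LLM layer on per-channel scales, calibration, and fused dequantization kernels:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
# storage drops 4x (float32 -> int8); round-to-nearest error is at most scale / 2
assert err <= scale / 2 + 1e-6
```

The 4x memory reduction directly raises arithmetic intensity for memory-bound inference, which is why quantization sits next to KV-cache reuse and chunked prefill in the responsibility list above.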

Mar 28, 2024
