Carta
San Francisco, CA; Santa Clara, CA; Seattle, WA; New York, NY
Hybrid | Full-time
Experience Level
Mid to Senior
Qualifications
Proven experience in software development, preferably in a leadership role.
Strong proficiency in programming languages such as Java, Python, or JavaScript.
Experience with cloud technologies and microservices architecture.
Excellent problem-solving skills and ability to work collaboratively.
Strong understanding of algorithms and data structures.
Experience in mentoring junior developers.
About the job
Join Carta's engineering team as a Staff Software Engineer, where you will play a crucial role in developing innovative solutions that enhance our platform. You will collaborate with cross-functional teams to design, implement, and maintain scalable systems, ensuring high performance and responsiveness to requests from the front-end.
We're looking for a passionate engineer who thrives in a fast-paced environment and is excited about tackling complex challenges. If you are eager to contribute to cutting-edge technology and drive impactful projects, we want to hear from you!
About Carta
Carta is a leading technology company focused on transforming the way equity is managed. Our mission is to create a more equitable future by providing innovative software solutions that empower businesses and their employees. We pride ourselves on our collaborative culture and commitment to professional growth.
Similar jobs
Staff Software Engineer, GenAI Performance and Kernel
Full-time|$190.9K/yr - $232.8K/yr|On-site|San Francisco, California
P-1285

About This Role
Join our dynamic team at Databricks as a Staff Software Engineer specializing in GenAI Performance and Kernel. In this pivotal role, you will take charge of designing, implementing, and optimizing high-performance GPU kernels that drive our GenAI inference stack. Your expertise will lead the development of finely-tuned, low-level compute paths, balancing hardware efficiency with versatility, while mentoring fellow engineers in the intricacies of kernel-level performance engineering. Collaborating closely with machine learning researchers, systems engineers, and product teams, you will elevate the forefront of inference performance at scale.

What You Will Do
Lead the design, implementation, benchmarking, and maintenance of essential compute kernels (such as attention, MLP, softmax, layernorm, memory management) tailored for diverse hardware backends (GPU, accelerators).
Steer the performance roadmap for kernel-level enhancements, focusing on areas like vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, and auto-tuning.
Integrate kernel optimizations seamlessly with higher-level machine learning systems.
Develop and uphold profiling, instrumentation, and verification tools to identify correctness and performance regressions, numerical discrepancies, and hardware utilization inefficiencies.
Conduct performance investigations and root-cause analyses to address inference bottlenecks, such as memory bandwidth, cache contention, kernel launch overhead, and tensor fragmentation.
Create coding patterns, abstractions, and frameworks to modularize kernels for reuse, cross-backend compatibility, and maintainability.
Influence architectural decisions to enhance kernel efficiency (including memory layout, dataflow scheduling, and kernel fusion boundaries).
Guide and mentor fellow engineers focused on lower-level performance, conducting code reviews and establishing best practices.
Collaborate with infrastructure, tooling, and machine learning teams to implement kernel-level optimizations in production and assess their impacts.
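The kernel work described above leans heavily on tiling: computing the output in small blocks sized to fit fast memory so data gets reused before it is evicted. As a rough CPU-side illustration only (plain Python, not a GPU kernel; the function name is ours), a blocked matrix multiply shows the idea:

```python
def tiled_matmul(a, b, tile=2):
    """Blocked (tiled) matrix multiply over plain Python lists.

    Real GPU kernels apply the same structure: each tile of the output
    is computed from tiles of A and B sized to fit in shared memory,
    maximizing data reuse. This is a CPU illustration only, not an
    optimized or parallel implementation.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile over output rows
        for j0 in range(0, m, tile):      # tile over output columns
            for k0 in range(0, k, tile):  # tile over the reduction dim
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            c[i][j] += a[i][kk] * b[kk][j]
    return c
```

On a GPU, each (i0, j0) tile would map to a thread block and the inner loops to threads; the payoff comes from the tiles of A and B being loaded into fast memory once per tile rather than once per element.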
Full-time|$200K/yr - $400K/yr|Remote|San Francisco
At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, significantly enhancing the speed and reducing the cost of AI inference. Our founders, the visionaries behind vLLM, have spent years bridging the gap between advanced models and cutting-edge hardware.

About the Role
We are seeking a skilled performance engineer dedicated to maximizing the computational efficiency of modern accelerators. In this role, you'll develop kernels and implement low-level optimizations that position vLLM as the fastest inference engine globally. Your contributions will be pivotal as your code will execute across a broad spectrum of hardware accelerators, from NVIDIA GPUs to the latest silicon innovations. You'll collaborate closely with hardware vendors to ensure we fully leverage the capabilities of each new generation of chips.
Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California
At Databricks, we are dedicated to empowering data teams to tackle the world's most challenging problems, from detecting security threats to advancing cancer drug development. We achieve this by offering the premier data and AI platform, allowing our customers to concentrate on their mission-critical challenges. The Mosaic AI organization assists companies in developing AI models and systems utilizing their own data, employing technologies that range from training large language models (LLMs) from the ground up to employing advanced retrieval methods for enhanced generation. We pride ourselves on pushing the boundaries of science and operationalizing our innovations. Mosaic AI believes that a company’s AI models hold intrinsic value, akin to any other core intellectual property, and that superior AI models should be accessible to all.

Job Overview
As a research engineer in the Scaling team, you will stay abreast of the latest advancements in deep learning and pioneer new methodologies that surpass the current state of the art. You will collaborate with a diverse team of researchers and engineers, sharing insights and expertise. Most importantly, you will be passionate about our customers, striving to ensure their success in implementing cutting-edge LLMs and AI systems by translating our scientific knowledge into practical applications.

Your Impact
Enhance performance through innovative optimization techniques, including kernel fusion, mixed precision, memory layout optimization, tiling strategies, and tensorization tailored for training-specific patterns.
Design, implement, and optimize high-performance GPU kernels for training workloads, including attention mechanisms, custom layers, gradient computations, and activation functions, specifically for NVIDIA architectures.
Create and implement distributed training frameworks for large language models, incorporating parallelism strategies (data, tensor, pipeline, ZeRO-based) and optimized communication patterns for gradient synchronization and collective operations. Profile, debug, and optimize comprehensive training workflows to pinpoint and resolve performance bottlenecks, utilizing memory optimization techniques such as activation checkpointing, gradient sharding, and mixed precision training.
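The parallelism strategies listed above include data parallelism, where each worker computes gradients on its own shard of the batch and an all-reduce averages them so every replica applies the same update. A toy stand-in for that averaging step (pure Python with illustrative names; a real system would use NCCL or similar collectives over GPU memory) might look like:

```python
def allreduce_mean(per_worker_grads):
    """Average per-worker gradients element-wise.

    In data-parallel training, each worker holds a gradient vector for
    the same parameters computed from a different data shard. An
    all-reduce averages them so all replicas step identically. This
    list-based version only illustrates that contract; it is not a
    distributed implementation.
    """
    n_workers = len(per_worker_grads)
    return [sum(vals) / n_workers for vals in zip(*per_worker_grads)]
```

Communication-optimized variants (ring or tree all-reduce, bucketing, overlap with backward compute) change how the average is computed and moved, but not the result: every worker ends with the same mean gradient.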
About the Role
OpenAI is looking for a Software Engineer specializing in Kernel Performance and AI Tooling to join the team in San Francisco. This role centers on improving software systems for maximum efficiency and building advanced tools that support AI development.

What You Will Do
Optimize kernel-level performance across OpenAI's software stack.
Design and implement tools that accelerate AI research and deployment.
Work closely with engineers to identify bottlenecks and deliver practical solutions.
Contribute to technical discussions and share knowledge with teammates.

Team and Collaboration
Work alongside engineers who are committed to advancing AI technology. Collaboration and innovation are central to the team’s approach.
Full-time|$190.9K/yr - $232.8K/yr|On-site|San Francisco, California
P-1285

About This Role
Join Databricks as a Staff Software Engineer specializing in GenAI inference, where you will spearhead the architecture, development, and optimization of the inference engine that powers the Databricks Foundation Model API. Your role will be crucial in bridging cutting-edge research with real-world production requirements, ensuring exceptional throughput, minimal latency, and scalable solutions. You will work across the entire GenAI inference stack, including kernels, runtimes, orchestration, memory management, and integration with various frameworks and orchestration systems.

What You Will Do
Take full ownership of the architecture, design, and implementation of the inference engine, collaborating on a model-serving stack optimized for large-scale LLM inference.
Work closely with researchers to integrate new model architectures or features, such as sparsity, activation compression, and mixture-of-experts, into the engine.
Lead comprehensive optimization efforts focused on latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators.
Establish and uphold standards for building and maintaining instrumentation, profiling, and tracing tools to identify performance bottlenecks and drive optimizations.
Design scalable solutions for routing, batching, scheduling, memory management, and dynamic loading tailored to inference workloads.
Guarantee reliability, reproducibility, and fault tolerance in inference pipelines, including capabilities for A/B testing, rollbacks, and model versioning.
Collaborate cross-functionally to integrate with federated and distributed inference infrastructure, ensuring effective orchestration across nodes, load balancing, and minimizing communication overhead.
Foster collaboration with cross-functional teams, including platform engineers, cloud infrastructure, and security/compliance professionals.
Represent the team externally through benchmarks, whitepapers, and contributions to open-source projects.

What We Look For
A BS/MS/PhD in Computer Science or a related discipline.
A solid software engineering background with 6+ years of experience in performance-critical systems.
A proven ability to own complex system components and influence architectural decisions from conception to execution.
A deep understanding of ML inference internals, including attention mechanisms, MLPs, recurrent modules, quantization, and sparse operations.
Hands-on experience with CUDA, GPU programming, and essential libraries (cuBLAS, cuDNN, NCCL, etc.).
A strong foundation in distributed systems design, including RPC frameworks, queuing, RPC batching, sharding, and memory partitioning.
Demonstrated proficiency in diagnosing and resolving performance bottlenecks across multiple layers (kernel, memory, networking, scheduler).
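Several responsibilities above center on batching and scheduling inference requests. A deliberately minimal sketch (hypothetical helper, pure Python) shows the grouping step behind dynamic batching: queued requests are collected into capped batches so the accelerator runs fewer, larger calls:

```python
from collections import deque

def drain_batches(queue, max_batch):
    """Group queued requests into batches of at most max_batch.

    Production inference servers add much more: per-batch timeouts,
    bucketing by sequence length, priorities, and continuous batching
    that admits new requests mid-generation. This sketch shows only
    the core grouping step.
    """
    batches = []
    while queue:
        batch = [queue.popleft() for _ in range(min(max_batch, len(queue)))]
        batches.append(batch)
    return batches
```

The size cap trades latency for throughput: larger batches amortize kernel launch and weight-read costs, while smaller ones bound how long an early request waits for stragglers.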
At Gimlet Labs, we are pioneering the first heterogeneous neocloud tailored for AI workloads. As the demand for AI systems grows, traditional infrastructure faces significant limitations in terms of power, capacity, and cost. Our innovative platform addresses these challenges by decoupling AI workloads from the hardware, intelligently partitioning tasks, and directing each component to the most suitable hardware for optimal performance and efficiency. This method allows for the creation of heterogeneous systems that span multiple vendors and generations of hardware, including the latest cutting-edge accelerators, achieving substantial improvements in performance and cost-effectiveness.

Building upon this robust foundation, Gimlet is developing a production-grade neocloud designed for agentic workloads. Our customers can effortlessly deploy and manage their workloads with stable, production-ready APIs, eliminating the complexities of hardware selection, placement, or low-level performance optimization.

We collaborate with foundational labs, hyperscalers, and AI-native companies to drive real production workloads capable of scaling to gigawatt-class AI data centers.

We are currently seeking a dedicated Member of Technical Staff specializing in kernels and GPU performance. In this role, you will work closely with accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low-level execution behaviors, design and optimize kernels, and ensure consistent performance across both established and emerging hardware.

This position is perfect for engineers who thrive on deep performance analysis, enjoy exploring hardware trade-offs, and are passionate about transforming theoretical peak performance into tangible real-world outcomes.
Join Zyphra as a Research Engineer specializing in AI Performance and Kernel Optimization. In this role, you will work at the forefront of AI technologies, developing and optimizing kernel solutions that enhance the performance of our systems. You will collaborate with cross-functional teams, leveraging your expertise to drive innovation and efficiency.
At Sciforium, we are at the forefront of AI infrastructure, innovating next-generation multimodal AI models and a proprietary high-efficiency serving platform. With substantial funding and direct collaboration from AMD, supported by their engineers, our team is rapidly expanding to develop the complete stack that powers cutting-edge AI models and real-time applications.

About the Role
We are on the lookout for a talented GPU Kernel Engineer who is eager to explore and maximize performance on modern accelerators. In this role, you will be responsible for designing and optimizing custom GPU kernels that drive our advanced large-scale AI systems. You will navigate the hardware-software stack, engaging in low-level kernel development and integrating optimized operations into high-level machine learning frameworks for large-scale training and inference.

This position is perfect for someone who excels at the intersection of GPU programming, systems engineering, and state-of-the-art AI workloads, and aims to contribute significantly to the efficiency and scalability of our machine learning platform.

Key Responsibilities
Develop, implement, and enhance custom GPU kernels utilizing C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas.
Profile and fine-tune the end-to-end performance of machine learning operations, particularly for large-scale LLM training and inference.
Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and our proprietary internal runtimes.
Create performance models, pinpoint bottlenecks, and deliver kernel-level enhancements that significantly boost AI workloads.
Collaborate with machine learning researchers, distributed systems engineers, and model-serving teams to optimize computational performance across the entire stack.
Engage closely with hardware vendors (NVIDIA/AMD) and stay updated on the latest GPU architecture and compiler/toolchain advancements.
Contribute to the development of tools, documentation, benchmarking suites, and testing frameworks ensuring correctness and performance reproducibility.

Must-Haves
5+ years of industry or research experience in GPU kernel development or high-performance computing.
Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related discipline.
Strong programming proficiency in C++, Python, and familiarity with machine learning frameworks.
ABOUT BASETEN
At Baseten, we empower the world's leading AI firms, such as Cursor, Notion, and OpenEvidence, by delivering mission-critical inference solutions. Our unique blend of applied AI research, robust infrastructure, and user-friendly developer tools enables AI pioneers to effectively deploy groundbreaking models. With our recent achievement of a $300M Series E funding round supported by esteemed investors like BOND and IVP, we're on an exciting growth trajectory. Join our dynamic team and contribute to the platform that drives the next generation of AI products.

THE ROLE
We are looking for an experienced Senior GPU Kernel Engineer to join our innovative team at the forefront of AI acceleration. In this role, your programming expertise will directly enhance the performance of cutting-edge machine learning models. You'll be responsible for developing highly efficient GPU kernels that optimize computational processes, allowing for transformative AI applications.

You'll thrive in a fast-paced, intellectually challenging environment where your technical skills are pivotal. Your contributions will directly affect production systems that serve millions of users across various platforms. This position offers exceptional opportunities for career advancement for engineers enthusiastic about low-level optimization and impactful systems engineering.

EXAMPLE INITIATIVES
As part of our Model Performance team, you will engage in projects like:
Baseten Embeddings Inference: the quickest embeddings solution available
The Baseten Inference Stack
Enhancing model performance optimization

RESPONSIBILITIES
Core Engineering Responsibilities
Design and develop high-performance GPU kernels for essential machine learning operations, including matrix multiplications and attention mechanisms.
Collaborate with cross-functional teams to drive performance improvements and implement optimizations.
Debug and refine kernel code to achieve maximal efficiency and reliability.
Stay abreast of the latest advancements in GPU technology and machine learning frameworks.
About Our Team
At OpenAI, our Scaling team is dedicated to developing and fine-tuning large-scale infrastructure that empowers the next generation of AI workloads. We are passionate about pushing the limits of technology to create impactful AI systems that benefit everyone.

Role Overview
We are seeking a pioneering Lead Linux Kernel Developer to join our Scaling team. In this pivotal role, you will architect and implement Linux kernel components, bridging the gap between hardware and software to enhance performance and scalability for our advanced AI initiatives.

Key Responsibilities
Spearhead the development of our Linux kernel stack tailored for high-performance systems.
Design and create kernel drivers, focusing on areas such as DMA, PCIe, NICs, and RDMA.
Oversee the full development cycle of system-scale networking, including essential kernel and low-level software components.
Collaborate with technology vendors to effectively integrate their solutions into our systems.
Conduct kernel bring-up and debugging on new hardware platforms.
Develop userspace software to facilitate integration, testing, diagnostics, and performance validation.

Required Qualifications
Demonstrated experience in leading Linux kernel development projects.
In-depth knowledge of key subsystems for high-performance systems such as PCIe, dma-buf, RDMA, P2P, SR-IOV, and IOMMU.
Familiarity with subsystems and frameworks relevant to scalable networking, including ibverbs and ECN/DCQCN.
Expertise in programming languages such as C, C++, Python, and Linux shell scripting; experience with Rust is highly desirable.
Proven ability to collaborate with engineering teams to define interfaces and develop tooling.
Successful history of managing vendor relationships and deliverables.
Background in embedded systems development, including bootloaders, drivers, and hardware/software integration.
Ability to navigate ambiguity and construct systems from the ground up.

Note: To comply with U.S. export control laws, candidates for this position may need to meet specific legal status requirements.
Full-time|$180K/yr - $250K/yr|On-site|San Francisco
Join fal in our pursuit to maintain a leading edge in model performance for generative media models. You'll be instrumental in designing and implementing innovative solutions for model serving architecture, built on our proprietary inference engine. Your focus will be on maximizing throughput while minimizing latency and resource consumption. In addition, you will create performance monitoring and profiling tools to identify bottlenecks and optimization opportunities. Collaborate closely with our Applied ML team and clients in the media sector to ensure their workloads leverage our accelerator effectively.
Full-time|$142.2K/yr - $204.6K/yr|On-site|San Francisco, California
About This Role
Join Databricks as a Software Engineer focused on GenAI inference, where you will play a pivotal role in designing, developing, and enhancing the inference engine that drives our Foundation Model API. Collaborating at the intersection of research and production, you will ensure our large language model (LLM) serving systems are optimized for speed, scalability, and efficiency. Your contributions will span the entire GenAI inference stack, from kernels and runtimes to orchestration and memory management.

What You Will Do
Participate in the design and implementation of the inference engine, collaborating on a model-serving stack tailored for large-scale LLM inference.
Work closely with researchers to integrate new model architectures or features such as sparsity, activation compression, and mixture-of-experts into the engine.
Optimize latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators.
Build and maintain tools for instrumentation, profiling, and tracing to identify bottlenecks and inform optimization efforts.
Develop scalable routing, batching, scheduling, memory management, and dynamic loading mechanisms for inference workloads.
Ensure reliability, reproducibility, and fault tolerance in inference pipelines, including A/B launches, rollback, and model versioning.
Integrate with federated and distributed inference infrastructure, orchestrating across nodes, balancing load, and managing communication overhead.
Engage in cross-functional collaboration with platform engineers, cloud infrastructure, and security/compliance teams.
Document and share insights, contributing to internal best practices and open-source initiatives as appropriate.
About Gridware
Gridware is an innovative technology firm based in San Francisco, committed to safeguarding and optimizing the electrical grid. We have pioneered a revolutionary grid management approach known as Active Grid Response (AGR), which emphasizes the monitoring of electrical, physical, and environmental factors that influence grid reliability and safety. Our cutting-edge AGR platform leverages high-precision sensors to identify potential issues early, facilitating proactive maintenance and fault prevention. This holistic strategy aids in enhancing safety, minimizing outages, and ensuring the grid operates with maximum efficiency. Gridware is supported by prominent climate-tech and Silicon Valley investors. For further details, please visit www.Gridware.io.

Role Overview
We are looking for a talented Staff Software Engineer to act as a pivotal technical force within our team, enhancing the overall software engineering capabilities through architectural innovation, mentorship, and fostering a culture of excellence. In this role, you will design and develop the essential software systems that drive Gridware's platform. This encompasses everything from backend services that oversee our distributed network of devices to the front-end interfaces that visualize grid health, fleet diagnostics, and real-time field events.

Your responsibilities will span the entire technology stack, building and scaling systems that integrate hardware, firmware, and cloud infrastructure to enable dependable communication, fleet visibility, and expedited decision-making. This position offers significant ownership and impact, allowing you to influence how our technology supports and protects critical infrastructure at scale.
About Broccoli
Broccoli is revolutionizing the $500 billion home services industry by developing an AI operating system designed to empower trades businesses such as HVAC and roofing. Our intelligent AI agents handle customer interactions, manage job bookings, and ensure every lead is effectively captured.

With the backing of prominent venture capital firms and a successful $27 million Series A funding round, we are on an aggressive growth trajectory. Collaborating with top private equity-backed home service platforms, we anticipate expanding our team fivefold by 2026, presenting a unique opportunity to join us early and make a significant impact.

Why Join Broccoli?
As a Staff Engineer, you will be instrumental in establishing the technical backbone of Broccoli AI. Your responsibilities will include ownership of critical systems, influencing architectural decisions, and shaping our development and deployment processes on a large scale.
Immediate Impact: Your contributions will directly enhance production systems, benefiting hundreds of customers.
Category Creation: Play a pivotal role in defining a new category of AI-powered workforce within an expansive market.
Speed & Ownership: Enjoy the advantages of a small team with rapid feedback loops and substantial decision-making authority.
Founder Collaboration: Partner closely with experienced founders to drive product and technical vision.

What You’ll Do
Design, develop, and scale backend systems and internal tools for our AI agent platform.
Take ownership of essential APIs and integrations, including systems like ServiceTitan.
Lead complex features from initial design through to production deployment.
Enhance real-time voice capabilities, reliability, and intelligence of AI agents.
Mentor fellow engineers and help implement best practices across the team.
Balance speed and quality while scaling systems to accommodate live customer traffic.

What We’re Looking For
7+ years of experience in backend or full-stack engineering.
Strong system design and architectural skills.
Proven experience in deploying and maintaining production systems at scale.
Ability to thrive in high-growth, ambiguous startup environments.
A proactive approach with a strong execution mindset.
Role overview
The Staff Software Engineer position at Amplitude, Inc. is based in San Francisco, CA. This role centers on developing and enhancing software to broaden the platform’s features. Day-to-day work includes direct software development and frequent collaboration with colleagues from various teams.

What you will do
Design, build, and maintain scalable software applications that support the platform’s growth.
Collaborate with product managers and designers to deliver features that address user needs.
Mentor junior engineers and contribute to their technical and professional development.
Review code and help improve engineering practices throughout the team.
Stay current with emerging technologies and industry trends to guide technical choices.
Join Our Team at Kernel
At Kernel, we are revolutionizing the way developers interact with the digital world through our innovative platform, offering Lightning-Fast Browsers-as-a-Service for seamless browser automation and advanced web agents. Our cutting-edge API and MCP server empower developers to effortlessly launch browsers in the cloud, eliminating the complexities of infrastructure management.

Our serverless browser platform takes the hassle out of autoscaling, reliability, and observability, allowing developers to concentrate on their agents' functionality rather than the underlying processes. Kernel transforms AI into a practical and impactful tool, enabling developers to deploy agents that can genuinely engage with online environments.

Trusted by industry leaders such as Cash App and Rye for applications ranging from comprehensive research to QA automation and real-time web analysis, we have successfully raised $22M from prominent investors including Accel, YCombinator, and others.

With just one line of code, any web agent can be deployed to our cloud; what happens next is up to you. If you are passionate about creating essential infrastructure for the future of AI applications, we would love to connect.
Join Canva as a Staff Software Engineer specializing in Video Performance. In this role, you will be instrumental in enhancing our video features, ensuring top-notch performance for our users. You will collaborate with cross-functional teams, leveraging your expertise to drive innovation and optimize our video products.
Join our innovative team at Crusoe as a Staff Software Engineer. In this pivotal role, you will leverage your advanced software engineering skills to design, develop, and optimize cutting-edge solutions that enhance our technology stack. Collaborate with cross-functional teams to drive projects from concept to completion, ensuring high-quality deliverables that meet user needs and business objectives.
Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California
P-97

At Databricks, we are dedicated to empowering data teams to tackle some of the most challenging problems in the world. We achieve this by creating and managing a leading data and AI infrastructure platform that enables our clients to leverage deep data insights for business enhancement. Our commitment to pushing the limits of data and AI technology is matched by our focus on resilience, security, and scalability, which are essential for our customers' success on our platform. Databricks operates one of the largest-scale software platforms, comprising millions of virtual machines that generate terabytes of logs and process exabytes of data daily. Given our scale, we frequently encounter cloud hardware, network, and operating system faults, and our software must adeptly protect our customers from these issues.

As a Senior Performance Engineer, you will collaborate with various teams throughout the organization to assess product and feature performance, pinpoint performance bottlenecks, and partner with engineers to address performance and scalability challenges. This includes setting performance goals for different software releases, guiding teams in developing performance benchmarks, conducting competitive benchmark analyses for various Databricks products, and performing in-depth analyses to identify and resolve performance issues.
Jan 30, 2026