Experience Level
Mid to Senior
Qualifications
Key Responsibilities
- Support fal in maintaining its leading position in model performance for generative media models.
- Design and implement cutting-edge approaches to model serving architecture on our in-house inference engine, emphasizing throughput maximization while minimizing latency and resource use.
- Develop tools for performance monitoring and profiling to identify bottlenecks and areas for optimization.
- Work closely with our Applied ML team and media-sector clients to ensure their workloads benefit from our accelerator.

Requirements
- Solid foundation in systems programming, with a keen ability to identify and resolve bottlenecks.
- In-depth knowledge of advanced ML infrastructure, including PyTorch, TensorRT, TransformerEngine, and Nsight, encompassing model compilation, quantization, and serving architectures.
- Strong understanding of the underlying hardware (currently Nvidia-based systems), with the ability to go deeper into the stack to fix issues, including custom GEMM kernels with CUTLASS for common shapes.
- Proficiency in Triton or a willingness to learn, along with comparable experience in lower-level accelerator programming.
- Experience with multi-dimensional model parallelism, integrating techniques such as tensor parallelism and context/sequence parallelism.
- Familiarity with the internals of Ring Attention, FA3, and FusedMLP implementations.
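The requirements above mention model quantization. As a minimal illustration of the idea (a sketch only, not fal's actual pipeline; the function names and per-tensor scheme are assumptions made for the example), symmetric int8 quantization maps floats to integer codes via a single scale factor:

```python
# Symmetric per-tensor int8 quantization sketch (illustrative only).

def quantize_int8(values):
    """Quantize floats to int8 codes using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.03, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)   # close to `weights`, within one scale step
```

Production serving stacks typically refine this with per-channel scales and calibration, but the round-trip above is the core of the technique.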
About the job
Join fal in our pursuit to maintain a leading edge in model performance for generative media models. You'll be instrumental in designing and implementing innovative solutions for model serving architecture, built on our proprietary inference engine. Your focus will be on maximizing throughput while minimizing latency and resource consumption. In addition, you will create performance monitoring and profiling tools to identify bottlenecks and optimization opportunities. Collaborate closely with our Applied ML team and clients in the media sector to ensure their workloads leverage our accelerator effectively.
About fal
fal is at the forefront of innovation in generative media models, continually advancing our technologies to deliver exceptional model performance. We pride ourselves on fostering a collaborative environment where creative minds can thrive and contribute to groundbreaking projects.
Similar jobs
Full-time|$180K/yr - $250K/yr|On-site|San Francisco
About Us
At Lemurian Labs, we are dedicated to democratizing AI technology while prioritizing sustainability. Our mission is to create solutions that minimize environmental impact, ensuring that artificial intelligence serves humanity positively. We are committed to responsible innovation and the sustainable growth of AI.

We are developing a state-of-the-art, portable compiler that empowers developers to 'build once, deploy anywhere.' This technology ensures seamless cross-platform integration, allowing for model training in the cloud and deployment at the edge, all while maximizing resource efficiency and scalability.

If you are passionate about scaling AI sustainably and eager to make AI development more powerful and accessible, we invite you to join our team at Lemurian Labs. Together, we can build a future that is both innovative and responsible.

The Role
We are seeking a Senior ML Performance Engineer to design and lead our Performance Testing Platform from inception. In this pivotal role, you will be the technical expert in measuring, validating, and enhancing the performance of large language models (including Llama 3.2 70B, DeepSeek, and others) before and after compiler optimization on cutting-edge GPU architectures.

This is a critical position that will significantly impact our product quality and customer success. You will work at the intersection of machine learning systems, GPU architecture, and performance engineering, building the infrastructure that substantiates the value of our compiler.
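A performance-testing platform of the kind described usually starts with a latency harness: warm up, time repeated runs, and report percentiles. A minimal plain-Python sketch (the `benchmark` helper, iteration counts, and percentile choices here are illustrative assumptions, not Lemurian's stack):

```python
import time

def benchmark(fn, warmup=3, iters=20):
    """Time fn over several iterations; report p50/p95 latency in milliseconds."""
    for _ in range(warmup):              # warm caches/JIT before measuring
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return {
        "p50_ms": samples[len(samples) // 2],
        "p95_ms": samples[int(len(samples) * 0.95) - 1],
    }

stats = benchmark(lambda: sum(range(10_000)))  # stand-in for a model call
```

Real model benchmarking adds device synchronization, batch-size sweeps, and tokens-per-second accounting on top of this skeleton.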
Join fal as we revolutionize the generative-media infrastructure landscape. Our mission is to enhance model inference performance, enabling creative experiences on an unprecedented scale. We are seeking a Staff Technical Lead for Inference & ML Performance: an individual who possesses a unique blend of deep technical knowledge and strategic foresight. In this pivotal role, you will lead a talented team dedicated to building and optimizing cutting-edge inference systems. If you're ready to influence the future of inference performance in a fast-paced, rapidly growing environment, we want to hear from you.

Why This Role Matters
You will play a crucial part in shaping the future of fal's inference engine, ensuring that our generative models consistently deliver outstanding performance. Your contributions will directly affect our capacity to swiftly provide innovative creative solutions to a diverse clientele, from individual creators to global brands.

Your Responsibilities
Define and steer the technical direction, guiding your team across domains including kernels, applied performance, ML compilers, and distributed inference to develop high-performance solutions.
Full-time|$190.9K/yr - $232.8K/yr|On-site|San Francisco, California
P-1285 About This Role
Join our dynamic team at Databricks as a Staff Software Engineer specializing in GenAI Performance and Kernel. In this pivotal role, you will take charge of designing, implementing, and optimizing high-performance GPU kernels that drive our GenAI inference stack. Your expertise will lead the development of finely tuned, low-level compute paths, balancing hardware efficiency with versatility, while mentoring fellow engineers in the intricacies of kernel-level performance engineering. Collaborating closely with machine learning researchers, systems engineers, and product teams, you will advance the forefront of inference performance at scale.

What You Will Do
- Lead the design, implementation, benchmarking, and maintenance of essential compute kernels (such as attention, MLP, softmax, layernorm, and memory management) tailored for diverse hardware backends (GPUs and accelerators).
- Steer the performance roadmap for kernel-level enhancements, focusing on vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, and auto-tuning.
- Integrate kernel optimizations seamlessly with higher-level machine learning systems.
- Develop and maintain profiling, instrumentation, and verification tools to identify correctness issues, performance regressions, numerical discrepancies, and hardware-utilization inefficiencies.
- Conduct performance investigations and root-cause analyses of inference bottlenecks such as memory bandwidth, cache contention, kernel-launch overhead, and tensor fragmentation.
- Create coding patterns, abstractions, and frameworks that modularize kernels for reuse, cross-backend compatibility, and maintainability.
- Influence architectural decisions to enhance kernel efficiency (including memory layout, dataflow scheduling, and kernel-fusion boundaries).
- Guide and mentor fellow engineers focused on lower-level performance, conducting code reviews and establishing best practices.
- Collaborate with infrastructure, tooling, and machine learning teams to implement kernel-level optimizations in production and assess their impact.
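Several of the kernels named in that listing, softmax in particular, hinge on numerical stability: subtracting the row maximum before exponentiating prevents overflow. A plain-Python sketch of the computation such a kernel implements (illustrative only; a real GPU kernel would tile, fuse, and vectorize this):

```python
import math

def softmax(row):
    """Numerically stable softmax: shift by the max so exp() cannot overflow."""
    m = max(row)                          # exp(x - m) <= 1 for every x in row
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])          # probabilities summing to 1
big = softmax([1000.0, 1000.0])           # naive math.exp(1000.0) would overflow
```

The max-shift leaves the result unchanged mathematically (the shift cancels in the ratio) but keeps every intermediate in a safe range, which is why fused attention/softmax kernels carry a running maximum.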
Join our innovative team at Crusoe as a Staff Software Engineer, where you will leverage your expertise in systems engineering to develop cutting-edge software solutions. In this dynamic role, you will collaborate with cross-functional teams to design, implement, and optimize systems that drive our mission forward. Your contributions will be pivotal in enhancing our technology stack and ensuring the seamless operation of our systems.
About Granica
Granica is a pioneering AI research and infrastructure company dedicated to creating reliable and steerable representations of enterprise data. We build trust through Crunch, a policy-driven health layer designed to keep extensive tabular datasets efficient, reliable, and reversible. From this foundation, we are developing Large Tabular Models: systems that learn cross-column and relational structures to provide trustworthy answers and automation, complete with built-in provenance and governance.

Our Mission
The current limitations of AI are not solely due to model design but also to the inefficiencies of the data that supports it. At scale, every redundant byte, poorly organized dataset, and inefficient data path contributes to significant costs, latency, and energy waste. Granica's mission is to eliminate these inefficiencies. We leverage cutting-edge research in information theory, probabilistic modeling, and distributed systems to create self-optimizing data infrastructure that continuously improves how information is represented and used by AI. Our engineering team collaborates closely with the Granica Research group, led by Prof. Andrea Montanari of Stanford University, merging advances in information theory and learning efficiency with large-scale distributed systems. We believe the next major breakthrough in AI will stem from innovations in efficient systems, rather than simply larger models.

What You Will Create
- Global Metadata Substrate. Design and refine the global metadata and transactional substrate that enables atomic consistency and schema evolution across exabyte-scale data systems.
- Adaptive Engines. Architect systems that self-optimize, reorganizing and compressing data according to access patterns to achieve unprecedented efficiency improvements.
- Intelligent Data Layouts. Innovate new encoding and layout strategies that push the theoretical limits of signal per byte read.
- Autonomous Compute Pipelines. Spearhead the development of distributed compute platforms that scale predictively and remain reliable under extreme load and failure conditions.
- Research to Production. Partner with Granica Research to transform advances in compression and probabilistic modeling into production-ready, industry-leading systems.
- Latency as Intelligence. Optimize for latency as a key dimension of intelligence.
Join Crusoe as a Senior Systems Performance Engineer, where you will play a crucial role in optimizing and enhancing our systems for superior performance. You will be responsible for diagnosing performance bottlenecks, implementing solutions, and ensuring that our infrastructure can scale efficiently. Work in a dynamic environment that encourages innovation and professional growth.
Full-time|$218.4K/yr - $273K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY
Join Scale AI's ML platform team (RLXF) as a Machine Learning Research Engineer, where you will play a pivotal role in developing our advanced distributed framework for training and inference of large language models. This platform is vital for enabling machine learning engineers, researchers, data scientists, and operators to conduct rapid, automated training and evaluation of LLMs and data quality.

At Scale, we occupy a unique position in the AI landscape, serving as an essential provider of training and evaluation data along with comprehensive solutions for the entire ML lifecycle. You will collaborate closely with Scale's ML teams and researchers to enhance the foundational platform that underpins our ML research and development initiatives. Your contributions will be crucial in optimizing the platform to support the next generation of LLM training, inference, and data curation.

If you are passionate about driving the future of AI through groundbreaking innovations, we want to hear from you!
Join Decagon as a Staff Software Engineer specializing in Machine Learning Infrastructure. In this role, you will play a crucial part in enhancing and optimizing our machine learning systems. You will collaborate with a talented team of engineers to build scalable and efficient infrastructure that supports our AI-driven initiatives.

As a key contributor, you will leverage your expertise in software engineering and machine learning to solve complex challenges and drive innovation. Your work will impact various projects and help shape the future of our technology.
Full-time|$227.2K/yr - $417K/yr|Hybrid|San Francisco, CA; Los Angeles, CA; New York, NY (Hybrid); USA - Remote
About the Role:
Join our dynamic ML Infrastructure team as a Software Engineer, where you'll collaborate closely with the Machine Learning and Product teams to build top-tier machine learning inference platforms. These platforms power vital services such as personalized recommendations, search, and content understanding at Tubi.

Your primary focus will be the development and maintenance of low-latency ML model serving systems for Deep Learning, LLM, and Search models. This includes building self-service infrastructure and critical components such as the inference engine, feature store, vector store, and experimentation engine.

In this role, you'll enhance our service deployment and operational processes, with opportunities to contribute to open-source projects. Enjoy architectural freedom to explore innovative frameworks, spearhead significant cross-functional projects, and elevate the capabilities of our ML and Product teams.

We are currently hiring for two positions:
- Staff Software Engineer
- Principal Software Engineer

Additional Details: As a Principal Engineer, you will serve as a technical leader and visionary, guiding the advancement of our machine learning platform. You'll address complex technical challenges, shape architectural decisions, and mentor senior engineers, fostering a culture of excellence and continuous improvement. Your contributions will impact millions of users.
Full-time|$192K/yr - $260K/yr|On-site|San Francisco, California
P-186 At Databricks, we are passionate about empowering data teams to tackle some of the world's most challenging problems, from security threat detection to cancer drug development. Our mission is to build and operate the leading data and AI infrastructure platform, enabling our customers to concentrate on the high-value challenges that are integral to their own objectives. Founded in 2013 by the original creators of Apache Spark™, Databricks has rapidly evolved from a small office in Berkeley, California, to a global powerhouse with over 1000 employees. Trusted by thousands of organizations, from startups to Fortune 100 companies, we are recognized as one of the fastest-growing SaaS companies worldwide.

Our engineering teams create highly sophisticated products that address significant needs in the industry. We continuously push the limits of data and AI technology while maintaining the resilience, security, and scalability essential for our customers' success on our platform. We manage one of the largest-scale software platforms, consisting of millions of virtual machines that generate terabytes of logs and process exabytes of data daily. At this scale, we frequently encounter cloud hardware, network, and operating system faults, and our software must effectively shield our customers from these challenges. Modern data analysis leverages advanced techniques, such as machine learning, that far exceed the capabilities of traditional SQL query engines.

As a Software Engineer on the Runtime team at Databricks, you will be instrumental in developing the next generation of distributed data storage and processing systems that outperform specialized SQL query engines in relational query performance, while providing the flexibility and programming abstractions to support a variety of workloads, from ETL to data science. Examples of projects you may work on include:
- Apache Spark™: contributing to the de facto open-source framework for big data.
- Data Plane Storage: developing reliable, high-performance services and client libraries for storing and accessing vast amounts of data on cloud storage backends such as AWS S3 and Azure Blob Store.
- Delta Lake: a storage management system that merges the scalability and cost-effectiveness of data lakes with the performance and reliability of data warehouses, featuring low-latency streaming. Its higher-level abstractions and guarantees, including ACID transactions and time travel, significantly reduce the complexity of real-world data engineering architectures.
- Delta Pipelines: simplifying the management of data engineering pipelines.
Join the team at Mirendil as a Member of Technical Staff specializing in Machine Learning Systems. In this role, you will leverage your expertise to develop innovative solutions that enhance our ML frameworks and contribute to groundbreaking projects in the AI space. Collaborate with top talent in a dynamic environment that promotes creativity and technical excellence.
Full-time|On-site|San Francisco, Seattle, New York, Toronto
Join Stripe as a Staff Software Engineer in our Stream Compute team, where you will play a pivotal role in building scalable solutions that power the financial infrastructure of the internet. As a member of our innovative engineering team, you will leverage your expertise to design and implement robust software solutions that enhance the performance and reliability of our streaming data capabilities.
About Us:
At Parafin, our mission is to empower small businesses to thrive in today's competitive landscape. We understand that small businesses form the backbone of our economy, yet they often face challenges in accessing essential financial resources. Our technology streamlines access to vital financial tools directly on the platforms small businesses already use for sales. Partnering with industry leaders such as DoorDash, Amazon, Worldpay, and Mindbody, we provide small businesses with fast, flexible funding, efficient spend management, and effective savings solutions through simple integrations. Parafin manages the complexities of capital markets, underwriting, servicing, compliance, and customer support to ensure seamless experiences for our partners and their small business clients.

We are a dynamic team of innovators with backgrounds from firms like Stripe, Square, Plaid, Coinbase, Robinhood, and CERN, all driven by a passion for developing tools that facilitate small business success. Backed by venture capitalists including GIC, Notable Capital, Redpoint Ventures, Ribbit Capital, and Thrive Capital, Parafin is a Series C company with over $194M raised in equity and $340M in debt facilities. Join us in shaping a future where every small business has access to the financial tools they need.

About The Position
We are looking for a skilled Software Engineer to join our Infrastructure team and spearhead the advancement of our Machine Learning (ML) Platform. This role is essential for building reliable, scalable, and developer-centric systems for model experimentation, training, evaluation, inference, and retraining that drive underwriting and other ML-powered products for small businesses.

As a Software Engineer, you will design, build, and maintain the core frameworks and platforms that empower data scientists to deploy high-quality models into production efficiently and safely. You'll work closely with Data Science and Platform Engineering, taking end-to-end ownership of the ML platform and developing both batch and real-time underwriting infrastructure.

What You'll Do
- Transform notebooks into reliable software. Break down data-scientist training and inference notebooks into reusable, well-tested components (libraries, pipelines, templates) with clear interfaces and documentation.
- Develop user-friendly ML abstractions. Create SDKs, CLIs, and templates that simplify defining features, training and evaluating models, and deploying to batch or real-time targets with minimal boilerplate.
- Build our real-time ML inference platform. Establish and scale low-latency model serving capabilities.
- Enhance batch ML inference. Optimize scheduling, parallelism, cost controls, and observability to improve efficiency.
The Role Are you a seasoned software engineer with a passion for technical leadership and a track record of enhancing production systems? If you're eager to leverage your expertise at the forefront of AI technology, this opportunity might be perfect for you. As a Staff Software Engineer on our pioneering Natural Language Understanding team within the “agent lab,” you will be instrumental in our mission to expand the boundaries of what AI agents can accomplish reliably and at scale. You'll lead the transformation of the Moveworks AI Assistant platform in key areas such as agent orchestration, sandboxed file systems, code execution, latency optimization, agent memory management, LLM self-reflection and improvement, execution environment simulation, enterprise knowledge graphs, and multimodal I/O. Equipped with cutting-edge enterprise AI tools, including top-tier LLMs from providers like OpenAI, our team focuses on rapid development of scalable infrastructure, tackling complex engineering challenges, and maximizing value for our customers. If you're ready to elevate your career alongside a passionate and impact-driven team, we would be thrilled to engage with you.
Join our dynamic team at Cloudflare as a Senior/Principal Systems Engineer specializing in Workers AI (AI/ML). In this pivotal role, you will leverage your expertise in artificial intelligence and machine learning to develop cutting-edge solutions that enhance our platform's capabilities. You will collaborate with cross-functional teams to drive innovation and improve our systems, ensuring we remain at the forefront of technology.
Full-time|$180K/yr - $225K/yr|On-site|United States
Airbnb connects millions of hosts and guests, offering stays and experiences in locations around the world. Since 2007, the company has welcomed over 2 billion guest arrivals, helping travelers discover new places and cultures.

Role Overview
The BizTech team at Airbnb is hiring a Staff Systems Engineer focused on Oracle EPM Planning systems. The role plays a key part in supporting the Finance team, acting as the technical architect and primary developer for the EPM environment.

Main Responsibilities
- Design and develop scalable Oracle EPM solutions for Finance budgeting and forecasting.
- Build and refine financial forecasting models.
- Drive process improvements using AI technologies.
- Integrate EPM with data warehouse systems.
- Translate complex business requirements into technical solutions within Oracle EPM.
Overview
Pluralis Research is at the forefront of innovation in Protocol Learning, specializing in the collaborative training of foundational models. Our approach ensures that no single participant ever has, or can obtain, a complete copy of the model. This initiative aims to create community-driven, collectively owned frontier models that operate on self-sustaining economic principles.

We are seeking experienced Senior or Staff Machine Learning Engineers with over 5 years of expertise in distributed systems and large-scale machine learning training. In this role, you will design and implement a groundbreaking substrate for training distributed ML models that function effectively over consumer-grade internet connections.
About Lightfield
Lightfield is an innovative, AI-driven Customer Relationship Management (CRM) platform that seamlessly integrates your email, calendar, and meeting interactions. By capturing every communication, it transforms these interactions into organized insights, including accounts, tasks, follow-ups, and actionable intelligence, ensuring nothing falls through the cracks.

We are revolutionizing CRM from the ground up. Rather than imposing rigid structures on teams, Lightfield learns from real-world workflows, adapting and automating processes while surfacing crucial insights that propel growth. Our goal is to build the CRM platform we've always dreamed of: fast, intelligent, and genuinely supportive.

Supported by renowned investors such as Greylock, Lightspeed, and Coatue, our team includes experts who previously developed Tome, a generative AI presentation tool used by over 25 million users. Our background also includes significant contributions to projects at Llama, Instagram, Facebook Messenger, Pinterest, Google, and Salesforce.

About the Role
We are searching for a seasoned Senior/Staff Software Engineer (5+ years of experience) who is product-oriented and excited to tackle the challenges of developing highly scalable, high-performance, and robust features and systems that deliver cutting-edge AI product experiences. At Lightfield, engineers take full ownership of projects from inception to impact, collaborating across functions and pushing the boundaries of what's achievable in applied AI systems development. You will work with a talented team to deliver industry-leading solutions and play a pivotal role in shaping the product and technical direction of Lightfield.

What You Will Do
- Partner with product leaders to outline strategic initiatives, pinpoint key customer challenges, and translate requirements into technical execution.
- Design, build, and maintain complex full-stack product features and systems, ensuring reliability, scalability, and high performance.
- Create observability and metrics to enable smooth operations, proactive issue identification, and ongoing enhancements.
- Lead technical design discussions, mentor colleagues at all levels, conduct comprehensive code reviews, and help establish engineering best practices and standards.
- Contribute to building a top-tier engineering team through recruitment, mentorship, and knowledge transfer.

Who You Are
- 5+ years of software development experience, with a solid foundation in both front-end and back-end technologies.
- Proven experience building large-scale, high-performance, maintainable systems and products using modern programming languages and frameworks.
On-site|San Francisco, CA; New York City, NY; Seattle, WA
About Anthropic
At Anthropic, we are on a mission to develop AI systems that are reliable, interpretable, and steerable, ensuring they are safe and beneficial for users and society. Our dynamic team consists of dedicated researchers, engineers, policy experts, and business leaders who collaborate to advance the field of beneficial AI.

About the Role:
As the Engineering Manager of our performance and scaling teams, you will lead efforts to optimize our computing resources for both inference and training. Your role will involve identifying and eliminating bottlenecks, creating robust solutions, and enhancing system efficiency. In this fast-paced environment, you will provide clarity, focus, and context to your team, driving impactful results.
Feb 10, 2026