Machine Learning Engineer Decentralized ML Training Platform jobs in San Francisco – Browse 5,669 openings on RoboApply Jobs


Open roles matching “Machine Learning Engineer Decentralized ML Training Platform” with location signals for San Francisco. 5,669 active listings on RoboApply Jobs.


1 - 20 of 5,669 Jobs
Apply
Full-time|On-site|San Francisco

Overview

Pluralis Research is at the forefront of Protocol Learning, pioneering a decentralized approach to training and deploying AI models that opens access beyond well-funded corporations. By aggregating computational resources from diverse participants, we incentivize collaboration while safeguarding against centralized control of model weights, paving the way for a truly open and cooperative environment for advanced AI.

We are seeking a talented Machine Learning Training Platform Engineer to design, develop, and scale the core infrastructure that powers our decentralized ML training platform. In this role, you will own essential systems including infrastructure orchestration, distributed computing, and service integration, enabling ongoing experimentation and large-scale model training.

Responsibilities
- Multi-Cloud Infrastructure: Build resource management systems that provision and orchestrate compute across AWS, GCP, and Azure using infrastructure-as-code tools such as Pulumi or Terraform. Manage dynamic scaling, state synchronization, and concurrent operations across hundreds of heterogeneous nodes.
- Distributed Training Systems: Design fault-tolerant infrastructure for distributed machine learning, including GPU clusters, the NVIDIA runtime, S3 checkpointing, large-dataset management and streaming, health monitoring, and resilient retry strategies.
- Real-World Networking: Because training runs on consumer nodes and non-co-located infrastructure, develop systems that simulate and manage real-world network conditions (bandwidth shaping, latency injection, and packet loss) while accommodating dynamic node churn and keeping data flowing efficiently across workers with varying connectivity.
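The "resilient retry strategies" item above typically means exponential backoff with jitter around flaky storage calls such as S3 checkpoint uploads. A minimal, stdlib-only sketch (the names `save_with_retries` and `flaky_save` are illustrative, not part of Pluralis's actual platform):

```python
import random
import time

def save_with_retries(save_fn, max_attempts=5, base_delay=0.01):
    """Retry a flaky checkpoint save with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return save_fn()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # back off 2^attempt * base_delay, plus jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# simulate a checkpoint store that fails twice before succeeding
calls = {"n": 0}
def flaky_save():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient network error")
    return "ckpt-0003"

result = save_with_retries(flaky_save)
```

In a real platform the jitter and cap parameters would be tuned to the storage backend's rate limits; the structure stays the same.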

Apr 1, 2026
Apply
Whatnot
Full-time|On-site|San Francisco, CA

Be a Part of the Revolution in E-Commerce with Whatnot!

Whatnot is the leading live shopping platform across North America and Europe, where you can buy, sell, and explore the items you cherish. We are transforming e-commerce by merging community engagement, shopping, and entertainment into a unique experience tailored just for you. As a remote-first team, we are driven by innovation and firmly rooted in our core values. With operational hubs in the US, UK, Germany, Ireland, and Poland, we are collaboratively crafting the future of online marketplaces.

From fashion and beauty to electronics and collectibles like trading cards, comic books, and live plants, our live auctions cater to a diverse audience. And this is just the beginning! As one of the fastest-growing marketplaces, we are on the lookout for innovative, forward-thinking problem solvers in all areas of our business. Stay updated with the latest from Whatnot through our news and engineering blogs, and join us in empowering individuals to transform their passions into successful ventures while fostering community through commerce.

The Role

We are seeking passionate builders: intellectually curious, entrepreneurial engineers ready to pioneer the future of AI and ML at Whatnot. You will design and scale the foundational infrastructure that supports machine learning and self-hosted large language model applications throughout the organization. Collaborating closely with machine learning scientists, you will help deploy cutting-edge models into production, creating entirely new product experiences. Your work will involve building systems that make advanced machine learning reliable and efficient at scale, from low-latency model serving to distributed training and high-throughput GPU inference.

Your Responsibilities
- Lead the infrastructure that powers AI and ML models across vital business domains: growth, trust and safety, fraud detection, seller tools, and more.
- Prototype, deploy, and operationalize innovative ML architectures that significantly influence user experience and marketplace dynamics.
- Design and scale inference infrastructure capable of serving large models with minimal latency and maximal throughput.
- Build distributed training and inference pipelines using GPUs, as well as model and data parallelism.
- Push the boundaries of your expertise and explore new technologies and methodologies.
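The "model and data parallelism" mentioned above rests on a simple primitive: each worker computes gradients on its own data shard, and an all-reduce averages them so every replica applies the same update. A toy stdlib sketch of that averaging step (`allreduce_mean` is a hypothetical name; real systems use NCCL or similar collectives rather than Python lists):

```python
def allreduce_mean(worker_grads):
    """Average per-worker gradients element-wise, which is what an
    all-reduce does in synchronous data-parallel training."""
    n = len(worker_grads)
    return [sum(g[i] for g in worker_grads) / n
            for i in range(len(worker_grads[0]))]

# three workers each computed gradients on their own data shard
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg = allreduce_mean(grads)  # every worker then applies the same averaged update
```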

Feb 5, 2026
Apply
Faire
Full-time|$268K/yr - $368.5K/yr|On-site|San Francisco, CA

About Faire

Faire is a transformative online wholesale marketplace, driven by the conviction that local businesses are the future. Independent retailers around the globe generate more revenue than massive corporations like Walmart and Amazon combined, yet individually they remain small. At Faire, we harness technology, data, and machine learning to connect this vibrant community of entrepreneurs. Think of your favorite local boutique: we empower them to discover and sell the best products from around the world. With our innovative tools and insights, we aim to level the playing field, enabling small businesses to thrive against larger competitors.

By championing the growth of independent businesses, Faire positively impacts local economies on a global scale. We’re in search of intelligent, resourceful, and passionate individuals to join us in fueling the shop local movement. If you value community, we invite you to be part of ours.

About this Role

As the Senior Staff Machine Learning Platform Engineer, you will spearhead the technical vision and evolution of Faire's ML platform. You will establish standards, influence organization-wide architecture, and lead intricate, cross-functional initiatives that enhance data science velocity at scale. This position is crucial for adapting ML workflows to leverage modern AI productivity tools. You will not only develop models but also design the systems that enable those models to empower tens of thousands of small retailers to compete and grow their local businesses.

Mar 4, 2026
Apply
Full-time|On-site|San Francisco

Overview

Pluralis Research is at the forefront of innovation in Protocol Learning, specializing in the collaborative training of foundational models. Our approach ensures that no single participant ever has, or can obtain, a complete version of the model. This initiative aims to create community-driven, collectively owned frontier models that operate on self-sustaining economic principles.

We are seeking experienced Senior or Staff Machine Learning Engineers with over 5 years of expertise in distributed systems and large-scale machine learning training. In this role, you will design and implement a groundbreaking substrate for training distributed ML models that function effectively over consumer-grade internet connections.

Apr 1, 2026
Apply
Sciforium
Full-time|On-site|San Francisco

At Sciforium, we are at the forefront of AI infrastructure, dedicated to developing advanced multimodal AI models and a high-efficiency serving platform. With substantial funding and direct collaboration with AMD, our team is rapidly expanding to build the complete stack for pioneering AI models and dynamic real-time applications.

Role Overview

This position offers a distinct opportunity to work on the fundamental systems that drive Sciforium's multimodal AI models. You will play a crucial role in building the model serving platform, working with C++, Python, runtime execution, and distributed infrastructure to design a fast, dependable engine for real-time AI applications. You will gain hands-on experience in performance engineering, learn how large AI models are optimized and deployed at scale, and collaborate closely with ML researchers and seasoned systems engineers. If you thrive in low-level programming and are passionate about performance, this role offers both impactful contributions and significant growth opportunities.

Nov 15, 2025
Apply
Scale AI, Inc.
Full-time|$218.4K/yr - $273K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY

Join Scale AI's ML platform team (RLXF) as a Machine Learning Research Engineer, where you will play a pivotal role in developing our advanced distributed framework for training and inference of large language models. This platform is vital for enabling machine learning engineers, researchers, data scientists, and operators to conduct rapid, automated training and evaluation of LLMs and data quality.

At Scale, we occupy a unique position in the AI landscape, serving as an essential provider of training and evaluation data along with comprehensive solutions for the entire ML lifecycle. You will collaborate closely with Scale's ML teams and researchers to enhance the foundational platform that underpins our ML research and development initiatives. Your contributions will be crucial in optimizing the platform to support the next generation of LLM training, inference, and data curation.

If you are passionate about driving the future of AI through groundbreaking innovations, we want to hear from you!

Mar 26, 2026
Apply
Foxglove
Full-time|On-site|San Francisco, CA

Join us in creating the backbone of data infrastructure for real-world robotic operations.

As robotics transitions from research labs to real-world applications across factories, warehouses, vehicles, and field deployments, understanding robotic performance becomes critical. When robots encounter failures or unexpected behaviors, data analysis is key to deciphering the underlying issues. At Foxglove, we build tools for observability, visualization, and data infrastructure that empower robotics and autonomous systems teams to manage, analyze, and derive insights from vast amounts of multimodal sensor data collected from operational systems and production fleets.

Role Overview

We are seeking a passionate ML Platform Engineer with robust infrastructure expertise to design, deploy, and scale our data platform systems. This platform-centric role will let you own the infrastructure layer that enables machine learning in production, going beyond just the models themselves. Your responsibilities will encompass the reliability, scalability, and performance of the ML platform, including inference serving, pipeline orchestration, training infrastructure, and evaluation frameworks. You will tackle substantial challenges, such as managing petabyte-scale multimodal robotics data and optimizing high-throughput retrieval and embedding pipelines, in a hands-on infrastructure capacity.

Key Responsibilities
- Design and operationalize production inference infrastructure, focusing on model serving, autoscaling, load balancing, and cost efficiency across cloud environments.
- Own the platform architecture for embedding and retrieval pipelines that enable semantic search across multimodal robotics data (image, video, point cloud, and time series).
- Develop and sustain the training and evaluation infrastructure that supports rapid model-performance iteration, including job orchestration, experiment tracking, and dataset versioning.
- Lead decisions on cloud infrastructure (AWS/GCP) that affect latency, throughput, reliability, and scalability.
- Establish platform abstractions and internal tools that let product engineers deliver ML-enhanced features without managing infrastructure directly.
- Assess, integrate, and operationalize third-party ML infrastructure components while establishing clear build-vs-buy frameworks for the team.
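The embedding-and-retrieval pipelines described above boil down to ranking stored vectors by similarity to a query vector. A tiny cosine-similarity sketch under the assumption of an in-memory index (the `clip_*` IDs and vectors are invented; production systems use a vector database with approximate nearest-neighbor search):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, index, top_k=2):
    """Rank stored embeddings by similarity to the query embedding."""
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

# toy index mapping sensor-clip IDs to (hypothetical) embeddings
index = {
    "clip_0042": [0.9, 0.1, 0.0],
    "clip_0077": [0.0, 1.0, 0.0],
    "clip_0105": [0.8, 0.2, 0.1],
}
hits = search([1.0, 0.0, 0.0], index)
```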

Apr 2, 2026
Apply
Machine Learning Platform Engineer

tvScientific powered by Pinterest

Full-time|$123.7K/yr - $254.7K/yr|Remote|San Francisco, CA, US; Remote, US

tvScientific, powered by Pinterest, develops a connected TV (CTV) advertising platform designed for performance marketers. The platform combines media buying, optimization, measurement, and attribution to automate and improve TV advertising. Built by professionals in programmatic advertising, digital media, and ad verification, tvScientific aims to deliver measurable results for advertisers.

Role overview

As a Machine Learning Platform Engineer, you will join a team that operates where Site Reliability Engineering meets low-latency distributed systems. This team advances Pinterest’s real-time machine learning and measurement infrastructure, focusing on sub-millisecond decision-making and high-throughput data access. Seamless integration with Pinterest’s core stack is central to the work.

What you will do

Design and build systems to keep queries and RPCs fast and reliable, even during periods of heavy demand. Develop and enhance the foundation of the machine learning training and serving stack. Address challenges in storage, indexing, streaming, fan-out, and managing backpressure and failures across services and regions. Collaborate with software engineering, data infrastructure, and SRE teams to ensure systems are observable, debuggable, and ready for production.

Key areas of focus
- I/O scheduling and batching
- Lock-free or low-contention data structures
- Connection pooling and query planning
- Kernel and network tuning
- On-disk layout and indexing strategies
- Circuit-breaking and autoscaling
- Incident response and failure management
- NixOS
- Defining and maintaining SLIs and SLOs

This position is a strong fit for engineers interested in building and operating large-scale infrastructure, particularly those who enjoy working on real-time systems, observability, and reliability.
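Of the focus areas listed above, circuit-breaking is compact enough to sketch: after a threshold of consecutive failures the breaker "opens" and rejects calls immediately, protecting a struggling backend from pile-on traffic. A minimal illustrative version (not Pinterest's implementation; names and parameters are invented):

```python
import time

class CircuitBreaker:
    """Open the circuit after `threshold` consecutive failures;
    fail fast while open, and allow a trial call after `reset_after` seconds."""

    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # half-open: let one trial call through
            self.opened_at = None
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result

breaker = CircuitBreaker(threshold=2, reset_after=60.0)
outcomes = []

def failing_rpc():
    raise ConnectionError("backend overloaded")

for _ in range(3):
    try:
        breaker.call(failing_rpc)
        outcomes.append("ok")
    except ConnectionError:
        outcomes.append("error")
    except RuntimeError:
        outcomes.append("rejected")
```

After two real failures the breaker trips, so the third call is rejected without ever touching the backend.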

Apr 23, 2026
Apply
Parafin
Full-time|On-site|San Francisco, CA

About Us

At Parafin, our mission is to empower small businesses to thrive in today's competitive landscape. We understand that small businesses form the backbone of our economy, yet they often face challenges in accessing essential financial resources. Our technology streamlines access to vital financial tools directly on the platforms small businesses already use for sales. Partnering with industry leaders such as DoorDash, Amazon, Worldpay, and Mindbody, we provide small businesses with fast, flexible funding, efficient spend management, and effective savings solutions through simple integrations. Parafin manages the complexities of capital markets, underwriting, servicing, compliance, and customer support to ensure seamless experiences for our partners and their small business clients.

We are a dynamic team of innovators with backgrounds from top firms like Stripe, Square, Plaid, Coinbase, Robinhood, and CERN, all driven by a passion for developing tools that facilitate small business success. Backed by venture capitalists including GIC, Notable Capital, Redpoint Ventures, Ribbit Capital, and Thrive Capital, Parafin is a Series C company with over $194M raised in equity and $340M in debt facilities. Join us in shaping a future where every small business has access to the financial tools they need.

About the Position

We are looking for a skilled Software Engineer to join our Infrastructure team and spearhead the advancement of our Machine Learning (ML) Platform. This pivotal role is essential for constructing reliable, scalable, and developer-centric systems for model experimentation, training, evaluation, inference, and retraining that drive underwriting and other ML-powered products for small businesses. As a Software Engineer, you will design, build, and maintain the core frameworks and platforms that empower data scientists to deploy high-quality models into production efficiently and safely. You'll work closely with Data Science and Platform Engineering, own the ML platform end-to-end, and develop both batch and real-time underwriting infrastructure.

What You'll Do
- Transform notebooks into reliable software. Break down data scientists' training and inference notebooks into reusable, well-tested components (libraries, pipelines, templates) with clear interfaces and documentation.
- Develop user-friendly ML abstractions. Create SDKs, CLIs, and templates that simplify defining features, training and evaluating models, and deploying to batch or real-time targets with minimal boilerplate.
- Build our real-time ML inference platform. Establish and scale low-latency model serving capabilities.
- Enhance batch ML inference. Optimize scheduling, parallelism, cost controls, and observability to improve efficiency.

Jan 5, 2026
Apply
Scribd Inc.
Full-time|$126K/yr - $196K/yr|Hybrid|San Francisco

About Scribd

At Scribd Inc. (pronounced 'scribbed'), we're on a mission to ignite human curiosity. Join our team as we craft a diverse world of stories and knowledge, democratizing the exchange of ideas and empowering collective intelligence through our four flagship products: Everand, Scribd, Slideshare, and Fable. This job posting is for an open position within our organization.

We foster a culture where authenticity and boldness thrive, facilitating open debate and commitment as we embrace the unexpected. Every team member is empowered to take initiative, prioritizing the needs of our customers. In terms of workplace structure, we prioritize a balance between personal flexibility and communal connection. Our Scribd Flex initiative allows employees, in collaboration with their managers, to determine the daily work styles that best suit their individual needs while promoting intentional in-person interactions to enhance collaboration and company culture. Occasional in-person attendance is therefore mandatory for all employees, regardless of location.

What do we seek in our new team members? We value 'GRIT': the intersection of passion and perseverance toward long-term goals. At Scribd Inc., we believe in harnessing the potential that GRIT unlocks and encourage each employee to adopt a GRIT-driven approach to their work. We are looking for individuals who can set and achieve Goals, deliver Results in their responsibilities, contribute Innovative ideas, and positively impact the broader Team through collaboration and a positive attitude.

About Our Machine Learning Team

Our Machine Learning team is pivotal in developing the platform and product applications that drive personalized discovery, recommendations, and generative AI functionality across Scribd, Slideshare, and Everand.
The ML team operates on the Orion ML Platform, providing essential ML infrastructure such as a feature store, model registry, model inference systems, and embedding-based retrieval (EBR). Our Machine Learning Engineers collaborate closely with the Product team to integrate machine learning into user-facing features, including real-time personalization and AskAI LLM-powered experiences.

Aug 19, 2025
Apply
OpenAI
Full-time|Hybrid|San Francisco

About Our Team

The Training Runtime team develops a sophisticated distributed machine-learning training runtime that supports everything from initial research prototypes to cutting-edge model deployments. Our mission is twofold: to enhance the capabilities of researchers and to facilitate large-scale model training. We are creating a cohesive, flexible runtime environment that evolves with researchers as they scale their projects.

Our initiatives revolve around three key pillars: optimizing high-performance, asynchronous, zero-copy, tensor- and optimizer-state-aware data movement; constructing resilient, fault-tolerant training frameworks (including robust training loops, effective state management, resilient checkpointing, and comprehensive observability); and managing distributed processes for long-duration, job-specific uses. By embedding established large-scale functionality into a user-friendly runtime, we empower teams to iterate rapidly and operate reliably at any scale, working closely with model-stack, research, and platform teams. Our success is measured by both training throughput (how fast models are trained) and researcher efficiency (how fast concepts become experiments and products).

About the Position

As a Machine Learning Framework Engineer on our Training team, you will be pivotal in enhancing the training throughput of our internal framework while empowering researchers to explore innovative ideas. This role demands exceptional engineering skills, including the design, implementation, and optimization of state-of-the-art AI models, as well as writing clean, efficient machine learning code, a task that is often harder than it seems. A deep understanding of supercomputer performance metrics will also be critical. Ultimately, every project you undertake will aim to advance the field of machine learning.

We seek individuals who are passionate about performance optimization, have a solid grasp of distributed systems, and have an aversion to bugs in their code. Because our training framework is used for extensive runs involving numerous GPUs, any performance enhancement significantly impacts our operations. This position is based in San Francisco, CA, and follows a hybrid work model requiring three days in the office each week. We also provide relocation assistance for new hires.

Key Responsibilities
- Implement advanced techniques in our internal training framework to maximize hardware efficiency during training.
- Profile and optimize our training framework to enhance performance.
- Collaborate with researchers to facilitate the development of next-generation machine learning models.

You Will Excel in This Role If You
- Possess a strong passion for optimizing system performance.
- Have a profound understanding of distributed systems and their complexities.
- Demonstrate meticulous attention to detail, especially in code quality and debugging.
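Profiling a training framework usually starts with per-phase wall-clock accounting: how much of each step goes to the forward pass, the backward pass, data movement, and so on. A stdlib sketch of that pattern (the timed loops stand in for real work; this is not OpenAI's tooling):

```python
import time
from contextlib import contextmanager

@contextmanager
def step_timer(stats, name):
    """Accumulate wall-clock time per training-step phase into `stats`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stats[name] = stats.get(name, 0.0) + (time.perf_counter() - start)

stats = {}
with step_timer(stats, "forward"):
    sum(i * i for i in range(10000))  # stand-in for the forward pass
with step_timer(stats, "backward"):
    sum(i * i for i in range(10000))  # stand-in for the backward pass
```

Aggregating these per-phase timings over many steps is what reveals where hardware efficiency is being lost before reaching for lower-level profilers.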

Oct 29, 2025
Apply
Tubi Inc.
Full-time|$292K/yr - $417.2K/yr|Hybrid|San Francisco, CA; Los Angeles, CA; New York, NY (Hybrid); USA - Remote

About the Role

The Machine Learning team at Tubi is at the forefront of transforming user experiences through cutting-edge technology. With the industry's largest inventory and a vast audience of millions, we are dedicated to solving complex challenges in recommendations, search, content understanding, and ad optimization, shaping the future of streaming.

We are looking for a Director of Machine Learning Engineering and Infrastructure to lead a hybrid team that merges advanced ML engineering with exceptional infrastructure design. In this pivotal role, you will define the strategic vision and implementation for scaling our machine learning capabilities, ensuring our distributed systems and infrastructure can foster innovation at scale. You will blend technical expertise with outstanding leadership to guide teams in delivering robust ML systems and high-performance distributed services.

Mar 17, 2026
Apply
Physical Intelligence
Full-time|On-site|San Francisco

As a Machine Learning Infrastructure Engineer at Physical Intelligence, you will play a vital role in enhancing and optimizing our training systems and core model code. You will own critical infrastructure for large-scale training, including GPU/TPU compute management, job orchestration, and reusable, efficient JAX training pipelines. Collaborating closely with researchers and model engineers, you will help turn innovative ideas into experiments and then into production training runs. This position is hands-on and offers significant leverage at the intersection of machine learning, software engineering, and scalable infrastructure.

The Team

Our ML Infrastructure team supports and accelerates Physical Intelligence's core modeling initiatives by building systems that make large-scale training reliable, reproducible, and efficient. The team collaborates with research, data, and platform engineers to guarantee that models can seamlessly transition from prototype to production-grade training runs.

Key Responsibilities
- Manage training/inference infrastructure: Design, implement, and maintain systems for large-scale model training, including scheduling, job management, checkpointing, and performance metrics/logging.
- Expand distributed training: Collaborate with researchers to efficiently scale JAX-based training across TPU and GPU clusters.
- Enhance performance: Profile and optimize memory usage, device utilization, throughput, and distributed synchronization to maximize efficiency.
- Facilitate rapid iteration: Develop abstractions for launching, monitoring, debugging, and reproducing experiments.
- Oversee compute resources: Ensure optimal allocation and utilization of cloud-based GPU/TPU compute while managing costs effectively.
- Collaborate with researchers: Translate research requirements into infrastructure capabilities and promote best practices for large-scale training.
- Contribute to core training code: Evolve the JAX model and training code to accommodate new architectures, modalities, and evaluation metrics.

Aug 24, 2024
Apply
Foxglove
Full-time|On-site|San Francisco, CA

Join us at Foxglove, where we are revolutionizing the robotics industry by building robust data infrastructure for real-world applications.

As robotics transitions from research environments to practical deployments in factories, warehouses, vehicles, and field operations, data becomes essential for engineers to troubleshoot failures, understand unexpected behaviors, and improve robotic systems. At Foxglove, we provide the observability, visualization, and data infrastructure that enable robotics and autonomous systems teams to efficiently ingest, store, query, replay, and analyze extensive volumes of multimodal sensor data from live systems and production fleets.

About the Role

We are seeking a talented Applied Machine Learning Engineer with strong infrastructure instincts to design, deploy, and scale the machine learning systems that power our data platform. In this impactful role, you will be responsible for optimizing production ML infrastructure, from improving inference pipeline throughput to establishing training and evaluation workflows. You will focus on high-priority challenges, such as developing retrieval applications for petabyte-scale multimodal robotics data, using cutting-edge models to create high-performance search and data mining products, and fostering an internal ML flywheel for rapid iteration. This is a hands-on, application-driven position rather than a research-focused role.

Key Responsibilities
- Deploy and manage inference infrastructure for production ML workloads, focusing on model serving, scalability, and cost efficiency.
- Build and oversee vector database integrations and embedding applications to enable semantic search across multimodal robotics data types (image, video, point cloud, and time series).
- Design and implement evaluation and training infrastructure to improve model performance rapidly.
- Lead cloud architecture and tooling decisions to optimize inference latency, throughput, cost, and reliability at scale.
- Collaborate closely with product engineers to deliver application-driven ML features that empower developers at the forefront of robotics and physical AI, rather than prototype experiments.
- Identify appropriate off-the-shelf solutions for production and determine when to build versus buy.

Apr 6, 2026
Apply
Ambience Healthcare
Full-time|$250K/yr|Hybrid|San Francisco

About Us

At Ambience Healthcare, we are not just another scribe; we are building an AI intelligence platform that restores the human touch in healthcare while delivering significant ROI for health systems nationwide. Our technology enables healthcare providers to concentrate on delivering exceptional care by alleviating the administrative burdens that detract from patient interactions and their most impactful work. Ambience provides real-time, coding-aware documentation and clinical workflow support in ambulatory, emergency, and inpatient settings across leading health systems in North America.

Our team is driven by a relentless pursuit of excellence and extreme ownership, dedicated to crafting the best solutions for our health system partners. We champion transparency, positivity, and thoughtful engagement, holding each other accountable because we understand the significance of the challenges we tackle. Ambience has earned accolades such as ranking #1 for Improving the Clinician Experience in the KLAS Research Emerging Solutions Top 20 Report, being recognized by Fast Company as one of the Next Big Things in Tech, and being named one of the best AI companies in healthcare by Inc. We were also selected as a LinkedIn Top Startup in 2024 and 2025. Our investors include Oak HC/FT, Andreessen Horowitz (a16z), OpenAI Startup Fund, and Kleiner Perkins, and our journey is just beginning.

The Role

As a Staff Machine Learning Engineer, you will play a crucial role in advancing clinical AI that touches millions of patient encounters across the largest health systems in the nation. Your contributions will directly influence how quickly we improve our AI capabilities through the platform you will oversee. You will design and implement evaluation and release processes that let teams ship with confidence, create observability tools that identify quality issues proactively, and build debugging tools that make issues fast to reproduce. You'll also work on the chart context retrieval layer that transforms patient history into model-ready inputs. Our goal is to enable teams to iterate on quality within days, not weeks, so that every enhancement you implement adds value across all product teams each quarter. Please note that our engineering roles operate in a hybrid model from our San Francisco office (3 days per week).

What You'll Own
- Evaluation & Release Infrastructure: Develop automated grading systems and release gates that work across product teams, creating a unified, version-controlled evaluation dataset to replace fragmented workflows. Implement production-quality monitoring that includes end-to-end tracing, shared metrics, and automated alerts.
- Debugging Tools: Build encounter replay features that reconstruct the exact inference inputs (including retrieved chart context, packed prompts, and model versions) so teams can troubleshoot issues without sifting through logs. Create differential views to compare known-good states with regressions.
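The "differential views" described above can be as simple as a line-level diff between a known-good output and a regressed one. A sketch using Python's stdlib `difflib` (the clinical-note strings are invented examples, not real Ambience data):

```python
import difflib

def regression_diff(golden, candidate):
    """Line-level unified diff between a known-good output and a
    candidate output, for quick regression triage."""
    return list(difflib.unified_diff(
        golden.splitlines(), candidate.splitlines(),
        fromfile="golden", tofile="candidate", lineterm=""))

# hypothetical known-good note vs. a regressed model output
golden = "HPI: cough x3 days\nPlan: supportive care"
candidate = "HPI: cough x3 days\nPlan: antibiotics"

diff = regression_diff(golden, candidate)
# pull out just the changed Plan lines for display
changed = [l for l in diff if l.startswith(("-Plan", "+Plan"))]
```

In a real pipeline the same diff would be computed per encounter and surfaced next to the replayed inference inputs, so reviewers see exactly which lines regressed.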

Feb 2, 2026
Apply
Whatnot
Full-time|On-site|San Francisco, CA

Embrace the Future of Commerce with Whatnot!Whatnot stands as North America and Europe’s premier live shopping platform, dedicated to transforming the way you buy, sell, and discover your favorite items. We are on a mission to redefine e-commerce by seamlessly merging community engagement, shopping, and entertainment into a unique experience tailored just for you. As part of a remote, co-located team, we thrive on innovation while being firmly rooted in our core values. With operational hubs across the US, UK, Germany, Ireland, and Poland, we are collaboratively shaping the future of online marketplaces.Our live auctions span a diverse range of categories from fashion and beauty to electronics and collectibles, including trading cards, comic books, and even live plants. There’s truly something for everyone!And this is just the beginning! As one of the fastest-growing marketplaces, we are in search of bold, innovative problem solvers across all functional areas. Stay updated with the latest Whatnot news through our news and engineering blogs, and join us in empowering individuals to transform their passions into thriving businesses, fostering connections through commerce. Your RoleWe are seeking hands-on leaders—intellectually curious and technically proficient individuals ready to influence the future of AI and ML at Whatnot. In this pivotal role, you will spearhead the development and scaling of the foundational infrastructure that supports machine learning and self-hosted large language model applications across our organization. Collaborating closely with machine learning scientists, you will drive the implementation of innovative models powered by near-real-time features, enhancing product experiences. This entails building robust systems that ensure advanced ML is both reliable and efficient at scale—from low-latency deep learning model serving and streaming feature ingestion to distributed training and high-throughput GPU inference. 
Although this is a managerial role, a strong technical foundation is essential, and candidates should be enthusiastic about diving deep into the details. You will elevate architectural discussions, provide insightful technical feedback, and dedicate at least one day a week to coding.

Your Responsibilities:
Lead the infrastructure supporting AI and ML models across critical business areas, enhancing growth, recommendations, trust and safety, fraud detection, seller tooling, and more.
Oversee the prototyping, deployment, and productionization of innovative ML architectures, ensuring they align with our strategic objectives.

Jan 15, 2026
Whatnot
Full-time|On-site|San Francisco, CA

Join Whatnot as a Machine Learning Platform Engineer, where you'll play a pivotal role in shaping the future of our AI-driven solutions. In this dynamic position, you will collaborate with cross-functional teams to design, implement, and optimize machine learning platforms that drive efficiency and innovation.

Your expertise will be critical in enhancing our data processing capabilities and deploying robust machine learning models at scale. If you are passionate about leveraging cutting-edge technology to solve complex challenges, we want to hear from you!

Mar 3, 2026
tvScientific
Full-time|Remote|San Francisco, CA, US; Remote, US

tvScientific seeks a Machine Learning Platform Engineer to help shape the company’s advertising technology. This position can be based in San Francisco, CA, or performed remotely from anywhere in the United States.

Role overview
This role focuses on building and refining machine learning models that drive the core of tvScientific’s advertising platform. The work combines technical skill with creative problem-solving to support the platform’s effectiveness.

What you will do
Develop and optimize machine learning models to enhance advertising performance
Collaborate with team members to deliver solutions that balance innovation, scalability, and reliability
Apply technical expertise to address challenges at the intersection of technology and creative thinking

Location
Candidates may work from San Francisco, CA, or remotely within the US.

Apr 23, 2026
Mithrl
Full-time|On-site|San Francisco

ABOUT MITHRL

At Mithrl, we envision a future where innovative medicines are delivered to patients in mere months rather than years, and scientific advancements occur at unparalleled speed.

Mithrl is pioneering the development of the world’s first commercially available AI Co-Scientist, a revolutionary discovery engine that swiftly converts complex biological data into actionable insights. Scientists can pose questions in natural language, and Mithrl will provide comprehensive analyses, propose novel targets, formulate hypotheses, and generate patent-ready reports, all in minutes.

Our success is evident:
Achieved 12X year-over-year revenue growth
Recognized by leading biotech firms and major pharmaceutical companies across three continents
Facilitating real breakthroughs from target identification to improved patient outcomes

ABOUT THE ROLE

We are seeking a talented ML Engineer for our Discovery Applications team to create high-level, end-to-end scientific workflows that enhance decision-making within the Mithrl platform. This position focuses on developing the application layer that interacts with our AI Co-Scientist. You will play a crucial role in how scientists uncover biomarkers, validate targets, design experiments, and initiate early discovery programs that lead to IND-enabling studies.

A strong understanding of the discovery and preclinical development cycle is essential for this role. You will need to comprehend how research teams transition from early target hypotheses to creating biomarker strategies, hit identification, lead optimization, and preclinical validation. Your applications will directly support decision-making across this continuum and will be utilized by scientists and program teams.

Your responsibilities will include designing multi-step workflows that integrate analysis modules, ML models, domain-specific logic, and agentic reasoning into cohesive applications. These applications will encompass biomarker discovery, target identification, target validation, small molecule hit identification and optimization, as well as gene therapy workflows. Furthermore, you will adapt applications to incorporate new data modalities as our platform evolves.

Dec 30, 2025
Scale AI
Full-time|$252K/yr - $315K/yr|On-site|San Francisco, CA; New York, NY

Join Scale's innovative Large Language Model (LLM) post-training platform team, where you will contribute to the development of our internal distributed framework designed specifically for LLM training. This sophisticated platform empowers Machine Learning Engineers (MLEs), researchers, data scientists, and operators to perform rapid and automated training and evaluation of LLMs. Additionally, it underpins the training framework for our data quality evaluation pipeline.

Scale is at the forefront of the Artificial Intelligence sector, acting as a vital provider of training and evaluation data, as well as comprehensive solutions for the entire machine learning lifecycle. In this role, you will collaborate closely with Scale’s ML teams and researchers to construct the foundational platform that supports all our ML research and development initiatives. Your work will involve building and optimizing this platform to facilitate the training, inference, and data curation of next-generation LLMs.

If you are passionate about driving the future of AI through groundbreaking innovations, we invite you to connect with us!

Mar 26, 2026
