Research Engineer AI Reinforcement Learning Infrastructure jobs in Sunnyvale – Browse 656 openings on RoboApply Jobs

Open roles matching “Research Engineer AI Reinforcement Learning Infrastructure” with location signals for Sunnyvale. 656 active listings on RoboApply Jobs.

Applied Intuition, Inc.
Full-time|$126K/yr - $423K/yr|On-site|Sunnyvale, California, United States

About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI technologies. Established in 2017 and currently valued at $15 billion, this Silicon Valley powerhouse is building the essential digital infrastructure to infuse intelligence into every moving machine on Earth. Serving a diverse range of industries including automotive, defense, trucking, construction, mining, and agriculture, Applied Intuition excels in three key domains: tools and infrastructure, operating systems, and autonomy. Eighteen of the world's top 20 automakers, along with the United States military and its partners, rely on our solutions to deliver physical intelligence. Our headquarters is located in Sunnyvale, California, with additional offices in Washington, D.C.; San Diego; Ft. Walton Beach, FL; Ann Arbor, MI; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.
As an in-office company, we expect our employees to primarily work from their Applied Intuition office five days a week. However, we value flexibility and trust our employees to manage their schedules responsibly. This may include occasional remote work, starting the day with morning meetings from home, or leaving early for family commitments.
Role and Team Overview
We seek a dedicated Research Engineer (AI/RL Infrastructure) to join our Research Group at Applied Intuition. This position is perfect for engineers who design, build, and maintain large-scale machine learning systems and collaborate closely with researchers to innovate and enhance the foundational platform for next-generation physical AI systems.
The Research Group's mission is to develop pioneering technologies that facilitate next-generation physical AI, focusing on two of the most challenging applications that are transforming our daily lives: end-to-end autonomous driving and robotic generalists. Our team comprises leading experts from prestigious institutions and organizations, recognized for their outstanding contributions in both academia and industry, including multiple Best Paper awards at top conferences and journals such as CVPR and ICRA. For further insights, visit appliedintuition.com/research.

Feb 13, 2026

Applied Intuition, Inc.
Full-time|$126K/yr - $423K/yr|On-site|Sunnyvale, California, United States

About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI technology. Established in 2017 and now valued at $15 billion, our Silicon Valley-based company is dedicated to developing the digital infrastructure necessary to infuse intelligence into every moving machine across the globe. We cater to various sectors, including automotive, defense, trucking, construction, mining, and agriculture, focusing on tools and infrastructure, operating systems, and autonomy. Our solutions are trusted by 18 of the top 20 global automakers and the United States military along with its allies. Our headquarters is in Sunnyvale, California, with additional offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.
As an in-office company, we expect our employees to primarily work from their Applied Intuition office five days a week. However, we value flexibility and trust our employees to manage their schedules responsibly, which may include occasional remote work, starting the day with morning meetings from home, or leaving early for family commitments.
About the Role and Team
We are seeking multiple enthusiastic Research Engineers to join our Research Group at Applied Intuition. The group's mission is to pioneer innovative technology that supports the next generation of physical AI, focusing particularly on the transformative applications of end-to-end autonomous driving and robotic generalist solutions. Our team comprises leading experts from top institutions and organizations, recognized for their outstanding academic and industry contributions, including eight Best Paper awards at prestigious conferences and journals such as CVPR and ICRA. To learn more, visit appliedintuition.com/research.
With access to industry-leading tools and infrastructure, our researchers can utilize millions of miles of data from extensive fleets and deploy the methods they develop on various autonomous and robotic systems, including self-driving cars and trucks.

Feb 17, 2026

Applied Intuition, Inc.
Full-time|On-site|Sunnyvale, California, United States

About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing the future of physical AI. Established in 2017 and currently valued at $15 billion, this Silicon Valley-based company is developing the digital infrastructure essential for instilling intelligence into every moving machine globally. Applied Intuition provides solutions to a spectrum of industries, including automotive, defense, trucking, construction, mining, and agriculture, focusing on tools and infrastructure, operating systems, and autonomy. Eighteen of the top twenty global automakers, as well as the U.S. military and its allies, trust our solutions to deliver physical intelligence. Our headquarters is located in Sunnyvale, California, with additional offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.
We adopt an in-office culture, expecting our employees to primarily work from their Applied Intuition office five days a week. However, we value flexibility and trust our employees to manage their schedules responsibly, which may include occasional remote work, starting the day with morning meetings from home, or leaving early for family commitments.
About the Role and Team
We are seeking multiple enthusiastic Research Scientists to join our Research Group at Applied Intuition. Our mission is to develop technologies that empower the next generation of physical AI, focusing on two pivotal applications that are transforming our daily lives: end-to-end autonomous driving and robotic general intelligence. Our team is composed of leading experts from prestigious institutions and companies, recognized for their outstanding contributions to academia and industry, including eight Best Paper awards from leading conferences and journals such as CVPR and ICRA. To learn more, visit appliedintuition.com/research.
Backed by industry-leading tools and infrastructure, researchers have access to extensive datasets derived from large fleets and can deploy the methods they develop across diverse autonomous and robotic systems, including self-driving cars and trucks.

Feb 13, 2026

Institute of Foundation Models
Full-time|On-site|Sunnyvale, CA

About the Institute of Foundation Models
We are a pioneering research laboratory focused on the creation, comprehension, utilization, and risk management of foundation models. Our mission is to propel advancements in research, cultivate the next generation of AI innovators, and contribute significantly to a knowledge-driven economy.
As a member of our esteemed team, you'll engage with leading researchers, data scientists, and engineers on the forefront of foundation model training, addressing some of the most crucial challenges in AI development. This role presents a unique opportunity to develop groundbreaking AI solutions with the potential to transform entire industries. Your strategic and innovative problem-solving capabilities will be vital in positioning MBZUAI as a global center for high-performance computing in deep learning, inspiring future AI trailblazers.
Position Summary
As a Research Scientist on our Reinforcement Learning team, you will be instrumental in shaping our scientific and technical strategies towards developing advanced capabilities in Foundation Models. This position requires innovating new methodologies in Reinforcement Learning to drive paradigm shifts in foundation modeling. Your responsibilities will include prototyping and refining novel learning approaches, enhancing large-scale RL training infrastructure, and producing reproducible code for public dissemination. Additionally, you will be expected to cultivate and maintain an impactful research portfolio through both internal and external collaborations.
Key Responsibilities
- Innovate research focused on large-scale self-play for foundation model training, agentic tasks, and equipping models with the ability to learn proactively from their environment.
- Initiate and pursue cutting-edge algorithmic strategies within reinforcement learning to define and advance emergent capabilities in Foundation Models.
- Engage in full-stack engineering, encompassing data curation, model architecture, algorithm design, and the final deployment of models for end-users with a commitment to high-quality, documented, and maintainable code.
- Contribute to technical reports and research publications.
- Represent MBZUAI at industry conferences and events.

Jul 31, 2025

Applied Intuition, Inc.
Full-time|$126K/yr - $423K/yr|On-site|Sunnyvale, California, United States

About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI technologies. Established in 2017 and currently valued at $15 billion, this Silicon Valley powerhouse is dedicated to developing the essential digital infrastructure that will empower intelligent operations in every vehicle and machine worldwide. Our innovative solutions cater to the automotive, defense, trucking, construction, mining, and agriculture sectors, focusing on three pivotal areas: tools and infrastructure, operating systems, and autonomous capabilities. Our reputation is underscored by the trust placed in us by 18 of the top 20 global automakers and the United States military, among others. Our headquarters is in Sunnyvale, California, with additional offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.
We promote a collaborative in-office culture and expect our team members to primarily work from their Applied Intuition office five days a week. However, we also value flexibility, allowing our employees to manage their schedules responsibly, which may include occasional remote work, starting the day with morning meetings from home, or leaving early to meet family obligations.
About the Role and Team
We are seeking enthusiastic Research Scientists to join our dynamic Research Group at Applied Intuition. Our mission is to develop pioneering technologies that drive the evolution of physical AI, particularly in two transformative applications: end-to-end autonomous driving and general-purpose robotics. Our team comprises distinguished experts from leading institutions and companies, celebrated for their remarkable contributions to both academia and industry, including eight Best Paper awards at prestigious conferences such as CVPR and ICRA. Learn more about our research initiatives at appliedintuition.com/research.
With access to industry-leading tools and infrastructure, our researchers can leverage millions of miles of data from extensive fleets and implement their innovative methods across diverse autonomous and robotic systems, including self-driving vehicles and autonomous machinery.

Feb 13, 2026

Applied Intuition, Inc.
Internship|On-site|Sunnyvale, California, United States

About Applied Intuition
Applied Intuition, Inc. is revolutionizing the future of physical AI. Established in 2017 and currently valued at $15 billion, this Silicon Valley innovator is developing the essential digital framework to infuse intelligence into every moving machine globally. Our services cater to vital sectors such as automotive, defense, trucking, construction, mining, and agriculture, focusing on three main pillars: tools and infrastructure, operating systems, and autonomy. Renowned leaders, including 18 of the top 20 global automakers and the United States military along with its allies, rely on our solutions for advancing physical intelligence. Our headquarters is in Sunnyvale, California, with additional offices located in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.
As an in-office company, we expect our employees to primarily work at the Applied Intuition office five days a week. However, we understand the need for flexibility and trust our employees to manage their schedules responsibly. This could involve occasional remote work, starting the day with morning meetings from home before coming to the office, or leaving earlier when necessary for family commitments.
About the Role:
We are actively seeking passionate Research Interns to become part of our Research Group at Applied Intuition. Our mission is to develop pioneering technology that empowers the next generation of physical AI, particularly in the most challenging domains that are transforming our daily lives: end-to-end autonomous driving and robotic generalists. Our team comprises distinguished experts from leading institutions and companies, acclaimed for their outstanding academic and industry contributions, including eight Best Paper awards at top conferences and journals such as CVPR and ICRA. For more information, visit appliedintuition.com/research.
With access to industry-leading tools and infrastructure, our researchers can utilize millions of miles of data from extensive fleets and deploy the methods they develop across various autonomous and robotic systems, including self-driving cars and trucks, autonomous mining and construction machines, humanoid robots, and dexterous hands.

Feb 17, 2026

Applied Intuition, Inc.
Internship|On-site|Sunnyvale, California, United States

About Applied Intuition
Applied Intuition, Inc. is at the forefront of physical AI innovation. Established in 2017 and valued at $15 billion, our Silicon Valley-based company is building the essential digital infrastructure to integrate intelligence into every mobile machine worldwide. We cater to the automotive, defense, trucking, construction, mining, and agriculture sectors across three primary domains: tools and infrastructure, operating systems, and autonomy. Our solutions are trusted by 18 of the world's 20 leading automotive manufacturers, as well as the U.S. military and its allies. Our headquarters is located in Sunnyvale, California, with additional offices in Washington, D.C.; San Diego; Ft. Walton Beach, FL; Ann Arbor, MI; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.
We are an in-office organization, and we expect our team members to work predominantly from the Applied Intuition office five days a week. Nevertheless, we understand the significance of flexibility and trust our employees to responsibly manage their schedules, which may include occasional remote work, starting the day with morning meetings from home, or leaving early for family commitments.
About the Role:
We are seeking enthusiastic Research Interns to join our Research Group at Applied Intuition. Our group's mission is to develop state-of-the-art technology that enables the next generation of physical AI. We focus on two of the most complex applications that are transforming our daily lives: end-to-end autonomous driving and robotic generalists. Our team includes leading experts from prestigious institutions and companies, recognized for their outstanding contributions to academia and industry, with accolades including eight Best Paper awards at top conferences and journals such as CVPR and ICRA. For more details, visit appliedintuition.com/research.
Equipped with industry-leading tools and infrastructure, our researchers have access to millions of miles of data from extensive fleets and can deploy the methods they develop on various autonomous and robotic systems, including self-driving vehicles, autonomous mining and construction machines, humanoid robots, and dexterous hands.

Feb 17, 2026

Bosch Group
Internship|On-site|Sunnyvale

Join Bosch as an enthusiastic intern and contribute to pioneering advancements in reinforcement learning and simulation for autonomous vehicle planning. This role focuses on innovative research and development of cutting-edge algorithms, conducting experiments, and translating groundbreaking ideas into viable products.
Internship Opportunities:
Collaborate with a team of skilled researchers and engineers in one of the following domains:
- GPU-Accelerated Simulation for Reinforcement Learning: Design and improve high-performance, scalable simulation environments specifically for reinforcement learning applications in autonomous driving.
- ML-Based Planning Models Integration: Create, train, and embed planning models for autonomous driving, utilizing GPU-accelerated simulations to enhance performance in complex driving scenarios.
- Hybrid Learning Approaches: Innovate and enhance learning methodologies that integrate imitation and reinforcement learning, emphasizing multi-agent self-play techniques.
Key Responsibilities:
- Engage in transformative engineering projects that apply deep learning and reinforcement learning to resolve challenges in autonomous driving planning and simulation.
- Collaborate with an international team of experts to implement advanced research results into Bosch's business units, testing and validating concepts in simulated environments and with self-driving vehicles.
- Work alongside domain specialists to explore novel learning-based planning and decision-making strategies.
- Conduct benchmarking and validation of models using extensive datasets and simulations.
- Share research outcomes through comprehensive internal reports and potential external publications.

Feb 23, 2026

Coram AI
Full-time|On-site|Sunnyvale

At Coram AI, we are revolutionizing video security for the contemporary landscape. Our innovative cloud-native platform leverages advanced computer vision and artificial intelligence to empower businesses to enhance safety, facilitate informed decision-making, and accelerate operations. This includes features such as real-time alerts, effortless clip sharing, and comprehensive visibility across multiple locations.
Joining our agile and dynamic team means being part of a collaborative environment that prioritizes clarity, excellence, and impactful contributions. Every team member has a voice, delivers significant work, and plays a crucial role in shaping how AI can foster a safer and more interconnected world.
We are seeking engineers who thrive at the nexus of robotics, real-time systems, and deep learning. This position focuses on implementing high-performance vision and multimodal models on robotic platforms, where factors such as latency, reliability, and hardware limitations are paramount.

Mar 11, 2026

Institute of Foundation Models
Full-time|On-site|Sunnyvale, CA

About the Institute of Foundation Models
We are a pioneering research laboratory focused on the development, understanding, application, and risk management of foundational models. Our mission is to propel research forward, cultivate the next generation of AI innovators, and make substantial contributions to a knowledge-driven economy.
Join us and collaborate with top-tier researchers, data scientists, and engineers on the forefront of foundational model training. Engage in solving critical challenges that can redefine entire sectors through advanced AI solutions. Your strategic and innovative problem-solving skills will play a vital role in positioning MBZUAI as an international leader in high-performance computing for deep learning, facilitating discoveries that will inspire future AI trailblazers.
The Role
We are seeking a skilled distributed ML infrastructure engineer to enhance and expand our training systems. You will collaborate closely with distinguished researchers and engineers to:
• Develop and scale distributed training frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
• Implement distributed optimizers based on mathematical specifications
• Create robust configuration and launching systems across multi-node, multi-GPU clusters
• Manage experiment tracking, metrics logging, and job monitoring for enhanced external visibility
• Enhance the reliability, maintainability, and performance of training systems
While much of your work will support large-scale pre-training, prior pre-training experience is not mandatory; strong infrastructure and systems expertise are our primary focus.
Key Responsibilities
• Distributed Framework Ownership – Extend or adapt training frameworks (e.g., DeepSpeed, FSDP) to accommodate new applications and architectures.
• Optimizer Implementation – Convert mathematical optimizer specifications into distributed implementations.
• Launch Config & Debugging – Develop and troubleshoot multi-node launch scripts with adaptable batch sizes and parallelism strategies.

Jul 18, 2025

Applied Intuition, Inc.
Full-time|$126K/yr - $423K/yr|On-site|Sunnyvale, California, United States

Discover Applied Intuition
Founded in 2017 and currently valued at $15 billion, Applied Intuition, Inc. is at the forefront of advancing physical AI technologies. Our mission is to establish the digital infrastructure that will integrate intelligence into moving machines worldwide. We cater to diverse sectors including automotive, defense, trucking, construction, mining, and agriculture, focusing on three key areas: tools and infrastructure, operating systems, and autonomy. Our solutions are trusted by leading global automakers and the United States military. Headquartered in Sunnyvale, California, we also have offices in major cities worldwide including Washington, D.C., San Diego, Ft. Walton Beach, Ann Arbor, London, Stuttgart, Munich, Stockholm, Bangalore, Seoul, and Tokyo. Explore more at applied.co.
As an in-office company, we expect our employees to work from the Applied Intuition office five days a week. We also value flexibility, allowing for responsible management of schedules, which may include occasional remote work or adjusted hours for family commitments.
Role Overview and Team Dynamics
We are excited to invite multiple passionate Research Engineers to join our Research Group at Applied Intuition. Our mission is to pioneer groundbreaking technology that will drive the next generation of physical AI, focusing on challenging applications such as end-to-end autonomous driving and robotic generalists. The team comprises leading experts recognized for their academic and industry contributions, including several Best Paper awards from premier conferences like CVPR and ICRA. Learn more about our research initiatives at appliedintuition.com/research.
With access to industry-leading tools and infrastructure, our researchers can utilize millions of miles of data from extensive fleets, deploying innovative methods across various autonomous and robotic systems, including self-driving vehicles.

Feb 17, 2026

Meshy
Full-time|On-site|Sunnyvale

Join Meshy as an AI Infrastructure Engineer
Located in the heart of Silicon Valley, Meshy is a pioneering force in the realm of 3D generative AI. Our mission is to Unleash 3D Creativity, revolutionizing the content creation process. We empower both professional artists and enthusiastic hobbyists to effortlessly craft extraordinary 3D assets, converting text and images into breathtaking 3D models in mere minutes. What used to require weeks of effort and thousands of dollars now takes just 2 minutes and costs only $1.
Our elite team comprises leading experts in computer graphics, AI, and artistry, featuring alumni from prestigious institutions such as MIT, Stanford, and Berkeley, alongside seasoned professionals from Nvidia and Microsoft. With a diverse workforce spread across North America, Asia, and Oceania, we cultivate a culture of innovation aimed at solving global 3D challenges. We are backed by top-tier venture capital firms including Sequoia and GGV, having successfully raised $52 Million in funding.
Meshy stands as the market leader, acclaimed as the No.1 in popularity among 3D AI tools (according to 2024 A16Z Games) and leading in web traffic (as per SimilarWeb, with 3 Million monthly visits). Our platform supports over 5 Million users and has facilitated the generation of 40 Million models.
Our Founder and CEO, Yuanming (Ethan) Hu, earned his Ph.D. in graphics and AI from MIT, where he created the highly regarded Taichi GPU programming language (27K stars on GitHub, utilized by over 300 institutes). His influential work includes an honorable mention for the SIGGRAPH 2022 Outstanding Doctoral Dissertation Award and more than 2,700 research citations.
Your Role
This position merges platform engineering, site reliability, and applied ML systems. You will be responsible for ensuring the reliability, scalability, and operability of Meshy's AI model serving stack and core engineering infrastructure. The team manages a conventional production infrastructure (CI/CD, build systems, deployment, runtime environments) while developing a model-serving platform that links the models created by our Research Team to product-facing backend systems.
This role is systems-heavy, focused on production, and dedicated to transforming experimental model artifacts into robust, observable, and cost-efficient services.
Key Responsibilities
- Ensure production reliability: manage availability, latency, error budgets, incident response, postmortems, and follow-ups.
- Develop and maintain observability frameworks: metrics, logs, traces, and alerting systems.

Feb 11, 2026

Institute of Foundation Models
Full-time|On-site|Sunnyvale, CA

About the Institute of Foundation Models
At the Institute of Foundation Models, we are on a mission to innovate and enhance the development of foundation models. Our research lab is committed to advancing AI through understanding, utilization, and effective risk management of these models. We aim to empower the next generation of AI developers and contribute significantly to a knowledge-driven economy.
Joining our team means you will work at the forefront of foundation model training, collaborating with elite researchers, data scientists, and engineers. You will tackle pivotal challenges in AI development and contribute to the creation of revolutionary AI solutions that could transform various industries. Your strategic and innovative problem-solving skills will play a key role in establishing MBZUAI as a global leader in high-performance computing for deep learning, fostering impactful discoveries that will inspire future AI visionaries.
The Role
We are in search of a Foundation Model DevOps Engineer who will focus on Operational Stability to support our AI research infrastructure. You will be responsible for creating an efficient environment that facilitates model development. Your role involves building tooling, release pipelines, and storage policies that alleviate burdens on our research team. You will manage the foundational layer, ensuring that researchers have immediate, secure, and reliable access to essential tools, data, and computational resources.
Key Responsibilities
Model Release Engineering
- High-Fidelity Release Management: You will uphold the standards of our public presence, ensuring that all releases (weights, code, training logs, data) are reproducible, comprehensively documented, and presented with the professionalism of a leading open-source product.
- CI/CD for Research: You will design and implement pipelines that automate the testing and packaging of intricate model releases, transitioning us from manual procedures to automated validation.
- Repo Administration: You will administer the organization’s Git repositories, ensuring optimal performance and accessibility.

Jan 16, 2026

Applied Intuition, Inc.
Full-time|$222K/yr|On-site|Sunnyvale, California, United States

About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI. Established in 2017 and currently valued at approximately $15 billion, this Silicon Valley-based company is dedicated to developing the digital infrastructure essential for instilling intelligence in every moving machine globally. We cater to various sectors, including automotive, defense, trucking, construction, mining, and agriculture, focusing on three main areas: tools and infrastructure, operating systems, and autonomy. Our solutions have earned the trust of 18 of the top 20 global automakers and the United States military and its allies. Headquartered in Sunnyvale, California, we have offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.
We prioritize in-office collaboration and expect employees to work from their Applied Intuition office five days a week. However, we also value flexibility and trust our employees to responsibly manage their schedules, which may include occasional remote work, starting the day with morning meetings from home, or leaving earlier to accommodate family obligations.
About the Role
We are seeking infrastructure engineers with a strong background in machine learning pipelines, as well as ML engineers who are eager to extend their skills beyond modeling. This position will involve engagement across the complete ML lifecycle, including dataset generation, training frameworks, compute, evaluation, and deployment, collaborating closely with modeling teams. If you thrive on tackling broad and ambiguous challenges and enjoy working across the entire ML stack, this team is the perfect fit for you. At Applied Intuition, we encourage all engineers to take ownership of technical and product decisions, actively engage with both internal and external users to gather feedback, and contribute to a thoughtful, dynamic team culture.
At Applied Intuition, You Will:
- Design and implement distributed cloud GPU training strategies for deep learning model training and evaluation.
- Create comprehensive machine learning pipelines and integrate them into our core systems.

Jan 14, 2026

Institute of Foundation Models
Full-time|On-site|Sunnyvale, CA

About the Institute of Foundation Models
We are a pioneering research lab focused on the development, understanding, application, and risk management of foundation models. Our mission is to propel research forward, cultivate the next generation of AI innovators, and make significant contributions to a knowledge-driven economy.
Join our dynamic team and engage in the heart of innovative foundation model training, collaborating with top-tier researchers, data scientists, and engineers. Tackle groundbreaking challenges in AI development and contribute to transformative AI solutions that have the potential to revolutionize industries. Your strategic and innovative problem-solving skills will be vital in establishing MBZUAI as a global center for high-performance computing in deep learning, enabling impactful discoveries that inspire the future of AI innovation.
Role Overview
Develop and Enhance Distributed Pre-Training Frameworks
· Implement DeepSpeed / FSDP / Megatron-LM on multi-node GPU clusters.
· Design robust launch scripts, resilient checkpoints, and job monitoring systems (e.g., NCCL/GLOO/GPU).
Transform Mathematical Concepts into High-Performance Production Code
· Prototype novel optimizers or attention mechanisms using PyTorch/NumPy/JAX or similar frameworks.
· Convert prototypes into efficient CUDA/Triton kernels with custom gradients and performance tests.
Enhance Training Efficiency and Stability
· Lead efforts in mixed-precision training, integrating bf16, fp8, etc., into regular workflows while assessing accuracy versus speed improvements and analyzing numerical stability.
· Utilize kernel fusion, communication tuning, and memory optimization to achieve state-of-the-art throughput.
Accelerate Research Progress
· Develop logging and metrics systems, along with experiment-tracking tools, to facilitate rapid iteration.
· Design ablation studies and statistical tests that validate or challenge new concepts.
· Guide interns and junior engineers through clear asynchronous design documentation and code reviews.
You will collaborate closely with researchers, deliver production code, and shape the landscape of large language models.

Jun 9, 2025
Applied Intuition, Inc.
Full-time|$204K/yr - $343K/yr|On-site|Sunnyvale, California, United States

About Applied Intuition

Applied Intuition, Inc. is at the forefront of advancing physical AI technologies. Established in 2017 and currently valued at $15 billion, our Silicon Valley-based company is dedicated to creating the essential digital infrastructure that empowers intelligence across all moving machines globally. Our solutions serve critical sectors including automotive, defense, trucking, construction, mining, and agriculture, focusing on three main domains: tools and infrastructure, operating systems, and autonomy. Trusted by 18 of the top 20 global automakers, as well as the U.S. military and its allies, Applied Intuition is committed to delivering unparalleled physical intelligence solutions. Our headquarters is located in Sunnyvale, California, complemented by offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. For more information, visit applied.co.

We prioritize in-office collaboration and expect our employees to work primarily from our Applied Intuition office five days a week. However, we value flexibility and trust our team members to manage their schedules responsibly. This may include occasional remote work, starting the day with morning meetings from home before heading to the office, or leaving early when necessary to accommodate personal commitments.

About the Role

As an Engineering Manager on our Machine Learning Platform team, you will lead an exceptional group of engineers dedicated to building the infrastructure that enables Physical AI at scale. Your team will oversee three pivotal areas:
· Training & Inference Orchestration, where we develop frameworks to efficiently schedule and execute extensive tasks across thousands of GPUs;
· GPU Cluster Architecture, where we design and expand what will become the industry's largest GPU cluster for Physical AI; and
· Performance Optimization, where we maximize hardware utilization, throughput, and cost efficiency for large-scale training and inference workloads.

You will collaborate at the intersection of systems engineering and machine learning, working directly with stack development and research teams to eliminate bottlenecks and expedite the transition from experimentation to production.

Feb 19, 2026
Full-time|On-site|Sunnyvale, CA

About the Institute of Foundation Models

We are an innovative research laboratory focused on the creation, comprehension, application, and risk management of foundation models. Our mission is to propel research forward, cultivate the next generation of AI innovators, and contribute significantly to a knowledge-driven economy.

Joining our team presents a unique opportunity to engage in the core of advanced foundation model training, collaborating with leading researchers, data scientists, and engineers as we address the most pivotal and influential challenges in AI advancement. Your work will involve the creation of groundbreaking AI solutions with the potential to revolutionize entire industries. Employing strategic and innovative problem-solving skills will be crucial in establishing MBZUAI as a premier global center for high-performance computing in deep learning, fostering remarkable discoveries that inspire future AI trailblazers.

Mar 17, 2025
Cerebras Systems
Full-time|On-site|Sunnyvale CA or Toronto Canada

At Cerebras Systems, we are at the forefront of AI technology, developing the world's largest AI chip that is 56 times larger than conventional GPUs. Our innovative wafer-scale architecture enables the computational power of dozens of GPUs on a single chip, simplifying programming to the ease of handling one device. This unique design allows us to achieve unparalleled training and inference speeds, empowering machine learning practitioners to seamlessly deploy large-scale ML applications without the complexity of managing numerous GPUs or TPUs.

Our clientele includes leading model labs, global enterprises, and pioneering AI-native startups. Recently, OpenAI announced a multi-year partnership with Cerebras aimed at leveraging 750 megawatts of scale to revolutionize critical workloads through ultra-high-speed inference.

Thanks to our groundbreaking wafer-scale architecture, Cerebras Inference delivers the fastest Generative AI inference solution globally, exceeding GPU-based hyperscale cloud inference services by over tenfold. This significant boost in speed is transforming the user experience of AI applications, facilitating real-time iteration and enhancing intelligence through added agentic computation.

About The Role

As an Applied Machine Learning Research Scientist at Cerebras, you will be instrumental in converting modern machine learning methodologies into scalable, high-performance systems. This position focuses on the intersection of modeling and systems, emphasizing the efficient execution of existing algorithms rather than merely publishing new ones. Your efforts will significantly influence the training, optimization, and deployment of large language models (LLMs) on one of the most sophisticated AI platforms in existence.

You will collaborate closely with fellow researchers and senior engineers to enhance workflows for LLM pretraining, fine-tuning, and reinforcement learning-based post-training. Your responsibilities will encompass building training pipelines, debugging complex system behaviors, improving model quality, and refining data and evaluation strategies. Your contributions will have a direct and meaningful impact on advancing our capabilities in AI.

Mar 5, 2026
Cerebras Systems
Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems revolutionizes the AI landscape with the creation of the world’s largest AI chip, a remarkable 56 times larger than conventional GPUs. Our innovative wafer-scale architecture delivers the computational power of numerous GPUs on a single chip, simplifying programming efforts for users. This unique approach enables Cerebras to achieve unparalleled training and inference speeds, empowering machine learning practitioners to seamlessly execute large-scale ML applications without the complexities of managing hundreds of GPUs or TPUs.

Our clientele includes leading model laboratories, global enterprises, and pioneering AI-native startups. Notably, OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, significantly enhancing key workloads with ultra-high-speed inference.

Thanks to our groundbreaking wafer-scale architecture, Cerebras Inference provides the fastest Generative AI inference solution globally, exceeding the performance of GPU-based hyperscale cloud inference services by over ten times. This significant speed enhancement transforms the user experience of AI applications, facilitating real-time iterations and augmented intelligence through additional agentic computation.

About The Role

We are on the lookout for a highly skilled and experienced AI Infrastructure Operations Engineer to oversee and manage our state-of-the-art machine learning compute clusters. In this role, you will have the unique opportunity to work with the world’s largest computer chip, the Wafer-Scale Engine (WSE), and the systems that leverage its extraordinary power.

You will play a pivotal role in ensuring the health, performance, and availability of our infrastructure, maximizing compute capacity, and supporting our expanding AI initiatives. This position requires an in-depth understanding of Linux-based systems, expertise in containerization technologies, and experience in monitoring and troubleshooting complex distributed systems. The ideal candidate is a proactive problem-solver with a strong background in large-scale compute infrastructure who is reliable and committed to customer success.

Feb 17, 2026
Coram AI
Full-time|On-site|Sunnyvale

Join Coram AI, where we are redefining video security for a modern landscape. Our innovative, cloud-native platform harnesses computer vision and artificial intelligence to empower businesses with enhanced safety, informed decision-making, and rapid operational responses, ranging from real-time alerts to effortless clip sharing and comprehensive visibility across multiple sites.

As a member of our dynamic and agile team, you will embrace clarity, craftsmanship, and impactful contributions. Every team member's voice matters: each person delivers significant results, and together we shape the future of AI in making the world safer and more interconnected.

About the Role:

At Coram AI, our infrastructure transcends the conventional cloud-based stack. Alongside our AWS and Kubernetes framework, we manage an extensive array of IoT devices remotely. We are seeking a skilled engineer to take charge of a substantial segment of our edge and cloud architecture that supports our IoT product line, with responsibility not only for infrastructure but also for developing and maintaining our proprietary in-house software.

Joining our team means tackling intriguing challenges at the crossroads of user experience, machine learning, and infrastructure. It also means a commitment to excellence, continuous learning, and delivering exceptional products to our clients in a high-energy startup environment.

Key Responsibilities:
· Develop and maintain production-grade software for our custom edge infrastructure stack.
· Provision and manage resources within AWS.
· Oversee provisioning and management for hundreds of thousands of deployed connected IoT devices.
· Create CI/CD and automation pipelines for various components of the stack.
· Implement observability and telemetry across our cloud applications and edge devices.
· Assist in maintaining compliance with various security standards (e.g., SOC2, HIPAA).
· Enhance developer productivity by optimizing development workflows.

This is an onsite role located in Sunnyvale.

Qualifications:
· Minimum of 3 years of experience in developing production infrastructure on AWS using infrastructure as code tools like Pulumi or Terraform.
· Proficient in Docker and Kubernetes, especially EKS.
· At least 3 years of experience with programming languages such as Python, Go, or similar.

Feb 18, 2026
