Distributed Machine Learning Engineer
About ifm-us
Join us at the Institute of Foundation Models, where we are committed to advancing the frontiers of AI research through the development of foundation models. Our lab is dedicated to nurturing talent and fostering innovation that shapes a knowledge-driven economy.
Similar jobs
About the Institute of Foundation Models
We are an innovative research laboratory focused on the creation, comprehension, application, and risk management of foundation models. Our mission is to propel research forward, cultivate the next generation of AI innovators, and contribute significantly to a knowledge-driven economy.

Joining our team presents a unique opportunity to engage in the core of advanced foundation model training, collaborating with leading researchers, data scientists, and engineers as we address the most pivotal and influential challenges in AI advancement. Your work will involve the creation of groundbreaking AI solutions with the potential to revolutionize entire industries. Employing strategic and innovative problem-solving skills will be crucial in establishing MBZUAI as a premier global center for high-performance computing in deep learning, fostering remarkable discoveries that inspire future AI trailblazers.
About the Institute of Foundation Models
We are a pioneering research laboratory focused on the development, understanding, application, and risk management of foundational models. Our mission is to propel research forward, cultivate the next generation of AI innovators, and make substantial contributions to a knowledge-driven economy.

Join us and collaborate with top-tier researchers, data scientists, and engineers on the forefront of foundational model training. Engage in solving critical challenges that can redefine entire sectors through advanced AI solutions. Your strategic and innovative problem-solving skills will play a vital role in positioning MBZUAI as an international leader in high-performance computing for deep learning, facilitating discoveries that will inspire future AI trailblazers.

The Role
We are seeking a skilled distributed ML infrastructure engineer to enhance and expand our training systems. You will collaborate closely with distinguished researchers and engineers to:
• Develop and scale distributed training frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
• Implement distributed optimizers based on mathematical specifications
• Create robust configuration and launching systems across multi-node, multi-GPU clusters
• Manage experiment tracking, metrics logging, and job monitoring for enhanced external visibility
• Enhance the reliability, maintainability, and performance of training systems

While much of your work will support large-scale pre-training, prior pre-training experience is not mandatory; strong infrastructure and systems expertise are our primary focus.

Key Responsibilities
• Distributed Framework Ownership – Extend or adapt training frameworks (e.g., DeepSpeed, FSDP) to accommodate new applications and architectures.
• Optimizer Implementation – Convert mathematical optimizer specifications into distributed implementations.
• Launch Config & Debugging – Develop and troubleshoot multi-node launch scripts with adaptable batch sizes and parallelism strategies.
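One concrete piece of arithmetic such launch tooling must get right is the relationship between global batch size, data-parallel world size, micro-batch size, and gradient-accumulation steps. A minimal sketch, assuming plain data parallelism (the helper name is illustrative, not from the posting):

```python
def accumulation_steps(global_batch: int, dp_world_size: int, micro_batch: int) -> int:
    """Gradient-accumulation steps per optimizer update, so that
    dp_world_size * micro_batch * steps == global_batch."""
    if global_batch % dp_world_size:
        raise ValueError("global batch must divide evenly across data-parallel ranks")
    per_rank = global_batch // dp_world_size
    if per_rank % micro_batch:
        raise ValueError("per-rank batch must divide evenly into micro-batches")
    return per_rank // micro_batch

# e.g. a global batch of 4096 sequences on 64 data-parallel ranks,
# 8 sequences per device per forward pass -> 8 accumulation steps
print(accumulation_steps(4096, 64, 8))
```

Launch scripts typically validate exactly this invariant up front, since a mismatch silently changes the effective batch size an experiment was tuned for.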
About the Institute of Foundation Models
We are a pioneering research lab focused on the development, understanding, application, and risk management of foundation models. Our mission is to propel research forward, cultivate the next generation of AI innovators, and make significant contributions to a knowledge-driven economy.

Join our dynamic team and engage in the heart of innovative foundation model training, collaborating with top-tier researchers, data scientists, and engineers. Tackle groundbreaking challenges in AI development and contribute to transformative AI solutions that have the potential to revolutionize industries. Your strategic and innovative problem-solving skills will be vital in establishing MBZUAI as a global center for high-performance computing in deep learning, enabling impactful discoveries that inspire the future of AI innovation.

Role Overview
Develop and Enhance Distributed Pre-Training Frameworks
· Implement DeepSpeed / FSDP / Megatron-LM on multi-node GPU clusters.
· Design robust launch scripts, resilient checkpoints, and job monitoring systems (e.g., NCCL/GLOO/GPU).

Transform Mathematical Concepts into High-Performance Production Code
· Prototype novel optimizers or attention mechanisms using PyTorch/NumPy/JAX or similar frameworks.
· Convert prototypes into efficient CUDA/Triton kernels with custom gradients and performance tests.

Enhance Training Efficiency and Stability
· Lead efforts in mixed-precision training, integrating bf16, fp8, etc., into regular workflows while assessing accuracy versus speed improvements and analyzing numerical stability.
· Utilize kernel fusion, communication tuning, and memory optimization to achieve state-of-the-art throughput.

Accelerate Research Progress
· Develop logging and metrics systems, along with experiment-tracking tools, to facilitate rapid iteration.
· Design ablation studies and statistical tests that validate or challenge new concepts.
· Guide interns and junior engineers through clear asynchronous design documentation and code reviews.

You will collaborate closely with researchers, deliver production code, and shape the landscape of large language models.
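As background on the mixed-precision work this posting mentions: bf16 keeps only the top 16 bits of an IEEE fp32 value, so its rounding behavior can be simulated with the standard library alone. A hedged sketch, assuming round-to-nearest-even (the function is illustrative, not IFM code):

```python
import struct

def to_bf16(x: float) -> float:
    """Simulate bfloat16 storage: keep the top 16 bits of the fp32
    representation, rounding the discarded half to nearest, ties to even."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    bits += 0x7FFF + ((bits >> 16) & 1)   # round to nearest even
    bits &= 0xFFFF0000                    # drop the low 16 mantissa bits
    (y,) = struct.unpack(">f", struct.pack(">I", bits))
    return y

# Values on the 8-bit-exponent / 7-bit-mantissa grid survive exactly;
# anything finer snaps to that grid.
print(to_bf16(2.5))           # exactly representable -> unchanged
print(to_bf16(1.0 + 2**-10))  # below one bf16 ulp at 1.0 -> rounds to 1.0
```

This is exactly the kind of precision loss the "accuracy versus speed" analysis above has to quantify: near 1.0 a bf16 ulp is 2^-7, so per-step perturbations smaller than that vanish unless accumulated in higher precision.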
Institute of Foundation Models
About the Institute of Foundation Models
The Institute of Foundation Models (IFM) specializes in designing and operating large-scale GPU supercomputing systems aimed at training cutting-edge foundation models. Our philosophy places emphasis on the interdependence of performance, fault tolerance, and scalability across various components, including model architecture, communication systems, runtime, and hardware topology.

This position is pivotal to our mission: enhancing communication performance, distributed reliability, and cross-layer optimization for extensive training workloads.

The Mission
We seek a highly skilled engineer to collaboratively design and optimize the communication stack for large-scale distributed training, with a focus on hybrid parallelism and Mixture-of-Experts (MoE) workloads. This is a systems-level engineering role centered on performance enhancement, distributed debugging, and communication-runtime co-design.
· Design and optimize expert-parallel and hybrid-parallel communication patterns
· Drive high-performance hierarchical collectives for MoE workloads
· Co-design runtime orchestration with communication topology awareness
· Mitigate tail latency and enhance determinism across thousands of GPUs
· Architect fault-tolerant distributed execution that withstands real-world cluster failures

Core Technical Scope
· Communication-compute overlap and topology-aware collective optimization
· In-depth debugging of NCCL, RDMA, and custom communication layers
· Implementing hybrid expert parallel strategies in modern large-scale MoE systems
· Developing elastic and resilient distributed job orchestration concepts
· Conducting congestion analysis and routing optimization across InfiniBand/RoCE fabrics
· Executing microbenchmarking and performance modeling for communication-intensive workloads

Expected Technical Depth
· Expertise in hybrid expert parallel communication strategies
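The hierarchical collectives named above build on primitives like ring all-reduce. As background, here is a pure-Python simulation of the textbook algorithm (reduce-scatter then all-gather, 2*(n-1) steps total); ranks are modeled as list slots rather than real processes, and none of this is IFM code:

```python
def ring_allreduce(ranks):
    """Simulate ring all-reduce over per-rank buffers, each split into
    len(ranks) scalar chunks; every rank ends with the elementwise sum."""
    n = len(ranks)
    data = [list(r) for r in ranks]
    # Reduce-scatter: in step s, rank r sends chunk (r - s) % n to its
    # right neighbour, which accumulates it. Sends are snapshotted first
    # so all ranks "transmit" simultaneously, as on a real ring.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, data[r][(r - s) % n]) for r in range(n)]
        for r, c, v in sends:
            data[(r + 1) % n][c] += v
    # Now rank r owns the fully reduced chunk (r + 1) % n.
    # All-gather: circulate each owned chunk around the ring, overwriting.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, data[r][(r + 1 - s) % n]) for r in range(n)]
        for r, c, v in sends:
            data[(r + 1) % n][c] = v
    return data

# Three ranks, three chunks each; every rank should end with [12, 15, 18].
print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Each step moves only one chunk per link, which is why the algorithm's bandwidth cost stays near 2x the buffer size regardless of rank count; hierarchical variants exploit topology to shorten the ring within a node.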
Illumio builds technology to contain ransomware and security breaches, helping organizations defend against cyber threats. The Illumio AI Security Graph underpins a platform that spots and contains threats in hybrid multi-cloud setups, aiming to stop attacks before they spread. Illumio is recognized as a leader in microsegmentation and supports Zero Trust architectures for critical infrastructure.

The engineering team focuses on advancing cybersecurity through leadership, autonomy, and a strong sense of ownership. Engineers here develop and maintain a scalable SaaS platform using cloud-native tools, with deployments in both cloud and on-premises environments. Precision, quality, and collaboration shape the team's work, and engineers are encouraged to take initiative at every level.

This Senior Machine Learning Engineer role is based onsite at Illumio's Sunnyvale headquarters. The position centers on designing and scaling systems that power Illumio's AI-driven security platform. Work involves handling large-scale data, distributed systems, and building advanced AI agents.

Key Responsibilities
• Design and optimize high-throughput, event-driven systems with Apache Kafka to support real-time data flows.
• Develop and maintain large-scale data pipelines using Apache Spark or Flink for high-volume analytics and AI features.
• Create advanced AI agents that handle autonomous planning, memory management, and reliable tool use in distributed environments.
• Lead architectural design for containerized services on Kubernetes, focusing on availability and scalability across cloud platforms such as AWS, Azure, and GCP.
Applied Intuition, Inc.
About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI technology. Established in 2017 and currently valued at $15 billion, this Silicon Valley powerhouse is creating the essential digital framework that will infuse intelligence into every moving machine worldwide. Our solutions serve key sectors including automotive, defense, trucking, construction, mining, and agriculture, focusing on three main areas: tools and infrastructure, operating systems, and autonomy. Our trusted solutions are utilized by 18 of the top 20 global automakers and the United States military along with its allies. Our headquarters is located in Sunnyvale, California, with additional offices in Washington, D.C., San Diego, Ft. Walton Beach, Florida, Ann Arbor, Michigan, London, Stuttgart, Munich, Stockholm, Bangalore, Seoul, and Tokyo. Find out more at applied.co.

We prioritize in-office collaboration, expecting our employees to work from their Applied Intuition office five days a week, while also embracing flexibility. Employees are trusted to manage their schedules, which may include occasional remote work, starting the day with morning meetings from home, or leaving early for family commitments.

About the role
We are searching for a talented software engineer specializing in perception for autonomous vehicles or mobile robotics. Your role will involve enhancing perception modules within our autonomous vehicle framework, including the development of 4D world representations to facilitate seamless autonomy. You will also lead the design and implementation of computer vision and machine learning strategies that empower self-driving vehicles to navigate effectively.

In this dynamic and customer-centric team environment, you will not only contribute your engineering skills but also gain insights into best practices within the evolving autonomy sector. Our fast-paced culture encourages innovation and collaboration.
Bee Genius
At Bee Genius, we are pioneering the future of work today with innovative AI solutions that transform industries.

Job Overview: We are looking for a talented AI/Machine Learning Engineer to become a vital part of our dynamic team. In this role, you will leverage your expertise to develop and deploy cutting-edge machine learning models and algorithms aimed at addressing complex business challenges.

Key Responsibilities:
• Design, build, and refine machine learning models and algorithms.
• Train and assess models using extensive datasets.
• Optimize models for enhanced performance and accuracy.
• Collaborate with data scientists and software engineers to integrate models into operational systems.
• Stay abreast of the latest trends in AI and machine learning technologies.
• Promote the ethical deployment of AI solutions.
Applied Intuition, Inc.
About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI technology. Established in 2017 and currently valued at $15 billion, this Silicon Valley-based company is building the essential digital infrastructure to infuse intelligence into every moving machine worldwide. We cater to industries such as automotive, defense, trucking, construction, mining, and agriculture through three primary sectors: tools and infrastructure, operating systems, and autonomy. Our solutions are trusted by 18 of the top 20 global automakers, along with the United States military and its allies, to deliver exceptional physical intelligence. Our headquarters is located in Sunnyvale, California, with additional offices across Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.

We are an in-office company, expecting our employees to primarily work from their Applied Intuition office five days a week. We understand the importance of flexibility and trust our employees to manage their schedules responsibly. This may include occasional remote work, starting the day with morning meetings from home before heading to the office, or leaving earlier when needed to accommodate family commitments.

About the Role
We are in search of a skilled software engineer with extensive experience in optimizing machine learning models and deploying them in production-grade embedded runtime environments. Your expertise will span the entire ML framework stack, including PyTorch, JAX, ONNX, TensorRT, CUDA, XLA, and Triton.

At Applied Intuition, You Will:
• Lead ML performance optimization across various technologies for both on-road and off-road ADAS/AD stacks aimed at deployment on a range of embedded computing platforms.
• Devise compute usage strategies to enhance efficiency and minimize latency of model inference for compute boards chosen by our customers.
• Engage in model pruning and quantization, ensuring successful deployment on memory-constrained platforms.
• Collaborate closely with ML engineers and software developers to identify and optimize efficient model architecture solutions.
• Establish methodologies to...
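As background for the pruning and quantization work this posting mentions, here is a hedged, stdlib-only sketch of symmetric per-tensor int8 quantization; the helper names are illustrative, not Applied Intuition code:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map [-max|w|, +max|w|]
    onto [-127, 127] with a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction; error per element is at most scale / 2."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02]
q, s = quantize_int8(w)
print(q)                  # integer codes, 4x smaller than fp32
print(dequantize(q, s))   # reconstruction on the int8 grid
```

Real deployments usually go further (per-channel scales, calibration over activation statistics, or quantization-aware training), but the memory math is the same: one byte per weight plus a scale, instead of four bytes.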
Applied Intuition, Inc.
About Applied Intuition
Applied Intuition, Inc. is at the forefront of physical AI innovation. Established in 2017 and currently valued at $15 billion, this Silicon Valley firm is developing the essential digital infrastructure required to integrate intelligence into every moving machine globally. Serving vital sectors such as automotive, defense, trucking, construction, mining, and agriculture, Applied Intuition focuses on three primary areas: tools and infrastructure, operating systems, and autonomy. Our solutions are trusted by 18 of the world's top 20 automakers, along with the United States military and its allies, to deliver transformative physical intelligence. With headquarters in Sunnyvale, California, we have additional offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.

We embrace a culture of in-office collaboration, with a primary expectation for our team members to work from the Applied Intuition office five days a week. However, we value flexibility and trust our employees to manage their schedules responsibly. This includes the possibility of occasional remote work, starting the day with morning meetings from home, or leaving early to accommodate family obligations.

About the Role
The Data & Machine Learning Pipeline Engineer will play a crucial role in Applied Intuition's data flywheel initiative, developing systems that link vehicle data collection, training, and automated model enhancement. You will establish the infrastructure that enables our autonomous driving stack to learn continuously from both real-world and simulated data, thereby accelerating development across teams focused on perception, planning, and control.

This position sits at the intersection of large-scale data engineering and machine learning infrastructure. You will collaborate closely with ML engineers and system developers to automate processes related to data selection, curation, and model iteration, ensuring our vehicles can improve autonomously with minimal human input.
Applied Intuition, Inc.
About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI technologies. Established in 2017 and currently valued at $15 billion, our Silicon Valley-based company is dedicated to creating the essential digital infrastructure that empowers intelligence across all moving machines globally. Our solutions serve critical sectors including automotive, defense, trucking, construction, mining, and agriculture, focusing on three main domains: tools and infrastructure, operating systems, and autonomy. Trusted by 18 of the top 20 global automakers, as well as the U.S. military and its allies, Applied Intuition is committed to delivering unparalleled physical intelligence solutions. Our headquarters is located in Sunnyvale, California, complemented by offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. For more information, visit applied.co.

We prioritize in-office collaboration and expect our employees to work primarily from our Applied Intuition office five days a week. However, we value flexibility and trust our team members to manage their schedules responsibly. This may include occasional remote work, starting the day with morning meetings from home before heading to the office, or leaving early when necessary to accommodate personal commitments.

About the Role
As an Engineering Manager on our Machine Learning Platform team, you will lead an exceptional group of engineers dedicated to building the infrastructure that enables Physical AI at scale. Your team will oversee three pivotal areas:
• Training & Inference Orchestration, where we develop frameworks to efficiently schedule and execute extensive tasks across thousands of GPUs;
• GPU Cluster Architecture, where we design and expand what will become the industry's largest GPU cluster for Physical AI; and
• Performance Optimization, where we maximize hardware utilization, throughput, and cost efficiency for large-scale training and inference workloads.

You will collaborate at the intersection of systems engineering and machine learning, working directly with stack development and research teams to eliminate bottlenecks and expedite the transition from experimentation to production.
Join our innovative team at Wayve as a Machine Learning Engineer specializing in Application Software. In this pivotal role, you will leverage your expertise in machine learning algorithms and software development to create cutting-edge applications that drive our technology forward. Collaborate with a diverse group of talented professionals to enhance our products and deliver exceptional solutions that meet our clients' needs.
Cerebras Systems
Cerebras Systems is at the forefront of AI innovation, creating the world's largest AI chip, which is 56 times larger than traditional GPUs. Our groundbreaking wafer-scale architecture delivers the computational power equivalent to dozens of GPUs on a single chip, combined with the programming simplicity of a unified device. This innovative approach allows us to offer unparalleled training and inference speeds, enabling machine learning practitioners to execute extensive ML applications seamlessly, without the complexities of managing multiple GPUs or TPUs.

Cerebras boasts an impressive clientele, including premier model labs, global corporations, and pioneering AI startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aimed at deploying 750 megawatts of scale, revolutionizing critical workloads with ultra-fast inference capabilities.

Our unique wafer-scale architecture enables Cerebras Inference to provide the fastest Generative AI inference solution globally, surpassing GPU-based hyperscale cloud inference services by more than tenfold. This remarkable enhancement in speed is reshaping the AI application user experience, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.

About The Role
The Inference ML Engineering team at Cerebras Systems is committed to empowering our rapid generative inference solution through intuitive APIs, supported by a distributed runtime that operates on extensive clusters of our proprietary hardware. Our goal is to enable enterprises, developers, and researchers to fully harness the capabilities of our platform, leveraging its exceptional performance, scalability, and flexibility. The team collaborates closely with cross-functional groups, including compiler developers, cluster orchestrators, ML scientists, cloud architects, and product teams, to deliver impactful solutions that redefine the limits of ML performance and usability.

As a Senior Software Engineer on the Inference ML Engineering team, you will be instrumental in designing and implementing APIs, ML features, and tools that facilitate the execution of state-of-the-art generative AI models on our custom hardware. Your role will involve architecting solutions that allow for seamless model translation and execution, ensuring high throughput and minimal latency while maintaining user-friendliness. You will lead technical initiatives and collaborate with other engineering teams to enhance our solutions.
Join our innovative team at Intuitive as a Machine Learning Engineer, where you'll have the chance to work on cutting-edge AI technologies that are shaping the future. In this role, you will design, develop, and implement machine learning models that drive impactful solutions across various sectors.

As a critical member of our team, you will collaborate with data scientists and engineers to enhance our product offerings, ensuring they are not only effective but also scalable. This is an exceptional opportunity for those eager to leverage their skills in a thriving environment.
Applied Intuition, Inc.
About Applied Intuition
Applied Intuition, Inc. is at the forefront of revolutionizing physical AI. Founded in 2017 with a valuation of $15 billion, this Silicon Valley innovator is developing the essential digital infrastructure required to infuse intelligence into every moving machine globally. Our services cater to the automotive, defense, trucking, construction, mining, and agriculture sectors across three core domains: tools and infrastructure, operating systems, and autonomy. Trusted by 18 of the top 20 global automakers, as well as the U.S. military and its allies, our solutions are designed to deliver superior physical intelligence. Headquartered in Sunnyvale, California, we also have offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.

At Applied Intuition, we are committed to fostering an in-office culture, requiring our employees to primarily work from the office five days a week. However, we value flexibility and trust our employees to responsibly manage their schedules. This may include occasional remote work, starting the day with morning meetings from home, or leaving early to accommodate family commitments.

Meet our software engineers!
Get to know some of our software engineers who are pioneering the future of autonomy and delivering top-notch solutions that help customers reduce time to market. Learn what motivated them to join Applied Intuition, what keeps them engaged, and their insights for prospective candidates.

About the role
We are seeking a talented software engineer to join our team focused on integrating advanced machine learning methodologies into high-quality sensor simulation. In this role, you will collaborate with our research team to implement cutting-edge techniques for modeling environments and sensors, including Lidars, Radars, and Cameras.

At Applied Intuition, you will:...
Cerebras Systems
Cerebras Systems is at the forefront of AI technology, creating the world's largest AI chip, which is 56 times the size of traditional GPUs. Our innovative wafer-scale architecture combines the compute power of dozens of GPUs into a single chip, simplifying the programming experience. This unique design enables us to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run extensive ML applications seamlessly without the complexities of managing numerous GPUs or TPUs.

Our clientele includes premier model laboratories, multinational corporations, and pioneering AI-driven startups. Notably, OpenAI has recently formed a multi-year collaboration with Cerebras, aiming to harness 750 megawatts of computational scale to revolutionize key workloads through ultra-high-speed inference.

Thanks to our cutting-edge wafer-scale architecture, Cerebras Inference delivers the fastest Generative AI inference solution globally, achieving speeds over ten times faster than GPU-based hyperscale cloud inference services, transforming the user experience of AI applications and enabling real-time iteration and enhanced intelligence through additional agentic computation.

Responsibilities:
• Lead the design and implementation of advanced system-level debugging, validation, and observability platforms.
• Develop automated systems for collecting and analyzing numerical data and execution anomalies.
• Create visualization and analysis tools to facilitate efficient root-cause investigations.
• Build frameworks for failure classification, regression detection, and anomaly monitoring.
• Enhance compilers, runtimes, and programming interfaces to support sophisticated profiling and instrumentation.
• Improve workflows related to system bring-up, low-level debugging, and validation.
• Collaborate cross-functionally with teams in compiler, hardware, firmware, runtime, and infrastructure domains.
• Establish best practices to ensure debuggability, reliability, and operational excellence.
• Lead impactful initiatives and support incident response while driving long-term corrective solutions.
Intuitive Surgical, Inc.
Join our dynamic team as a Senior Machine Learning Engineer at Intuitive, where you will play a pivotal role in advancing our robotic surgery technologies. We are looking for a talented individual with a strong background in machine learning, artificial intelligence, and data analysis.
Who You Are
We are seeking talented Machine Learning Systems Engineers to contribute to the development of the world's largest end-to-end 3D-native machine learning systems. You will collaborate on our comprehensive ML framework tailored for 3D applications, encompassing pretraining, fine-tuning, inference, and more. We value strong hands-on engineering skills, a passion for learning, and an ability to excel in a dynamic, high-responsibility environment.

Who We Are
At Meshy, we envision a world where 3D creation is limitless and accessible to all. Our mission is clear: unleash creativity. We have developed a comprehensive pipeline for 3D content that spans text/image to 3D, texturing, texture editing, animation rigging, and beyond. Additionally, we foster a vibrant community for creators to share their work, draw inspiration from others, and utilize our platform as an asset marketplace for their games and prototypes. Recognized as the No. 1 in popularity among 3D AI tools (according to the 2024 A16Z Games survey), Meshy delivers real value to enterprises such as Meta, Square Enix, and DeepMind, as well as millions of end-users. Our technology powers game and film production, 3D printing, industrial product design, user-generated content features, and even training simulations for robotics and physical AI.

Your Next Challenge
3D is the exciting new frontier of Generative AI, and your role at Meshy will present unique challenges in both training and inference. You will engage with the full stack of AI: debugging and monitoring hardware platforms, building training frameworks, scaling high-throughput 3D data pipelines, collaborating with researchers on novel model architectures, and developing efficient inference engines for diffusion models and more.

Here are some specific challenges on the training side:
• Collaborate closely with researchers to co-design the next frontier of 3D & Spatial AI.
• Develop and refine modern PyTorch solutions for maximum parallelism and efficiency, establishing a clean and intuitive training infrastructure for our foundational models.
• Identify bottlenecks and optimize for high-throughput, efficient distributed model training across hundreds to thousands of GPUs.
• Implement and maintain 3D-specific custom operators in Triton or CUDA.
• Design and uphold novel data-loading frameworks and libraries.
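For flavor on the data-loading side, here is a hedged, stdlib-only sketch of two primitives such frameworks build on, fixed-size batching and background prefetch; the names and structure are illustrative, not Meshy code:

```python
import queue
import threading

def batched(samples, batch_size):
    """Group an iterable of samples into fixed-size batches (last may be short)."""
    batch = []
    for s in samples:
        batch.append(s)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def prefetch(iterator, depth=2):
    """Drive `iterator` from a background thread, keeping up to `depth`
    items ready so the consumer (e.g. the training step) does not stall
    on data preparation."""
    q = queue.Queue(maxsize=depth)
    _END = object()

    def worker():
        for item in iterator:
            q.put(item)
        q.put(_END)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _END:
            return
        yield item

print(list(prefetch(batched(range(10), batch_size=4))))
```

Production loaders for multi-gigabyte 3D assets add sharding across ranks, worker processes instead of threads, and pinned-memory transfers, but the producer/consumer decoupling shown here is the core idea.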
Applied Intuition, Inc.
About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI. Established in 2017 and currently valued at $15 billion, this Silicon Valley powerhouse is building the essential digital infrastructure to infuse intelligence into every moving machine globally. Our solutions cater to critical sectors, including automotive, defense, trucking, construction, mining, and agriculture, focusing on tools and infrastructure, operating systems, and autonomy. Trusted by 18 of the top 20 global automakers and the U.S. military and its allies, we are shaping the future of intelligent systems. Find out more at applied.co.

We prioritize in-office collaboration, expecting our employees to work from our Applied Intuition office five days a week. Nevertheless, we value flexibility and trust our team to manage their schedules responsibly, which may include occasional remote work or adjusting hours to meet personal commitments.

About the Role
We are seeking a talented Software Engineer specializing in ML-first behavior prediction and planning. In this role, you will create advanced ML behavior modules capable of forecasting the future movements of road users and their interactions. You will collaborate closely with the perception team on training data generation and with the planning team on developing ML-based planners.

Your contributions will not only enhance our engineering efforts but also immerse you in our vibrant, customer-centric team culture, where you will engage with industry best practices in the rapidly evolving autonomy sector. We pursue excellence in our products and operations, and if you are ready to make a significant impact in realizing autonomous systems, Applied Intuition is your ideal environment!
Intuitive Surgical, Inc.
As a Staff Machine Learning Engineer at Intuitive Surgical, you will play a critical role in developing advanced machine learning algorithms that drive innovative healthcare solutions. Your expertise will contribute to enhancing robotic surgical systems, improving patient outcomes, and redefining the future of surgery.

Join our dynamic team and leverage your skills to address complex challenges in the medical field, collaborating with cross-functional teams to implement transformative technologies.
Cerebras Systems is at the forefront of AI innovation, having developed the world's largest AI chip, which is 56 times greater in size than conventional GPUs. Our revolutionary wafer-scale architecture delivers the computational power of multiple GPUs on a single chip, simplifying programming to a single-device experience. This unique approach enables Cerebras to provide unparalleled training and inference speeds, allowing machine learning professionals to seamlessly operate large-scale ML applications without the complexities of managing numerous GPUs or TPUs.

Our clientele includes leading model labs, global corporations, and pioneering AI-native startups. Recently, OpenAI formed a multi-year collaboration with Cerebras to harness 750 megawatts of capacity, revolutionizing critical workloads with ultra-fast inference capabilities.

Thanks to our innovative wafer-scale architecture, Cerebras Inference stands as the fastest Generative AI inference solution globally, boasting speeds over ten times faster than traditional GPU-based hyperscale cloud inference services. This significant enhancement in speed transforms user experiences with AI applications, facilitating real-time iterations and augmenting intelligence through additional agentic computation.

About The Role
As a Senior Software Engineer within the ML Integration and Quality team, you will be instrumental in integrating and delivering all software and hardware components of the Cerebras AI platform. Your focus will be on software feature integration and quality assurance, including pre-deployment and production validation of Cerebras' training and inference solutions. You will advocate for superior testing practices, effective debugging methodologies, and exemplary cross-team communication to ensure the delivery of world-class products.

