Senior Distributed Systems Engineer
Experience Level
Senior
Qualifications
About Institute of Foundation Models
The Institute of Foundation Models (IFM) is at the forefront of GPU supercomputing, focusing on the development of foundation models that revolutionize machine learning capabilities. We value innovation, collaboration, and the relentless pursuit of efficiency in computational processes.
Similar jobs
Institute of Foundation Models
About the Institute of Foundation Models
The Institute of Foundation Models (IFM) specializes in designing and operating large-scale GPU supercomputing systems aimed at training cutting-edge foundation models. Our philosophy places emphasis on the interdependence of performance, fault tolerance, and scalability across various components, including model architecture, communication systems, runtime, and hardware topology.
This position is pivotal to our mission: enhancing communication performance, distributed reliability, and cross-layer optimization for extensive training workloads.
The Mission
We seek a highly skilled engineer to collaboratively design and optimize the communication stack for large-scale distributed training, with a focus on hybrid parallelism and Mixture-of-Experts (MoE) workloads. This is a systems-level engineering role centered on performance enhancement, distributed debugging, and communication-runtime co-design.
· Design and optimize expert-parallel and hybrid-parallel communication patterns
· Drive high-performance hierarchical collectives for MoE workloads
· Co-design runtime orchestration with communication topology awareness
· Mitigate tail latency and enhance determinism across thousands of GPUs
· Architect fault-tolerant distributed execution that withstands real-world cluster failures
Core Technical Scope
· Communication-compute overlap and topology-aware collective optimization
· In-depth debugging of NCCL, RDMA, and custom communication layers
· Implementing hybrid expert parallel strategies in modern large-scale MoE systems
· Developing elastic and resilient distributed job orchestration concepts
· Conducting congestion analysis and routing optimization across InfiniBand/RoCE fabrics
· Executing microbenchmarking and performance modeling for communication-intensive workloads
Expected Technical Depth
· Expertise in hybrid expert parallel communication strategies
Are you an innovative researcher who thrives on transforming concepts into real-world solutions? Are you passionate about tackling the most pressing challenges in distributed systems, including AI at the edge, cybersecurity, and software-defined networking for autonomous machines? If you envision yourself at the crossroads of product development and cutting-edge technology, this position is perfect for you! Joining our elite research team, recognized among the top 1% nationally for research commercialization success, you will take on the role of Distributed Systems Research Engineer. Your work will not languish in the lab; instead, you will be pivotal in enhancing our standing as the leader in software infrastructure, contributing to advancements in autonomous vehicles, medical robotics, and smart power systems. Engage in groundbreaking research within distributed systems, leveraging a robust, data-centric approach while addressing some of the most complex challenges from communication to decision-making in intelligent machines.
At Cerebras Systems, we are pioneering the future of artificial intelligence with the development of the world's largest AI chip, which is an astonishing 56 times larger than traditional GPUs. Our innovative wafer-scale architecture combines the computational power of numerous GPUs into a single chip, simplifying programming and enhancing efficiency. This unique approach enables us to achieve unparalleled training and inference speeds, empowering machine learning practitioners to run extensive ML applications seamlessly, without the complexities of juggling multiple GPUs or TPUs.
Our clientele includes leading model labs, global corporations, and groundbreaking AI-focused startups. Notably, OpenAI has recently partnered with Cerebras to harness 750 megawatts of scale, revolutionizing critical workloads with ultra-fast inference capabilities.
Thanks to our cutting-edge wafer-scale technology, Cerebras Inference delivers the fastest Generative AI inference solutions available, exceeding GPU-based hyperscale cloud services by over ten times. This significant leap in speed is revolutionizing user interactions with AI applications, facilitating real-time adjustments and enhancing intelligence through advanced computational capabilities.
About The Role
As the security lead for Cerebras's AI cluster product, you will be at the forefront of ensuring the security of our large-scale AI clusters, which consist of hundreds of wafer-scale accelerator systems, thousands of high-performance servers, and numerous networking ports, including switches. This will also involve managing network-attached storage within a vast data center.
Your primary responsibility will be to implement security measures based on established best practices and first principles, ensuring the protection of Cerebras's extensive AI clusters. These clusters comprise intricate hardware components, networking systems, and a fully integrated cluster management software stack that ranges from bare-metal deployments to sophisticated management systems that enable multi-tenant training and inference services across these expansive clusters.
You will focus on guaranteeing end-to-end security and privacy for various cluster applications, developing security engineering solutions incorporating robust network access controls, user access management, and an exceptional multi-tenancy framework.
At GFiber, we are passionate about the transformative power of high-quality internet. We believe that it drives innovation, strengthens communities, and enables extraordinary achievements. As we continue our mission to provide better internet, we are expanding our team to include dedicated individuals who aspire to make a difference in the world.
GFiber, a proud member of the Alphabet family, delivers cutting-edge internet services through Google Fiber and Google Fiber Webpass to homes and businesses across the U.S. With our ongoing expansion, we are connecting more cities and individuals to exceptional internet experiences.
The application window will be open until at least April 17, 2026. This opportunity will remain available based on business needs, which may be before or after the specified date.
This role is not eligible for immigration sponsorship.
Our Information Systems and Technology team is vital in enhancing productivity and efficiency across all departments at GFiber. We are committed to delivering outstanding support, fostering strong partnerships, and ensuring seamless project execution. Together, we innovate to empower our colleagues and advance our organization.
Role Overview
As a Senior IT Systems Engineer, you will design and maintain the digital workspace that our employees depend on daily. This role encompasses the complete lifecycle of corporate applications, from end-to-end deployment and configuration to automated patching and long-term administration. You will oversee our Google Workspace environment and support essential office technologies, including multi-vendor networking, conferencing, and advanced printing solutions. With your profound expertise in software application engineering, you will also maintain strong full device management practices to ensure secure and efficient delivery across macOS, Windows, and ChromeOS platforms.
Your Responsibilities:
· Manage the enrollment, security, and compliance of macOS, Windows, and ChromeOS devices utilizing Microsoft Intune, Google Workspace, and Kandji/Iru to ensure a stable platform for application delivery.
· Architect comprehensive deployment, configuration, and proactive patching strategies for the corporate productivity suite, leading the rollout of new enterprise solutions from initial setup to ongoing maintenance and optimization.
· Deploy and provide advanced support for office technologies and equipment.
CoreWeave is The Essential Cloud for AI™. Designed by innovators for innovators, CoreWeave provides a robust platform of advanced technology and exceptional teams, empowering organizations to confidently build and scale AI solutions. As a trusted partner for leading AI labs, startups, and global enterprises, CoreWeave integrates unmatched infrastructure performance with extensive technical expertise, driving innovation and transforming compute capabilities into actionable results. Since its inception in 2017, CoreWeave has rapidly advanced, becoming a publicly traded company (Nasdaq: CRWV) in March 2025. Explore more at www.coreweave.com.
What You’ll Do:
Description of the Team
The IT Business Systems (GTM) team at CoreWeave is responsible for the application stack that facilitates the entire customer journey, from lead generation to quoting and cash collection. This team designs and manages systems that underpin a rapidly growing AI cloud business, collaborating closely with Revenue Operations, Marketing Operations, Finance, Billing, and Customer Experience to deliver scalable, compliant, and automated workflows.
About the role:
As a Senior Salesforce Engineer, you will take on a pivotal role as a hands-on technical leader, tasked with designing, building, and operating Salesforce solutions across Sales Cloud, Revenue Cloud/CPQ, and GTM integrations. You will tackle high-impact projects including post-acquisition system integrations, the implementation of a new usage-based billing platform, and enhancements within quoting, forecasting, and revenue operations. This role demands deep expertise in CPQ workflows while also overseeing broader platform architecture and integrations with finance and GTM systems. You will work cross-functionally with GTM, Finance, and IT leadership to create scalable, dependable solutions that position CoreWeave for continued success.
CoreWeave seeks a Senior Business Systems Engineer to focus on Data Center Systems II. This role can be based in Livingston, NJ; Bellevue, WA; or Sunnyvale, CA.
Key responsibilities
· Partner with teams throughout the company to design and implement business systems that enhance data center operations.
· Use extensive experience to improve operational efficiency and address ongoing infrastructure needs.
· Offer ideas and solutions that help keep systems reliable and flexible as requirements change.
Locations
Livingston, NJ / Bellevue, WA / Sunnyvale, CA
About the Institute of Foundation Models
We are an innovative research laboratory focused on the creation, comprehension, application, and risk management of foundation models. Our mission is to propel research forward, cultivate the next generation of AI innovators, and contribute significantly to a knowledge-driven economy.
Joining our team presents a unique opportunity to engage in the core of advanced foundation model training, collaborating with leading researchers, data scientists, and engineers as we address the most pivotal and influential challenges in AI advancement. Your work will involve the creation of groundbreaking AI solutions with the potential to revolutionize entire industries. Employing strategic and innovative problem-solving skills will be crucial in establishing MBZUAI as a premier global center for high-performance computing in deep learning, fostering remarkable discoveries that inspire future AI trailblazers.
Sonsoft Inc.
Join our innovative team at Sonsoft Inc. as a Senior Systems Engineer specializing in Mobility (iOS). In this pivotal role, you will be responsible for designing, implementing, and optimizing mobile solutions that enhance user experience and performance. Collaborate with cross-functional teams to drive the development of cutting-edge iOS applications that meet the evolving needs of our clients.
Applied Intuition, Inc.
About Applied Intuition
Founded in 2017, Applied Intuition, Inc. is at the forefront of revolutionizing physical AI. With a valuation of $15 billion, this Silicon Valley powerhouse is dedicated to constructing the digital backbone required to infuse intelligence into every moving machine globally. We serve a diverse range of industries including automotive, defense, trucking, construction, mining, and agriculture through three pivotal domains: tools and infrastructure, operating systems, and autonomy. Our solutions are trusted by eighteen of the top twenty global automakers and the United States military alongside its allies. Headquartered in Sunnyvale, California, we also have offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.
We uphold a culture of in-office collaboration, expecting our employees to primarily work from our Applied Intuition offices five days a week. That said, we embrace flexibility, empowering our staff to responsibly manage their schedules, which may include occasional remote work or adjusting hours for family commitments.
About the Role
Join our NextGen OS team, where we are committed to developing Applied Intuition's operating system (OS) stack for future vehicles and innovative products. This is a singular opportunity to contribute to the creation and advancement of a next-generation full-stack operating system.
As a Software Engineer on this team, you will be responsible for designing, developing, and implementing fundamental OS components such as the kernel, system services, runtime, application frameworks, BSPs, and hardware abstraction layers. As one of the first hires, your contributions will play a crucial role in shaping both architectural and implementation decisions, directly influencing the future of our OS.
Your Responsibilities at Applied Intuition:
· Define the overall architecture and roadmap of the operating system.
· Collaborate closely with existing open-source projects, become a committer, and submit RFCs.
· Write and review critical, performance-oriented code across core OS components.
· Lead...
Intuitive Surgical, Inc.
As a Senior Quality Systems Engineer specializing in Field Actions at Intuitive Surgical, you will play a pivotal role in ensuring the highest standards of quality and compliance in our field operations. You will leverage your expertise to lead quality initiatives, support field actions, and collaborate with cross-functional teams to enhance product performance and patient safety.
At DigitalFish, our goal is to empower organizations by delivering cutting-edge technologies that revolutionize digital media creation and consumption for millions of users.
We collaborate with leading digital media companies to craft next-generation platforms and user experiences. Our esteemed clientele includes industry giants such as Apple, Google, Meta, Disney, DreamWorks, Activision, Technicolor, ESPN, LEGO, and NASA, among others.
THE ROLE
Join our agile and innovative team as a Senior Systems Engineer, where you will be instrumental in pushing the boundaries of imaging technology for camera capture. Your responsibilities will include developing a comprehensive high-fidelity camera simulator that accurately reflects photo-realistic imagery. This role will require close collaboration with camera architects and cross-functional teams including systems, optics, algorithm development, and image quality, to successfully execute, validate, and enhance camera simulations for architectural validation and component selection.
Applied Intuition, Inc.
Join the Future of AI at Applied Intuition
At Applied Intuition, we're driving the evolution of physical AI. Established in 2017 and currently valued at an impressive $15 billion, our Silicon Valley-based company is crafting the digital backbone essential for infusing intelligence into every moving machine worldwide. We cater to diverse sectors, including automotive, defense, trucking, construction, mining, and agriculture, focusing on tools and infrastructure, operating systems, and autonomy. Our solutions are trusted by 18 of the top 20 global automakers as well as the U.S. military and its allies to provide unparalleled physical intelligence. Our headquarters is in Sunnyvale, California, with additional offices across the globe including Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more about us at applied.co.
We are an in-office company, expecting our employees to primarily work from their Applied Intuition office 5 days a week. However, we value flexibility and trust our employees to manage their schedules responsibly, which may include occasional remote work or adjustments to accommodate family commitments.
Meet Our Systems Engineers!
Our systems engineers define and shape the complexities of developing safe and reliable autonomous systems, engaging in every stage of the development process from concept to deployment. Hear from our engineers to discover their journey at Applied Intuition and what keeps them motivated.
Role Overview
We are seeking a skilled Systems Engineer with a robust background in robotics and software. This role entails creating and managing system and software requirements for a groundbreaking product at Applied Intuition. You will significantly influence the technical direction of this project, ensuring its success.
CoreWeave
About CoreWeave
CoreWeave is The Essential Cloud for AI™. The company provides a platform of technology, tools, and teams that support innovators building and scaling AI solutions. CoreWeave's infrastructure is trusted by leading AI labs, startups, and enterprises for its performance and reliability. CoreWeave is publicly traded on Nasdaq (CRWV) as of March 2025. Learn more at www.coreweave.com.
Role Overview: Senior Storage Engineer
As part of the Storage Engine Team, the Senior Storage Engineer designs and develops managed storage products that meet the needs of demanding AI workloads. This role involves close collaboration with engineering teams across infrastructure, compute, and platform to ensure storage services are reliable, scalable, and high-performing.
What You Will Do
· Design and build distributed storage solutions that support scaling for data-intensive AI workloads.
· Develop exabyte-scale, S3-compatible object storage and integrate dedicated storage clusters for a variety of customer environments.
· Apply technologies such as RDMA, GPU Direct Storage, and distributed filesystem protocols (NFS, FUSE) to improve storage performance and efficiency.
· Lead projects to strengthen the reliability, durability, security, and observability of the storage stack.
· Work with operations teams to monitor, troubleshoot, and refine storage systems in production settings.
· Create metrics and dashboards that track storage performance and health.
· Analyze telemetry and system data to identify ways to improve throughput, latency, and resilience.
· Collaborate with platform, product, and infrastructure teams to deliver seamless storage capabilities across the stack.
· Mentor other engineers and share knowledge on building distributed, high-performance storage systems.
Locations
Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA
Applied Intuition, Inc.
About Applied Intuition
Applied Intuition, Inc. is at the forefront of shaping the future of physical AI. Established in 2017 and currently valued at $15 billion, this Silicon Valley powerhouse is developing the vital digital infrastructure necessary to infuse intelligence into every moving machine globally. Our solutions cater to sectors including automotive, defense, trucking, construction, mining, and agriculture, focusing on three foundational pillars: tools and infrastructure, operating systems, and autonomy. Trusted by 18 of the top 20 global automakers and the U.S. military alongside its allies, our innovative offerings drive physical intelligence. Headquartered in Sunnyvale, California, we have a global presence with offices in major cities such as Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.
Our company thrives on in-office collaboration, and we expect employees to primarily work from their Applied Intuition office five days a week. However, we appreciate the value of flexibility and trust our employees to manage their schedules responsibly, which may include occasional remote work or adjusted hours to accommodate personal commitments.
About the Role
As a Senior Business Systems Engineer at Applied Intuition, you will serve as the lead architect of our corporate data ecosystem. We are in search of a highly skilled, senior-level engineer to bridge the divide between our core business functions and the software that drives them. Your primary responsibility will be to design and implement a seamless data ecosystem that interconnects back-end finance and HR systems, providing a unified, real-time view of our global operations, thereby enabling data-driven strategic decisions at the highest levels. You will ensure that our systems (including Oracle NetSuite, Workday, and Pigment) are integrated into a robust, scalable architecture that supports rapid global growth.
At Applied Intuition, You Will:
· Architect & Integrate: Design, build, and maintain mission-critical integrations between core enterprise applications to enhance operational efficiency.
Cerebras Systems
Cerebras Systems is at the forefront of AI innovation, creating the world's largest AI chip, which is 56 times larger than traditional GPUs. Our pioneering wafer-scale architecture delivers exceptional AI computational power equivalent to dozens of GPUs on a single chip, offering users unparalleled simplicity and efficiency. This unique approach enables us to provide industry-leading training and inference speeds, allowing machine learning practitioners to run extensive ML applications seamlessly without the complexities of managing multiple GPUs or TPUs.
Our clientele includes renowned model labs, leading global enterprises, and innovative AI-first startups. Recently, OpenAI announced a multi-year collaboration with Cerebras, leveraging 750 megawatts of scale to revolutionize critical workloads with ultra-high-speed inference.
Thanks to our cutting-edge wafer-scale technology, Cerebras Inference offers the fastest Generative AI inference solution globally, achieving speeds over 10 times faster than typical GPU-based hyperscale cloud services. This significant speed enhancement is reshaping the user experience in AI applications, enabling real-time iteration and enhancing intelligence through advanced computation.
About The Role
As a Senior Mechanical Engineer at Cerebras, you will spearhead the design of innovative mechanical systems for our next-generation wafer-scale engine. Your key responsibilities will encompass ensuring adherence to specifications, validating manufacturability, and delivering high-quality products in a dynamic environment, addressing some of the most intricate challenges in the rapidly advancing AI landscape.
In this role, you will be instrumental in developing the mechanical infrastructure for Cerebras' custom hardware systems.
· Rapidly iterate on designs and analyses to inform high-level systems decisions and guide the overall product strategy.
· Provide extensive support for environmental and performance testing on hardware, validate analyses, and ensure compliance with design criteria.
· Take ownership of technical deliverables.
· Conduct first-article inspections and functional analyses, identifying and resolving issues as they arise.
· Collaborate closely with design, manufacturing, production, diagnostics, and embedded software engineering teams, contractors, and suppliers.
· Perform detailed structural analyses and simulations to optimize designs.
Mindlance
Join Mindlance as a System Engineer, where you will play a vital role in designing, implementing, and maintaining complex systems. Collaborate with cross-functional teams to ensure system reliability and performance. Utilize your technical expertise to troubleshoot and resolve issues swiftly, contributing to the overall success of our projects.
Primary Function of Position:
Intuitive Surgical specializes in the design and manufacture of intricate mechanical systems utilized in surgical procedures. As a Senior Electro Mechanical Systems Engineer, you will play a pivotal role in the development, documentation, and validation of innovative product designs while enhancing existing designs to cater to customer needs. Your contributions in systems engineering and electro-mechanical design will be vital to a dynamic team focused on creating advanced equipment for minimally invasive robotic surgery. This role will emphasize engineering the architecture for next-generation instruments, where you will design and innovate sensors, transducers, and electro-mechanical components while considering electrical performance, noise levels, scalability, and cost efficiency. You will conceptualize and evaluate complex electromechanical assemblies, develop theoretical models, conduct empirical validations, and perform computer-assisted simulations.
Roles & Responsibilities:
• Design, develop, and integrate high-performance sensors and ultrasound transducers for imaging and novel applications.
• Create new product architectures in alignment with functional and design specifications for next-generation instruments.
• Analyze and conceptualize complex electromechanical assemblies, creating theoretical models and executing empirical confirmations through simulations.
• Design and select sensor architectures suitable for prototyping and high-volume production.
• Develop high-fidelity sensor systems to monitor variables such as temperature, pressure, and bio-impedance in real-time.
• Possess a comprehensive understanding of various fabrication methods for custom-designed parts, including CNC milling, sheet metal, casting, plastic molding, and 3D printing.
• Collaborate across multidisciplinary teams to define requirements for both new and existing designs.
• Conduct analysis and testing of product designs.
• Research and identify new vendors and processes for component manufacturing.
• Develop and execute component verification and product validation testing.
• Provide engineering support to Manufacturing, Materials, Test Engineering, and Service to ensure smooth design transfers.
• Demonstrate proficiency with test and measurement equipment such as force and height gauges, oscilloscopes, and RF equipment.
• Have knowledge of international safety standards (e.g., IEC 60601-1, IEC 60601-2-2).
Join CoreWeave as a Senior Software Engineer I specializing in inference, where you will spearhead architectural designs, elevate engineering standards, and significantly enhance latency, throughput, and reliability across various services. Collaborate closely with product, orchestration, and hardware teams to advance our Kubernetes-native inference platform, ensuring we achieve stringent P99 SLAs at scale.
Applied Intuition, Inc.
About Applied Intuition
Applied Intuition, Inc. is at the forefront of advancing physical AI technologies. Established in 2017 and currently valued at $15 billion, this Silicon Valley-based company is developing the essential digital infrastructure to integrate intelligence into every moving machine worldwide. Applied Intuition serves pivotal sectors including automotive, defense, trucking, construction, mining, and agriculture through three primary domains: tools and infrastructure, operating systems, and autonomy. The company’s innovative solutions are trusted by 18 of the top 20 global automotive manufacturers, as well as the U.S. military and allied forces. Headquartered in Sunnyvale, California, Applied Intuition also has offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.
We operate with an in-office culture, expecting our employees to primarily work from the Applied Intuition office five days a week. However, we value flexibility and trust our team members to manage their schedules responsibly, which may include occasional remote work, starting the day with morning meetings from home, or leaving earlier for family commitments.
About the Role
We are seeking a skilled System Safety Engineer with a strong background in automotive systems and software. This role involves the development and management of functional safety requirements for the systems associated with a new Applied Intuition product.
In this position, you will be responsible for:
· Process Development: Establish and implement an ISO 26262 compliant development process, including creating safety manuals, templates, and internal procedural guidelines while adapting to existing customer processes.
· Safety Lifecycle Management: Lead and execute comprehensive functional safety activities (ISO 26262) from item definition to safety validation for Advanced Driver Assistance Systems (ADAS) and autonomous features (Parts 2, 3, 4 of ISO 26262:2018).
· System Architecture: Utilize in-depth knowledge of system-level functional safety to design robust fault detection, mitigation strategies, and safe-state transitions.
Join Intuitive as a Senior Systems Research Engineer, where you will play a pivotal role in advancing the fields of embodied AI and robotics. You will be responsible for designing and implementing innovative systems that integrate AI with robotic applications, contributing to groundbreaking projects that push the boundaries of technology.

