High Performance Computing Software Engineer - Supercomputing
About Institute of Foundation Models
The Institute of Foundation Models is a research lab committed to building, understanding, and leveraging foundation models to enhance AI development. Our focus is on advancing research, nurturing future AI innovators, and fostering contributions to a knowledge-driven economy.
Similar jobs
Join the Institute of Foundation Models
As a leading research laboratory, we are devoted to building, understanding, utilizing, and managing foundation models effectively. Our mission is to propel research forward, cultivate the future generation of AI innovators, and contribute significantly to a knowledge-driven economy.
In this role, you will engage with cutting-edge foundation model training, collaborating with top-tier researchers, data scientists, and engineers to address the most crucial and impactful challenges in AI development. You will play a pivotal role in crafting revolutionary AI solutions capable of transforming entire industries. Your strategic and innovative problem-solving abilities will be vital in establishing MBZUAI as a global leader in high-performance computing for deep learning, fostering groundbreaking discoveries that will inspire the next wave of AI pioneers.
Position Overview
As a member of IFM’s Supercomputing team, you will be tasked with designing, optimizing, and maintaining high-performance, low-latency networking solutions that support some of the world’s largest GPU supercomputing clusters. You will work on both network software and systems that facilitate AI training and inference processes, utilizing state-of-the-art technologies such as NVIDIA’s RDMA-capable solutions, InfiniBand, RoCE, and GPUDirect RDMA. Our comprehensive product stack encompasses the entire lifecycle of network management, from metric gathering and configuration deployment to zero-touch provisioning, real-time monitoring, alerting, and auto-remediation. Additionally, you will be responsible for troubleshooting, diagnosing, and swiftly resolving any network-related issues in collaboration with cross-functional teams, ensuring optimal reliability and performance.
Institute of Foundation Models
Join Our Innovative Team at the Institute of Foundation Models
At IFM, we are pioneers in developing, understanding, and managing foundation models. Our mission is to advance research, cultivate the next generation of AI innovators, and contribute significantly to a knowledge-driven economy. As a member of our esteemed team, you will engage in the forefront of cutting-edge foundation model training, collaborating with top-tier researchers, data scientists, and engineers. Together, we will address the most significant and impactful challenges in AI development. You will play a crucial role in creating revolutionary AI solutions that have the potential to transform entire industries. Your strategic and innovative problem-solving abilities will be essential in establishing MBZUAI as a global leader in high-performance computing for deep learning, facilitating discoveries that will inspire future AI pioneers.
The Role
IFM is developing the foundational compute infrastructure that will drive future breakthroughs in AI and computational science. We are seeking a High Performance Computing Software Engineer to collaborate in designing, developing, and operating the software systems that manage our extensive AI workloads. In this position, you will work at the crossroads of high-performance computing and machine learning. You will be part of a dedicated team focused on creating the software stack that supports the training of advanced ML models using over 1000 GPUs, while ensuring our infrastructure remains robust, efficient, and user-friendly.
Institute of Foundation Models
About the Institute of Foundation Models
The Institute of Foundation Models (IFM) specializes in designing and operating large-scale GPU supercomputing systems aimed at training cutting-edge foundation models. Our philosophy places emphasis on the interdependence of performance, fault tolerance, and scalability across various components, including model architecture, communication systems, runtime, and hardware topology.
This position is pivotal to our mission: enhancing communication performance, distributed reliability, and cross-layer optimization for extensive training workloads.
The Mission
We seek a highly skilled engineer to collaboratively design and optimize the communication stack for large-scale distributed training, with a focus on hybrid parallelism and Mixture-of-Experts (MoE) workloads. This is a systems-level engineering role centered on performance enhancement, distributed debugging, and communication-runtime co-design.
· Design and optimize expert-parallel and hybrid-parallel communication patterns
· Drive high-performance hierarchical collectives for MoE workloads
· Co-design runtime orchestration with communication topology awareness
· Mitigate tail latency and enhance determinism across thousands of GPUs
· Architect fault-tolerant distributed execution that withstands real-world cluster failures
Core Technical Scope
· Communication-compute overlap and topology-aware collective optimization
· In-depth debugging of NCCL, RDMA, and custom communication layers
· Implementing hybrid expert parallel strategies in modern large-scale MoE systems
· Developing elastic and resilient distributed job orchestration concepts
· Conducting congestion analysis and routing optimization across InfiniBand/RoCE fabrics
· Executing microbenchmarking and performance modeling for communication-intensive workloads
Expected Technical Depth
· Expertise in hybrid expert parallel communication strategies
Cerebras Systems
Join Cerebras Systems as a Senior WAN Network Engineer, where you will play a crucial role in designing and optimizing our wide area network infrastructure. You will work alongside a talented team to ensure high availability and performance of network services.
Artech Information Systems LLC
Join our dynamic team as a Network Engineer I where you will be at the forefront of our networking solutions. In this role, you will assist in the design, implementation, and maintenance of network systems, while collaborating with experienced professionals in the field. This is an exceptional opportunity for individuals looking to grow their technical skills in a supportive environment.
Sonsoft Inc.
Join our dynamic team at Sonsoft Inc. as an SDN & Networking Engineer. In this role, you will have the opportunity to work on cutting-edge networking technologies and solutions that drive the future of connectivity. You will collaborate with cross-functional teams to design, implement, and optimize software-defined networking solutions that enhance performance and scalability.
Join Us in Transforming Cybersecurity!
Illumio stands at the forefront of ransomware and breach containment, revolutionizing how organizations tackle cyber threats and bolster operational resilience. Leveraging the power of the Illumio AI Security Graph, our breach containment platform adeptly identifies and mitigates threats across hybrid multi-cloud environments, effectively halting the spread of attacks before they escalate into crises.
As a recognized leader in the Forrester Wave™ for Microsegmentation, Illumio empowers Zero Trust, enhancing cyber resilience for the critical infrastructure, systems, and organizations that sustain our world.
Our Vision:
Our Engineering team is cultivated in a culture that champions visionary leadership, autonomy, and ownership, fostering a vibrant synergy that propels us forward in the dynamic realm of cybersecurity.
When you become part of our team, you align with the leader in Zero Trust Segmentation. You will engage with a cutting-edge technology stack encompassing operating systems, distributed applications, and advanced UI/visualization tools.
Together, we are charting the course for the future of cybersecurity, continually developing world-class products with a diverse team committed to innovation during a time of unprecedented cyber threats.
Your Contributions:
Develop groundbreaking methods for orchestrating Zero Trust segmentation down to the application and pod/container level, pinpointing and easily obstructing attack pathways within the container ecosystem.
Utilize the latest technologies and C++ standards to build scalable solutions.
Enhance your understanding of modern container platforms such as Kubernetes, Istio, OpenShift, AKS, EKS, GKE, and more.
Take ownership of critical features and subsystems, dive deep into details, and advocate for your designs within the team.
Deliver robust implementations that are elegant, simple, scalable, stable, secure, and maintainable, safeguarding the critical infrastructure of significant enterprises.
Collaborate with field organizations and key customers to influence the development of this groundbreaking product.
Your Skills:
Bachelor’s degree in Computer Science or a related field; Master’s degree is a plus.
8+ years of experience in building distributed and scalable software systems.
Proficient in C++ programming (versions 11/14/17/20).
Join Us in Shaping Cybersecurity!
At Illumio, we are at the forefront of ransomware and breach containment, transforming how organizations tackle cyberattacks and maintain operational resilience. Leveraging the Illumio AI Security Graph, our advanced breach containment platform effectively identifies and mitigates threats across hybrid multi-cloud environments, halting the progression of attacks before they escalate into crises.
As a recognized leader in the Forrester Wave™ for Microsegmentation, Illumio empowers Zero Trust principles, reinforcing cyber resilience for the critical infrastructure, systems, and organizations that sustain our world.
Our Engineering Vision:
Our Engineering team thrives on a culture of visionary leadership, autonomy, and ownership, fostering a dynamic synergy that propels us in an ever-evolving cybersecurity landscape.
By joining our team, you will become part of the leader in Zero Trust Segmentation. You will engage with a cutting-edge technology stack encompassing operating systems, distributed applications, and immersive UI/visualization tools.
Together, we are defining the future of cybersecurity, continuing to build world-class products led by diverse perspectives and a commitment to innovation during a time of unprecedented cybersecurity challenges.
Your Role:
You will architect a groundbreaking approach to orchestrating Zero Trust segmentation down to the application and pod/container level, effortlessly identifying and blocking attack pathways within the container ecosystem.
Utilize the latest technologies and C++ standards.
Deepen your understanding of modern container platforms such as Kubernetes, Istio, OpenShift, AKS, EKS, and GKE.
Take ownership of designing critical features and subsystems, meticulously refining all details and defending your design choices to your peers.
Deliver robust implementations that are elegant, simple, scalable, stable, secure, and supportable, ensuring our product protects the critical infrastructure of large enterprises.
Collaborate with the field organization and pivotal customers to shape this innovative product.
Your Toolkit:
Bachelor’s degree in Computer Science or a related field; a Master’s degree is a plus.
5+ years of experience in building distributed and scalable software systems.
Proficiency in C++ (versions 11/14/17/20).
LinkedIn Corporation
LinkedIn Corporation seeks a Staff Software Engineer specializing in Network Security and Automation for its Sunnyvale office. This role centers on improving network security and automating workflows to help keep systems secure and reliable.
Key Responsibilities
Develop and maintain network security protocols throughout LinkedIn's infrastructure.
Automate important operational processes to support system reliability and efficiency.
Apply technical knowledge to safeguard infrastructure and drive continuous improvements.
Location
This position is based in Sunnyvale.
Sonsoft Inc.
We are seeking a motivated Networking Engineer with expertise in C programming and knowledge of SIP/RTP protocols to join our dynamic team. In this role, you will be responsible for developing and maintaining networking solutions, optimizing performance, and ensuring effective communication across systems. Your skills will help drive our projects forward, and you will have the opportunity to work alongside experienced professionals in a collaborative environment.
Intuitive Surgical, Inc.
Join Intuitive Surgical as a Managing Network Principal, where your leadership will guide our network management team towards excellence in operational efficiency and innovation. In this pivotal role, you will oversee the development and implementation of network strategies, ensuring that our systems are robust, secure, and aligned with our organizational goals.
Sonsoft Inc.
Join Sonsoft Inc. as an SDN & Networking Consultant, where you will leverage your expertise in Software Defined Networking to deliver innovative solutions for our clients. In this role, you will collaborate with cross-functional teams to design, implement, and optimize networking solutions that meet the evolving demands of our clients.
Intuitive Surgical, Inc.
Role overview
Intuitive Surgical, Inc. is hiring a Senior Analyst for Supply Chain Risk and Network Visibility in Sunnyvale. This role uses data analytics and strategic thinking to highlight risks within the supply chain and improve transparency across the network. The work directly supports operational efficiency and helps ensure reliable delivery of healthcare solutions.
What you will do
Review and interpret supply chain data to find vulnerabilities and possible disruptions
Create and implement strategies that improve network visibility and strengthen resilience
Work with teams from different functions to drive ongoing improvements in supply chain operations
Take part in decisions that influence how efficiently and reliably products are delivered
Requirements
Background in supply chain analytics or risk management
Strong skills in analysis and problem solving
Interest in refining and optimizing supply chain processes
Space Exploration Technologies Corp.
At SpaceX, we are driven by the vision of making humanity an interplanetary species. Our passion for innovation fuels our mission to explore the cosmos, and we are actively advancing technologies that will enable human life on Mars.
SOFTWARE ENGINEER FOR STARLINK NETWORK
Join our team at SpaceX, where we utilize our extensive expertise in aerospace to deploy Starlink, the most advanced broadband internet system globally. As the largest satellite constellation in existence, Starlink delivers fast, reliable internet access to millions of users worldwide. We design, build, test, and operate every component of the system, from thousands of satellites and consumer receivers to the underlying software that integrates it all. We are just beginning to realize Starlink’s potential to positively impact global communities, and we seek exceptional engineers to enhance its utility for individuals and businesses around the world.
As a Software Engineer working on the Starlink program, you will tackle challenges that enhance our capacity to maximize the hardware we've deployed. Our mission is to deliver the best satellite internet experience to our customers, particularly in underserved communities, providing them with affordable, transformative broadband access.
Our software engineers oversee the entire lifecycle of the software they create, including development, testing, and support. We expect our engineers to establish a robust feedback loop between software design and real-world performance. In this role, your contributions will have a significant and measurable impact on the world.
Intuitive Surgical, Inc.
Join our innovative team at Intuitive Surgical, Inc. as a Senior Mechanical Engineer, where your expertise will drive the development of cutting-edge robotic surgical systems. You will be instrumental in designing, analyzing, and testing mechanical components that enhance patient outcomes and surgical precision.
Illumio
Join Us in Shaping the Future!
At Illumio, we are pioneers in ransomware and breach containment, transforming how organizations manage cyber threats and ensure operational resilience. Our advanced Illumio AI Security Graph empowers our breach containment platform to identify and contain threats across hybrid multi-cloud environments, effectively halting the spread of attacks before they escalate into crises.
As acknowledged leaders in the Forrester Wave™ for Microsegmentation, we facilitate Zero Trust frameworks that enhance cyber resilience for critical infrastructures and organizations worldwide.
Location: 5 days a week on-site at our Sunnyvale, CA Headquarters.
Our Team's Vision:
Our Engineering team thrives on a culture fueled by visionary leadership, autonomy, and a strong sense of ownership. This dynamic synergy propels us forward in the rapidly evolving realm of cybersecurity.
When you join us, you become part of a leading force in Zero Trust Segmentation, engaging with a cutting-edge technology stack that spans operating systems, distributed applications, and immersive UI/visualization tools.
We are committed to shaping the future of cybersecurity and building world-class products driven by diverse perspectives and a dedication to innovation in these unprecedented times of cyber threats.
Your Impact:
As the Senior Manager of Cloud Engineering, you will lead a talented team focused on developing scalable, distributed cloud services through containerized microservices within a multi-cloud environment. You will steer the design, development, and delivery of cloud solutions, ensuring high standards for automation, observability, and operational excellence.
Lead and manage a cloud engineering team dedicated to developing distributed microservices for a multi-tenant, scalable platform.
Supervise cloud service development, prioritizing performance, security, and reliability.
Establish coding standards, engage in code and design reviews, and ensure high-quality automation and testing.
Collaborate with Product Management to align development efforts with business objectives.
Oversee the entire software development lifecycle, from requirements gathering to deployment and operations.
Cultivate a culture of continuous learning and innovation, embracing best practices in cloud engineering.
Join SpaceX as a Senior RFIC Design Engineer in our Silicon Engineering team. In this pivotal role, you will be responsible for designing innovative RF integrated circuits that drive our next-generation space technologies. Collaborate with a team of experts to push the boundaries of technology while ensuring the highest standards of quality and performance.
Intuitive Surgical, Inc.
Join our innovative team at Intuitive Surgical, where we are redefining the landscape of surgical robotics. As a Senior Mechanical Engineer, you will play a pivotal role in the design, development, and enhancement of cutting-edge robotic systems that are transforming patient care. Your expertise will contribute to creating solutions that improve surgical precision and outcomes.
Intuitive Surgical, Inc.
About the Role
Intuitive Surgical, Inc. is hiring a Senior Electrical Engineer in Sunnyvale. This role focuses on designing and developing electrical systems for advanced surgical technologies. The work supports the full product development cycle, from initial concept through to production release.
What You Will Do
Design and develop electrical systems for surgical products
Work closely with teams across disciplines to move products from idea to manufacturing
Support quality and performance standards throughout development
Ceribell
About Ceribell
Ceribell is at the forefront of medical technology, dedicated to revolutionizing the diagnosis and management of patients with serious neurological conditions. Our innovative Ceribell System is a cutting-edge, point-of-care electroencephalography (EEG) platform that meets the critical needs of patients in acute care settings. Already in use at hundreds of community hospitals, large academic institutions, and major integrated delivery networks across the nation, our team shares a collective mission to enhance critical care with our rapid seizure detection technology. Join us in making a difference!
Position Overview:
We are seeking a talented Senior Software Engineer with a strong backend focus to join our dynamic team in developing the next generation of EEG web applications that cater to vital medical use cases. In this role, you will be instrumental in designing, maintaining, and enhancing the backend systems for our EEG Portal web application, which is essential for healthcare providers, researchers, and clinical teams to access, monitor, and analyze EEG data. You will collaborate closely with fellow engineers, product managers, and stakeholders to ensure that our backend systems are robust, secure, and scalable within a medical environment.
Key Responsibilities:
Backend Development & Maintenance:
Design, develop, and maintain backend systems to support the EEG Portal application, ensuring dependable performance and adherence to healthcare standards.
Implement new features and enhancements to meet clinical and research demands, prioritizing efficiency and scalability.
Troubleshoot, debug, and optimize backend systems to guarantee maximum uptime and reliability for users.
Database Management:
Write optimized database queries and execute data migration strategies.
Monitor and fine-tune database performance, including indexing, replication, and backup processes.
API Development & Integration:
Develop and maintain RESTful APIs that interact with the frontend and other systems.
Ensure APIs are secure, well-documented, and capable of handling large volumes of sensitive medical data.
Integrate third-party services and platforms as needed to enhance functionality.
Ensure backend services comply with regulatory standards, including data encryption, authentication, and auditing.

