
Senior Network Engineer - Supercomputing

ifm-us, Sunnyvale, CA
On-site, Full-time

Experience Level

Senior

Qualifications

Ideal candidates should possess strong analytical abilities, a background in network engineering, and a passion for cutting-edge technology. Familiarity with high-performance computing environments and experience in deploying complex networking solutions will be essential.

About the job

Join the Institute of Foundation Models
As a leading research laboratory, we are devoted to building, understanding, utilizing, and managing foundation models effectively. Our mission is to propel research forward, cultivate the future generation of AI innovators, and contribute significantly to a knowledge-driven economy.

In this role, you will engage with cutting-edge foundation model training, collaborating with top-tier researchers, data scientists, and engineers to address the most crucial and impactful challenges in AI development. You will play a pivotal role in crafting revolutionary AI solutions capable of transforming entire industries. Your strategic and innovative problem-solving abilities will be vital in establishing MBZUAI as a global leader in high-performance computing for deep learning, fostering groundbreaking discoveries that will inspire the next wave of AI pioneers.

Position Overview

As a member of IFM’s Supercomputing team, you will be tasked with designing, optimizing, and maintaining high-performance, low-latency networking solutions that support some of the world’s largest GPU supercomputing clusters. You will work on both network software and systems that facilitate AI training and inference processes, utilizing state-of-the-art technologies such as NVIDIA’s RDMA-capable solutions, InfiniBand, RoCE, and GPUDirect RDMA. Our comprehensive product stack encompasses the entire lifecycle of network management, from metric gathering and configuration deployment to zero-touch provisioning, real-time monitoring, alerting, and auto-remediation. Additionally, you will be responsible for troubleshooting, diagnosing, and swiftly resolving any network-related issues in collaboration with cross-functional teams, ensuring optimal reliability and performance.

About ifm-us

The Institute of Foundation Models is at the forefront of AI research, dedicated to advancing the development of foundational models while nurturing new talent in the AI field. We are committed to making transformative contributions to a knowledge-driven economy.
