Principal Staff HPC Network Engineer jobs in San Francisco – Browse 5,645 openings on RoboApply Jobs

Open roles matching “Principal Staff Hpc Network Engineer” with location signals for San Francisco. 5,645 active listings on RoboApply Jobs.

sfcompute
Full-time|On-site|San Francisco, CA

At sfcompute, we are pioneering a transformative approach to GPU cluster financing, enabling the largest infrastructure build-out in history while effectively mitigating risk.

In the ever-evolving landscape of GPU technology, securing financing for GPU clusters and the essential infrastructure they require involves inherent risks. Our innovative model ensures that developers can lease clusters through fixed-price long-term contracts, thus offloading risk to the customer while maintaining financial stability.

As AI and computational demands grow, our mission is to democratize access to powerful computing resources. We aim to create a liquid market for GPU offtake, allowing startups and smaller enterprises to thrive without the burden of long-term contracts that aren't feasible for them.

Role Overview

Join our dynamic infrastructure team, responsible for architecting and deploying cutting-edge GPU clusters globally. You'll play a crucial role in maintaining operational excellence, engaging in on-call rotations, and driving automation to facilitate large-scale deployments. As a key member of our small but ambitious team, you will help shape our culture, mentor junior engineers, and learn directly from our customers.

Feb 25, 2026
Crusoe
Full-time|$193K/yr - $234K/yr|On-site|San Francisco, CA - US

At Crusoe, we're on a mission to revolutionize the way energy and intelligence coexist. Our vision is to develop a robust infrastructure that empowers individuals to innovate ambitiously with AI, all while embracing principles of sustainability and efficiency.

Join us at the forefront of the AI revolution, where you'll leverage sustainable technology to drive groundbreaking advancements, make a significant impact, and collaborate with a team that's pioneering responsible cloud infrastructure.

About the Role

The Crusoe Cloud Network Deployment Engineering team seeks a dynamic and experienced professional to enhance our Network Engineering efforts. This team is integral to designing, constructing, and managing the global edge, backbone, and data center networks for High-Performance Compute (HPC) Clusters utilizing GPUs. The ideal candidate will be self-motivated, technically adept, and passionate about working with cutting-edge environmental technologies. Exceptional analytical and communication skills, along with a collaborative spirit, are essential.

As a Network Engineer, you will play a pivotal role in expanding the Global Crusoe Network, focusing on deploying new data centers, Points of Presence (PoPs), and backbone infrastructure. This position offers a unique opportunity to gain valuable experience in large-scale network engineering involving edge, backbone, and HPC-based data center networking.

This position is on-site in San Francisco, CA, or Sunnyvale, CA, and requires in-office presence.

Key Responsibilities:
- Deploy, construct, and optimize the global Crusoe Energy Cloud network, including edge, backbone, data center, and public cloud connectivity.
- Collaborate with cross-functional teams, including Software Infrastructure and Product, to foster innovation and advancement within the Crusoe Energy Cloud network.
- Engage with external vendors and ISPs to evaluate and confirm device and carrier selection.
- Participate in 24/7 on-call support for the Crusoe network.

What You Bring:
- A minimum of 10 years of experience in building and operating network solutions at scale within a production environment.
- Deep understanding of network protocols such as TCP/IP, QoS, BGP, OSPF/IS-IS, EVPN, VXLAN, and MPLS technologies.

Oct 15, 2025
Full-time|On-site|San Francisco HQ

About Us:
Point One Navigation is at the forefront of redefining precise location technology. Our vision is to create a comprehensive location platform that empowers innovation, enhances safety, and boosts efficiency across various sectors, including robotics and transportation. We pride ourselves on being a dynamic and collaborative team that excels at tackling intricate challenges swiftly and effectively.

The Role:
We are on the lookout for a talented Staff Engineer to enhance our Backend Services team. In this pivotal role, you will be instrumental in shaping the architecture of our systems and developing services that underpin our global corrections network and real-time location capabilities.

Key Responsibilities:
- Design, develop, and sustain high-performance backend services and network systems using Go and Java.
- Create detailed systems design documentation, architectural decision records, and comprehensive technical specifications.
- Engage actively in the formulation of technical strategies, architectural discussions, and infrastructure roadmap planning.
- Enhance the performance and reliability of mission-critical, globally distributed systems.
- Collaborate with cross-functional teams to deliver integrated solutions for real-time data processing and corrections delivery.
- Assume ownership of significant backend infrastructure components, ensuring their design, deployment, and continuous monitoring.

Nov 5, 2025
Crusoe Technologies
Full-time|On-site|San Francisco, CA - US

Join Crusoe Technologies as a Principal Software Engineer specializing in Software-Defined Networking (SDN). In this pivotal role, you will lead the design and development of innovative networking solutions that leverage SDN technologies. You will work closely with cross-functional teams to enhance our networking capabilities and drive the future of our products.

As a thought leader in SDN, you will be responsible for architecting scalable solutions, optimizing performance, and ensuring robust security across our network infrastructure. This is an exciting opportunity to impact the rapidly evolving tech landscape.

Mar 12, 2026
OpenAI
Full-time|On-site|San Francisco

About Our Team

Join the Fleet team at OpenAI, where we empower groundbreaking research and product innovation through our advanced computing infrastructure. We manage extensive systems across data centers, GPUs, and networking, ensuring optimal performance, high availability, and efficiency. Our work is crucial in enabling OpenAI’s models to function seamlessly at scale, supporting both our internal research endeavors and external products like ChatGPT. We are committed to prioritizing safety, reliability, and the ethical deployment of AI technology.

About the Role

As a Software Engineer on the Fleet High Performance Computing (HPC) team, you will play a vital role in ensuring the reliability and uptime of OpenAI’s compute fleet. Minimizing hardware failures is essential for smooth research training progress and uninterrupted services, as even minor hardware issues can lead to significant setbacks. With the rise of large supercomputers, the stakes in maintaining efficiency and stability have never been higher.

At the cutting edge of technology, we often lead the charge in troubleshooting complex, state-of-the-art systems at scale. This is a unique opportunity for you to engage with groundbreaking technologies and create innovative solutions that enhance the health and efficiency of our supercomputing infrastructure.

Our team fosters a culture of autonomy and ownership, enabling skilled engineers to drive meaningful change. In this role, you will focus on comprehensive system investigations and develop automated solutions to enhance our operations. We seek individuals who dive deep into challenges, conduct thorough investigations, and create scalable automation for detection and remediation.

Key Responsibilities:
- Develop and maintain automation systems for provisioning and managing server fleets.
- Create tools to monitor server health, performance metrics, and lifecycle events.
- Collaborate effectively with teams across clusters, networking, and infrastructure.
- Work closely with external operators to maintain a high level of service quality.
- Identify and resolve performance bottlenecks and inefficiencies in the system.
- Continuously enhance automation processes to minimize manual intervention.

You Will Excel in This Role if You Have:
- Experience in managing large-scale server environments.
- A blend of technical skills in systems programming and infrastructure management.
- Strong problem-solving abilities and a methodical approach to troubleshooting.
- Familiarity with high-performance computing technologies and tools.

Feb 5, 2026
Lila Sciences
Full-time|On-site|Cambridge, MA USA; London, UK; San Francisco, CA USA

Join Lila Sciences as a Staff or Principal Engineer specializing in Technical Mitigations Research. This role offers an exciting opportunity to leverage your engineering expertise to develop innovative solutions in the field of technical mitigations.

Apr 7, 2026
Sciforium
Full-time|On-site|San Francisco

At Sciforium, we are at the forefront of AI infrastructure, pioneering advanced multimodal AI models and an innovative, high-efficiency serving platform. With substantial backing from AMD and a dedicated team of engineers, we are rapidly expanding our capabilities to support the next generation of frontier AI models and real-time applications.

About the Role

We are looking for a highly skilled Senior HPC & GPU Infrastructure Engineer who will be responsible for ensuring the health, reliability, and performance of our GPU compute cluster. As the primary custodian of our high-density accelerator environment, you will serve as the crucial link between hardware operations, distributed systems, and machine learning workflows. This position encompasses a range of responsibilities, from hands-on Linux systems engineering and GPU driver setup to maintaining the ML software stack (CUDA/ROCm, PyTorch, JAX, vLLM). If you are passionate about optimizing hardware performance, enjoy troubleshooting GPUs at scale, and aspire to create world-class AI infrastructure, we would love to hear from you.

Your Responsibilities

1. System Health & Reliability (SRE)
- On-Call Response: Be the primary responder for system outages, GPU failures, node crashes, and other cluster-wide incidents, ensuring rapid issue resolution to minimize downtime.
- Cluster Monitoring: Develop and maintain monitoring protocols for GPU health, thermal behavior, PCIe/NVLink topology issues, memory errors, and general system load.
- Vendor Liaison: Collaborate with data center personnel, hardware vendors, and on-site technicians for repairs, RMA processing, and physical maintenance of the cluster.

2. Linux & Network Administration
- OS Management: Oversee the installation, patching, and maintenance of Linux distributions (Ubuntu / CentOS / RHEL), ensuring consistent configuration, kernel tuning, and automation for large node fleets.
- Security & Access Controls: Set up VPNs, iptables/firewalls, SSH hardening, and network routing to secure our computing infrastructure.
- Identity & Storage Management: Manage LDAP/FreeIPA/AD for user identity and administer distributed file systems like NFS, GPFS, or Lustre.

3. GPU & ML Stack Engineering
- Deployment & Bring-Up: Spearhead the deployment of new GPU nodes, including BIOS configuration and software integration to ensure optimal performance.

Jan 7, 2026
Crusoe
Full-time|On-site|San Francisco, CA - US

Join Crusoe as a Staff Network Architect, where you will play a pivotal role in designing and implementing robust network architectures that support our innovative solutions. As a key member of our engineering team, you will collaborate with cross-functional teams to ensure seamless connectivity and security for our infrastructure.

Mar 31, 2026
Crusoe
Full-time|On-site|San Francisco, CA - US

Join Crusoe as a Staff Software Engineer specializing in Networking. In this critical role, you will design and implement innovative software solutions that enhance our networking infrastructure. You will collaborate with cross-functional teams to optimize performance and reliability, ensuring that our services run efficiently and securely.

Mar 25, 2026
sfcompute
Full-time|On-site|San Francisco, CA

At sfcompute, we are on a mission to revolutionize the infrastructure landscape by minimizing the risks associated with the largest build-outs in history.

When financing GPU clusters and the data centers that support them, having a contract in place—what we call an "offtake"—is crucial. This ensures that customers have signed on to lease the cluster even before it’s constructed.

The financing process for GPU clusters carries inherent risks due to thin margins and large volumes. Lenders often hesitate to take on the risk that developers may default on their loans, while developers are wary of being unable to sell their clusters. This dynamic leads to the necessity of transferring risk to customers via fixed-price, long-term contracts.

If customer risk isn't effectively mitigated, a market bubble can form. Unlike traditional SaaS models, application layer companies engage in multi-year contracts for compute and inference while offering customers monthly subscriptions. A miscalculation in purchasing can spell disaster; a small change in revenue growth could lead to profits or bankruptcy. Imagine a world where companies could exit their contracts by selling them back to the market.

As AI technology scales, compute power will increasingly only be available for those who can manage the associated risks. A small startup in a San Francisco Victorian house cannot feasibly commit to a 5-year, take-or-pay contract for $100 million supercomputers, but they might be able to purchase a month of liquidity that someone else has sold back.

That’s the market we’re building: a liquid marketplace for GPU offtake.

About the Role

As part of our infrastructure team, you will help design and deploy some of the most powerful GPU clusters in existence, with even smaller clusters today having ranked in the TOP500 five years ago. Your responsibilities will include participating in on-call rotations, deploying new environments, troubleshooting issues, and embracing automation to facilitate large-scale deployments. As a member of a small but dynamic team, you'll have the opportunity to significantly influence our company culture, mentor junior engineers, and engage directly with our customers.

Feb 25, 2026
Sonsoft Inc.
Full-time|On-site|San Francisco

Join our dynamic team at Sonsoft Inc. as a Principal Consultant - Network Architect. In this pivotal role, you will leverage your expertise to design and implement robust network architectures that meet the evolving needs of our clients. You will collaborate with cross-functional teams to deliver innovative solutions that enhance network performance and security.

Nov 2, 2016
Crusoe
Full-time|On-site|San Francisco, CA - US

Join Crusoe as a Staff Network Deployment Engineer in our Lab division. In this pivotal role, you will spearhead the deployment and optimization of our cutting-edge networking technologies. Your expertise will be crucial in ensuring that our systems are robust, efficient, and ready to meet the demands of our innovative projects.

Mar 31, 2026
City and County of San Francisco
Full-time|On-site|San Francisco

Join the City and County of San Francisco as a Principal Information Systems Engineer specializing in networks. In this pivotal role, you will oversee the design, implementation, and optimization of advanced network systems that serve multiple departments across the city. You will collaborate with a diverse team to ensure the reliability, performance, and security of our network infrastructure.

Your expertise will be instrumental in guiding projects that enhance the city's technological capabilities and improve service delivery to our residents.

Nov 6, 2023
Network Engineer

Cloudflare, Inc.

Full-time|On-site|In-Office

Join Cloudflare as a Network Engineer, where you'll play a vital role in building and maintaining our global network infrastructure. This position offers a unique opportunity to work with cutting-edge technologies and collaborate with talented engineers to ensure the performance and reliability of our services.

Mar 12, 2026
Full-time|On-site|San Francisco Bay Area

ABOUT RETELL AI

At Retell AI, we are revolutionizing the call center experience using cutting-edge voice AI technology. In just 18 months since our inception, thousands of companies have leveraged our AI voice agents to streamline sales, support, and logistics operations that previously required extensive human teams. Supported by prominent investors such as Y Combinator and Alt Capital, we have grown from $5M to an impressive $36M ARR with a dedicated team of 20.

Our ambitious vision for 2026 is to create a state-of-the-art customer experience platform where entire contact centers are driven by AI. Rather than relying on basic automation that necessitates constant human oversight, we are developing intelligent AI “workers” to function as frontline agents, QA analysts, and managers—constantly executing, monitoring, and enhancing customer interactions.

As we rapidly expand, we seek passionate innovators eager to solve complex technical challenges, move swiftly, and make a meaningful impact at one of the fastest-growing voice AI startups. Join us in building the future!

Ranked among the top 50 AI applications in the a16z list: https://tinyurl.com/5853dt2x
Ranked #4 on Brex's Fast-Growing Software Vendors of 2025: https://www.brex.com/journal/brex-benchmark-december-2025
One of the top startups on the Lean AI leaderboard: https://leanaileaderboard.com/

THE ROLE

We are in search of a Principal/Staff Engineer to spearhead the technical direction of our core platform. This is an individual contributor role designed for someone who excels in uncertainty, acts swiftly, and elevates the standards of those around them.

You will engage with various systems, infrastructure, and product surfaces while collaborating closely with engineering teams, product managers, and leadership to scale successful initiatives and innovate for the future.

This role is not about merely addressing tickets; you will identify challenges, engineer solutions, and deliver impactful results.

KEY RESPONSIBILITIES
- Lead the design and evolution of our core platform and systems architecture.
- Oversee complex technical projects from inception to production.
- Make strategic technical decisions that optimize for speed, reliability, and scalability.
- Collaborate across teams to facilitate knowledge sharing and best practices.

Feb 8, 2026
Network Right
Full-time|On-site|San Francisco Office

Join Our Mission:
At Network Right, we are dedicated to revolutionizing the IT landscape by crafting tailored solutions that resonate with our clients' unique needs. Our mission is to humanize technology, bridging the gap between IT services and the everyday experiences of individuals within businesses. We strive to enhance employee productivity and satisfaction through innovative technological solutions.

Your Role:
As a Network Engineer, you will be the go-to expert for all networking-related services within our organization. Your responsibilities will include designing, implementing, and maintaining robust network environments for our clients. From configuring firewalls to optimizing multi-site deployments, you will ensure that our clients’ networks are secure, scalable, and efficient.

You will also act as the primary escalation point for complex network challenges, collaborating closely on new deployments and system upgrades. This position combines hands-on technical execution with strategic oversight, allowing you to shape the network standards of our organization while fostering strong relationships with both clients and internal teams.

Key Responsibilities:
- Serve as the subject matter expert in networking for Service Delivery and Professional Services.
- Design, configure, and support client network environments, overseeing new implementations and system enhancements.
- Monitor network performance and ensure standardization across all subscription clients.
- Lead root cause analysis for intricate network issues as the highest escalation point.
- Collaborate on project scoping, execution, and documentation for network-centric initiatives.
- Propose and implement enhancements to improve network reliability, scalability, and security.

Feb 24, 2026
Meter
Full-time|$109K/yr - $186K/yr|On-site|San Francisco or New York

At Meter, we strive to revolutionize the way networks are deployed, ensuring they are built for optimal performance and scalability. As a Network Deployment Engineer, you will play a vital role in transforming customer requirements into robust wired and wireless network solutions using our proprietary tools. You will spearhead all technical choices throughout the design and deployment process while assisting our clients during the network activation and validation stages. Your technical expertise will be crucial in enhancing deployment efficiency and ensuring customer satisfaction, allowing Meter to meet the increasing demand for dependable internet infrastructure.

Your contributions will include:
- Crafting high-performance networks: Utilize Meter’s specialized toolkit to design both wired and wireless networks that are both resilient and efficient.
- Driving key technical decisions: Oversee technical aspects of network design and deployment until successful network launch and validation.
- Facilitating network activation: Provide troubleshooting and support for essential devices and network features during the activation process to guarantee a seamless customer experience.
- Assisting with on-site installations: Travel to complex customer locations as necessary to offer installation and validation support, ensuring successful deployments.
- Collaborating across teams: Work alongside Deployment Project Managers, Sales Engineers, and Product Engineering teams to innovate the next generation of Meter’s deployment and networking solutions.

Why Choose Meter?

The internet powers our world—every email, purchase, and video call relies on effective network communication. Yet, traditional networks remain outdated, fragile, and challenging to set up within enterprise environments.

Meter was founded to create superior networks. Our journey began with designing and building our own enterprise hardware, user-friendly software, and streamlined operations to achieve outstanding customer outcomes. Today, we deploy these networks on a large scale, supporting prominent organizations such as Bridgewater, Lyft, and Reddit to keep their teams connected and productive across numerous locations.

Our vision at Meter is straightforward: we anticipate a future where internet usage will only increase. We are confident that our comprehensive networking stack will empower businesses to operate as seamlessly and reliably as modern utilities.

Apr 11, 2025
Senior Network Engineer

Stitch Fix, Inc.

Full-time|$87.8K/yr - $146K/yr|Remote|Remote, USA

About Stitch Fix, Inc.

Stitch Fix (NASDAQ: SFIX) is a premier online personal styling service that helps individuals discover styles they will adore, ensuring a perfect fit so they always look - and feel - their absolute best. Dressing can be deeply personal, yet finding clothing that fits beautifully can be quite challenging. Stitch Fix addresses this dilemma by combining expert stylists with state-of-the-art AI and recommendation algorithms. We utilize a curated selection of exclusive and national brands to cater to each client's unique preferences and needs, making it effortless for clients to express their personal style without the hassle of spending hours in stores or browsing through countless online options. Founded in 2011, Stitch Fix is based in San Francisco.

About the Team

The Stitch Fix Enterprise IT Service Delivery team is dedicated to fostering a fulfilling and inspiring work environment for everyone who contributes to our client experience – from our warehouse team to technical experts, merchants, and stylists. We take pride in cultivating a vibrant and collaborative atmosphere where we tackle challenges together. We are in search of a proactive networking expert who is enthusiastic about securely managing and connecting physical and logical networks across our enterprise. If you are intelligent, compassionate, and driven by challenges, we welcome you to join our team.

About the Role

In the role of Senior Network Engineer at Stitch Fix, you will be instrumental in transforming our expanding IT team while also enhancing your own skills and providing an exceptional experience for your colleagues. Your responsibilities will include supporting daily network operations across our diverse physical locations, developing comprehensive documentation, and advocating for best practices utilizing IT service management (ITSM) or similar methodologies. Your role will involve assisting our downtown San Francisco headquarters, as well as our remote offices, distribution centers, engineers, and stylists. At Stitch Fix, you will flourish in a Cisco Meraki network environment, where your expertise will be vital in driving operational excellence and innovation.

Ideal candidates should be available to work from 6AM to 3PM EST during the warehouse business hours and should ideally reside near our warehouse locations in Lithia Springs, GA, Atlanta, or Plainfield, IN, or surrounding areas.

Dec 31, 2025
OpenAI
Full-time|On-site|San Francisco

Join OpenAI as an Optical Network Engineer, where you will play a pivotal role in advancing our cutting-edge technologies. We seek a skilled engineer to design, implement, and optimize optical networks, ensuring robust performance and scalability. This position offers an exciting opportunity to collaborate with a dynamic team and contribute to pioneering innovations in the field.

Mar 18, 2026
Senior Networking Engineer

Unity Technologies

Full-time|$115.4K/yr - $173K/yr|On-site|San Francisco, CA, USA

Join Our Team!

The team that pioneered Unity's integration with visionOS and facilitated real-time interactions through Play-to-Device is expanding. We are gearing up for an exciting challenge: adapting PolySpatial to stream Unity content into various game engines and 3D environments—across processes and networks.

We seek visionary engineers ready to redefine how different real-time 3D runtimes communicate and render under real-world constraints. Your work will blend core engine technology, high-performance networking, distributed systems, and advanced graphics. This is your chance to establish the underlying infrastructure for the next wave of interconnected gaming and 3D ecosystems.

In this hands-on role, you will ensure a seamless connection between engines, making it fast, reliable, and transparent. You will address complex challenges, such as latency compensation and bandwidth optimization, ensuring that Unity content feels native even when streamed. Your responsibilities will include updating positions, synchronizing physics, and maintaining state consistency. If you are passionate about building multiplayer systems and tackling tricky desynchronization issues through packet captures, this position will keep you at the forefront of technology and close to the code.

Mar 16, 2026
