Software Engineer Frontier Systems Power Management jobs in San Francisco – Browse 8,510 openings on RoboApply Jobs

Open roles matching “Software Engineer Frontier Systems Power Management” with location signals for San Francisco. 8,510 active listings on RoboApply Jobs.

OpenAI
Full-time|On-site|San Francisco

About Our Team
The Frontier Systems team at OpenAI is at the forefront of technology, responsible for creating, deploying, and maintaining some of the world's largest supercomputers. These supercomputers are pivotal for training our most advanced AI models, pushing the boundaries of innovation.
We transform sophisticated data center designs into operational systems and develop the software infrastructure necessary for extensive frontier model training. Our goal is to ensure these hyperscale supercomputers operate reliably and efficiently, supporting groundbreaking AI research.
About the Role
As a key member of the Frontier Systems team, you will be instrumental in designing the critical infrastructure that ensures our supercomputers function seamlessly for pioneering AI research. In this role, you'll address system-level challenges and implement automation solutions that minimize disruptions during large-scale training processes.
Your responsibilities will encompass end-to-end ownership of your projects, allowing you to make significant contributions to our mission. This position is ideal for individuals who excel in diagnosing complex system issues and crafting automation strategies to proactively resolve problems across a vast network of machines.
Your Responsibilities Include:
- Enhancing system health checks to maintain the stability of our hyperscale supercomputers during model training.
- Conducting in-depth investigations into hardware failures and system-level bugs to uncover root causes.
- Developing automation tools that monitor and resolve issues across thousands of systems, enabling uninterrupted research progress.
You May Be a Great Fit If You Possess:
- 7+ years of hands-on experience in software engineering.
- Strong proficiency in Python and shell scripting.
- Expertise in analyzing complex data sets using SQL, PromQL, Pandas, or other relevant tools.
- Experience in creating reproducible analyses.
- A solid balance of skills in both building and operationalizing systems.
Prior experience with hardware is not a prerequisite for this role.
Preferred Qualifications:
- Familiarity with the intricacies of hardware components, protocols, and Linux tools (e.g., PCIe, Infiniband, networking, power management, kernel performance tuning).
- Experience with system optimization and performance tuning.

May 9, 2025
OpenAI
Full-time|On-site|San Francisco

About Our Team
Join the innovative Frontier Systems team at OpenAI, where we develop, deploy, and maintain some of the world’s largest supercomputers used for pioneering model training. Our expertise in transforming data center designs into fully operational systems enables us to build the necessary software to facilitate expansive frontier model training runs.
Our mission is to establish, stabilize, and ensure the dependability and efficiency of these hyperscale supercomputers throughout the training of our advanced models.
About This Position
As a Software Engineer on the Frontier Systems team with a focus on power management, you will play a pivotal role in enhancing our groundbreaking research capabilities. Given the significant power demands of large-scale supercomputers, your expertise will be essential in optimizing power management to maximize computational efficiency. This role is vital for maintaining smooth operations within our cutting-edge research supercomputing framework, ensuring both reliability and grid-level power consistency.
Our team fosters an environment that empowers talented engineers with substantial autonomy and ownership, allowing for impactful contributions. You will be challenged to conduct thorough system-level investigations and develop automated solutions, tackling complex issues with depth and precision while creating scalable automation for detection and remediation.
Your Responsibilities Will Include:
- Design and implement both system-level and software-level solutions to optimize power consumption in large-scale supercomputers, ensuring efficient and reliable operations.
- Develop automation tools to monitor power consumption patterns during training workloads and create algorithms to stabilize these fluctuations, safeguarding grid reliability.
- Collaborate with researchers and engineers to create tools for real-time monitoring, detection, and resolution of power-related hardware and system issues.
- Work cross-functionally to translate complex electrical system requirements into executable code, driving ongoing enhancements in our power management strategies.
- Lead the creation of power throttling mechanisms at the IT system level, dynamically adjusting power usage based on workload demands and infrastructure constraints.
- Partner with hardware design teams to integrate system-level power control requirements into hardware design, ensuring seamless collaboration between software-driven power management and hardware functionalities.

Oct 31, 2024
myhrllc
Full-time|On-site|San Francisco

Join our dynamic team as an Electrical Engineer specializing in Power and High-Voltage Systems!
As a key contributor, you will:
- Design innovative high-voltage DC/DC converters, diode drivers, rectifiers, inverters, and complex power distribution/control circuits.
- Utilize advanced electrical simulations with SPICE or MATLAB/Simulink to enhance efficiency and reliability.
- Develop cutting-edge embedded and mixed-signal electronics, integrating microcontrollers and FPGAs for telemetry, protection, and fault management.
- Design and validate battery management systems and rapid-discharge power solutions, ensuring robust thermal management.
- Engineer durable wire harnesses and select high-performance connectors/cabling suitable for high-power environments.
- Lead hands-on prototyping, testing, and environmental/vibration/shock assessments.
- Conduct DFM reviews and work closely with suppliers to guarantee manufacturability and quality.
- Create comprehensive schematics, BOMs, and wiring documentation, ensuring compliance with safety and export regulations.
- Collaborate with mechanical, optical, and controls teams to achieve full system integration.

Feb 13, 2026
Zipline
Full-time|$160K/yr - $225K/yr|On-site|South San Francisco, California, USA

Senior Electrical Engineer, Power Electronics
South San Francisco, CA | Ground Systems | Full Time
Are you ready to make a significant impact on the world? Zipline is on a mission to revolutionize the logistics of delivery. We’re dedicated to addressing the world’s most pressing access challenges by developing the first instantaneous delivery system that treats all humans equally, no matter their location. From facilitating the national blood delivery network in Rwanda and distributing COVID-19 vaccines in Ghana, to providing on-demand home delivery for Walmart, and enabling healthcare providers to bring services directly to homes across the U.S., we’re reshaping the landscape of logistics for businesses, governments, and consumers alike. Our technology is sophisticated, but the concept is straightforward: a teleportation service that delivers what you need, when you need it. Through our robotics and autonomy platforms, we aim to decarbonize delivery, alleviate road congestion, and enhance access to essential goods globally.
About the Role
Zipline is in search of a seasoned electrical engineer to spearhead the design and development of high-performance power electronics systems supporting our drone charging and docking infrastructure. As a member of our Ground Systems team, you will be instrumental in managing all ground-based infrastructures that facilitate our autonomous delivery drones, including automatic docking, battery charging, thermal management, ground communication, and preflight checks. This position is key to advancing our next generation of drone logistics by addressing intricate challenges in power distribution, conversion, battery charging, high-performance microprocessor integration, high-speed network communication, and overall system integration. Your extensive technical expertise in power electronics will play a crucial role in shaping Zipline’s global infrastructure as we expand. Additionally, you will collaborate closely with cross-functional engineering teams to oversee the electrical system development from concept to production. This hands-on engineering position is ideal for candidates who aspire to push the boundaries of hardware in practical applications.

Feb 4, 2026
NerdWallet
Full-time|Remote|NerdWallet US

At NerdWallet, we are committed to empowering individuals to make informed financial decisions. Our team comprises exceptional individuals who thrive in an inclusive, flexible, and candid environment. Whether you choose to work remotely or in the office, we prioritize your well-being, professional development, and the impact you can make. We believe that when one of us elevates our skills, the whole team benefits.
As part of NerdWallet’s Platform team, you will oversee the systems that serve as the backbone of our consumer experience. This includes management of our centralized product data platform, partner ingestion pipelines, publishing and click-tracking infrastructure, GraphQL gateway operations, and our high-traffic, headless WordPress CMS. These platforms deliver precise, compliant, and high-performance product and content experiences to millions of users on both web and mobile platforms. We are searching for a Senior Engineering Manager to lead this team in modernizing legacy services into scalable and reliable systems while advancing our vision of a decoupled, adaptable platform that facilitates quicker publishing, enhanced observability, and future growth.
In the role of Senior Engineering Manager for Platform Systems, you will guide and support a team of engineers in delivering high-quality, scalable, and secure software that aligns with NerdWallet’s product and business objectives. You will collaborate closely with Product Managers and other cross-functional partners to define the roadmap, prioritize tasks, and eliminate obstacles, while nurturing strong engineering practices and a culture of continuous improvement. Your responsibilities will include ensuring technical quality, team well-being, and daily operations, while mentoring engineers, making strategic technical decisions, and balancing immediate deliverables with long-term sustainability, compliance, and reliability.
This position reports to the Director of Engineering.
Opportunities for Impact:
- Lead, mentor, and develop a high-performing engineering team responsible for NerdWallet’s platform systems, including the Content Platform, CMS, and Product Data Platform.
- Collaborate with Product Managers and cross-functional teams to strategize, prioritize, and execute the product roadmap.
- Champion consistent adherence to software development best practices, including code quality, testing, documentation, and operational excellence.
- Influence and guide technical and architectural decisions to ensure solutions are scalable, secure, reliable, and compliant with regulatory standards.
- Balance immediate project needs with long-term project vision and maintainability.

Feb 24, 2026
Aurelius Systems
Full-time|On-site|San Francisco

About Us:
Aurelius Systems is an innovative, VC-backed defense technology startup focused on developing autonomous, edge-deployed robotics systems utilizing directed energy for counter-unmanned aerial systems (UAS).
Our mission is to create laser weapon systems capable of neutralizing drones effectively.
With a compact team of around ten engineers, ex-US military personnel, and subject matter experts, we are at the forefront of advancing America's capabilities in directed energy. Our aim is to deliver the first cost-effective, reliable, and robust laser weapon system.
Inspired by the principles of Marcus Aurelius, we embrace a culture of relentless effort, with a commitment to delivering exceptional results without excuses. We operate in a flexible environment, leveraging our San Francisco lab and office, alongside our Detroit manufacturing hub, and conduct weekly field tests on our expansive 400-acre private range.
If you're an engineer who prefers to witness your work in action rather than confined to a lab, we invite you to continue reading.
The Opportunity:
As the Power Electronics Engineer, you will lead the development of the power subsystem for our directed energy weapon systems.
Your role entails being the technical lead on all aspects ranging from battery pack architecture to laser diode integration. You will not merely implement existing designs; instead, you will define the architecture, make critical decisions, and take ownership of your system during range tests.
You will collaborate closely with the electrical engineer on the team, focusing on a depth of knowledge rather than breadth. Your responsibilities will include battery management system (BMS) architecture, high-voltage conversion, fault protection, and power delivery under various field conditions. When a power system issue arises during a range test, you will be the go-to expert, identifying the problem and implementing a solution promptly.
Your Responsibilities:
- Design, develop, and validate battery management systems for high-voltage battery packs, ensuring safe charge/discharge cycles, cell balancing, thermal management, and fault protection.
- Engineer high-voltage DC/DC converters, rectifiers, inverters, and diode driver electronics for optimal laser power delivery.
- Create robust wire harnesses for high-power laser and motor systems, focusing on connector selection, routing, shielding, and environmental resilience.
- Engage in hands-on prototyping, assembly, and testing on our expansive range.
- Conduct rigorous electrical, environmental, vibration, shock, and robustness testing to ensure system reliability.

Feb 16, 2026
Redwood Materials
Full-time|$145.2K/yr - $196.8K/yr|On-site|San Francisco, California, United States

About Redwood Materials
At Redwood, we are revolutionizing the global battery supply chain by integrating recovery, reuse, and recycling, ensuring critical minerals remain in circulation and accelerating the energy transition. Established in 2017, we are pioneering low-cost, large-scale energy storage solutions and producing battery materials domestically in the U.S. for the first time, all sourced from existing batteries.
Position: Power Systems Engineer - Energy Storage
Key Responsibilities:
- Collaborate with product and controls design teams to translate power system requirements into specifications for site and inverter controllers, thereby influencing control architecture at both system and plant levels.
- Engage in the design and optimization of power systems for energy storage applications, ensuring efficiency and reliability.
- Analyze and improve existing power systems, focusing on performance and scalability.

Mar 12, 2026
Crusoe
Full-time|On-site|San Francisco, CA - US

Join Crusoe as a Principal Systems Software Engineer and play a vital role in revolutionizing the tech industry. You will lead the development of innovative software solutions that enhance our systems and platforms, contributing to the overall mission of providing efficient and sustainable computing resources. Your expertise will help shape the future of our software architecture and ensure seamless integration across various applications.

Feb 25, 2026
OpenAI
Full-time|On-site|San Francisco

About the Team
Join the innovative Frontier Systems team at OpenAI, where we design, implement, and maintain the world's largest supercomputers, essential for advancing our most groundbreaking model training initiatives.
We transform data center blueprints into operational systems while crafting the software necessary for executing large-scale frontier model training runs.
Our mission is to establish, stabilize, and ensure the reliability and efficiency of these hyperscale supercomputers throughout the training of our frontier models.
About the Role
We are seeking passionate engineers to manage the next generation of compute clusters that underpin OpenAI’s frontier research.
This position merges distributed systems engineering with practical infrastructure work across our expansive data centers. You will scale Kubernetes clusters to unprecedented levels, automate bare-metal setups, and create the software layer that simplifies the complexity of numerous nodes across various data centers.
Your work will be at the crossroads of hardware and software, where speed and reliability are paramount. Be prepared to oversee dynamic operations, swiftly identify and resolve pressing issues, and constantly elevate the standards for automation and uptime.
In this role, you will:
- Provision and scale extensive Kubernetes clusters, including automation for deployment, bootstrapping, and lifecycle management.
- Create software abstractions that integrate multiple clusters and provide a cohesive interface for training workloads.
- Oversee node deployment from bare metal to firmware upgrades, ensuring rapid, repeatable setups at scale.
- Enhance operational metrics by reducing cluster restart times (e.g., from hours to minutes) and expediting firmware and OS upgrade cycles.
- Integrate networking and hardware health systems to ensure end-to-end reliability across servers, switches, and data center infrastructure.
- Develop monitoring and observability systems to identify issues early and maintain cluster stability under high loads.
You might thrive in this role if you:
- Have extensive experience operating or scaling Kubernetes clusters or similar container orchestration systems in high-growth or hyperscale environments.
- Possess strong programming skills in languages relevant to cloud and infrastructure management.

Nov 7, 2024
OpenAI
Full-time|On-site|San Francisco

About Our Team:
Join the innovative Database Systems team at OpenAI, where we specialize in high-performance distributed databases. We are the architects behind Rockset, a cutting-edge real-time search, analytics, and vector database that powers all vector search and retrieval augmented generation (RAG) at OpenAI. Rockset underpins core functionalities across all OpenAI product lines and supports various critical internal applications.
About the Role:
We are in search of engineers who are passionate about distributed systems, performance optimization at a low level (with our core engine developed in C++), and constructing scalable database infrastructures from scratch. As a member of the Database Systems team, you will play a key role in enhancing the core database engine, making significant contributions to ingestion, query execution, indexing, and storage improvements. You will collaborate with multiple teams across OpenAI to unlock new product capabilities and ensure the reliability and scalability of our online database as usage expands exponentially.
Your Responsibilities Will Include:
- Design, develop, and maintain high-performance distributed systems.
- Identify and address performance bottlenecks to elevate infrastructure capabilities.
- Define and guide the long-term technical vision and evolution of the system.
- Collaborate with product, engineering, and research teams to deliver robust and scalable infrastructure.
- Investigate complex production issues across the entire technology stack.
- Contribute to incident response, retrospective analyses, and establishing best practices for system reliability.
You Will Excel In This Role If You:
- Possess substantial experience in building, scaling, and optimizing distributed systems.
- Exhibit a keen interest in database internals, storage engines, or low-latency query systems.
- Enjoy tackling complex performance challenges in high-throughput systems.
- Have experience managing and operating production clusters at scale (e.g., Kubernetes or similar orchestration tools).
- Approach scalability, correctness, and reliability with a rigorous mindset.
- Thrive in a fast-paced environment where you can make a significant impact.
Qualifications:
- 4+ years of relevant industry experience with a focus on distributed systems.
- Proficiency in C++ or similar low-level programming languages.
- Strong problem-solving skills and attention to detail.
- Experience with performance monitoring and optimization tools.
- Excellent collaboration and communication skills.

Jul 29, 2025
AfterQuery
Full-time|On-site|San Francisco

About AfterQuery
AfterQuery partners with leading AI labs to advance training data and evaluation frameworks. The team builds high-signal datasets and runs thorough evaluations that go beyond standard benchmarks. As a post-Series A, early-stage company in San Francisco, AfterQuery gives each team member room to shape the future of AI models.
Role Overview: Research Scientist - Frontier Data
This role focuses on designing datasets and developing evaluation systems that influence how top AI models are trained and assessed. Working closely with research teams at major AI labs, the scientist explores new data collection techniques, investigates where models fall short, and sets up metrics to track progress. The work is hands-on and experimental, moving quickly from hypothesis to live testing and directly impacting large-scale model training.
Key Responsibilities
- Design data slices and analyze data structures to uncover model weaknesses in areas like finance, software development, and enterprise operations.
- Build and refine evaluation rubrics and reward signals for RLHF and RLVR training approaches.
- Study annotator behavior and run experiments to improve model capabilities across different domains.
- Develop quantitative frameworks to measure dataset quality, diversity, and their effect on model alignment and performance.
- Work with research teams to turn training objectives into concrete data and evaluation needs.
What We Look For
- Experience as an undergraduate or master’s research student (PhD not required).
- Background or internships with RL environments or AI safety and benchmarking organizations (e.g., METR, Artificial Analysis) is a strong plus.
- Genuine interest in how data structure, selection, and quality affect model outcomes.
- Demonstrated skill in designing experiments, acting quickly, and extracting insights from complex data.
- Comfort working across sectors such as finance, software engineering, and policy.
- Strong quantitative background and familiarity with LLM training pipelines, RLHF/RLVR methods, or evaluation frameworks.
- A hands-on mindset focused on building practical solutions.

Apr 14, 2026
OpenAI
Full-time|On-site|San Francisco

About Our Team
At OpenAI, we are dedicated to ensuring our innovative products are effectively monetized to meet the diverse needs of our customers. Our Financial Engineering team works closely with the Go-To-Market (GTM) and Finance departments to continuously adapt our billing architecture to align with our dynamic internal requirements.
About the Opportunity
We are seeking a talented Engineering Manager to oversee and enhance the workflows that drive quoting, tracking, and fulfillment for all OpenAI sales. This role is pivotal in developing essential billing and invoicing capabilities while collaborating on customer-facing billing experiences to uphold financial integrity, ensure auditability, and provide a seamless onboarding and billing journey for enterprise clients.
Key Responsibilities:
- Lead and mentor a team of engineers focused on automating order management, prioritizing reliability, accuracy, and a positive customer onboarding experience.
- Own the design and roadmap for order data flows into various downstream systems, including internal provisioning, billing, invoicing, and revenue management.
- Create and maintain resilient workflows that automate entitlements, provisioning, usage controls, SKU attribution, invoice generation, and revenue recognition, streamlining processes while maximizing accuracy and traceability.
- Enhance the accuracy and timeliness of provisioning, billing, and invoicing through automation, validation, and reconciliation, reducing manual intervention.
- Establish robust operational practices (observability, alerting, runbooks, on-call) to ensure system health with minimal human oversight.
- Collaborate extensively with Sales Operations, Finance, Accounting, Support, Product, Security, and Compliance teams to translate complex requirements into resilient, auditable workflows.
- Navigate ambiguous problem spaces and evolving product offerings, creating scalable frameworks and abstractions as OpenAI's commercial footprint grows.
- Uphold high engineering standards through technical direction, design reviews, mentoring, and fostering a culture of ownership and continuous improvement.
- Exhibit strong leadership by mentoring engineers, recruiting and retaining top talent, managing stakeholder expectations, and balancing customer needs with deliverable realities.
You Will Excel in This Role If You:
- Possess a passion for leading engineering teams and driving process improvements.
- Have a proven track record of managing complex engineering projects and fostering collaboration across diverse teams.
- Enjoy tackling challenges with innovative solutions while maintaining a customer-centric approach.

Feb 5, 2026
sfcompute
Full-time|On-site|San Francisco, CA

Join us at sfcompute, where we are revolutionizing the future by mitigating risks associated with the largest infrastructure development in history.
As the demand for GPU clusters surges, financing these data centers and their supporting infrastructure has never been more critical. Our innovative approach ensures that financing is secured through long-term contracts, providing peace of mind to both lenders and developers.
In the fast-paced world of AI and compute resources, we are creating a liquid market for GPU offtake, allowing even small startups to access high-end computing power without the burdens of traditional financing.
About the Role
As a Systems Software Engineer at sfcompute, you will be instrumental in developing a GPU market that brings the advanced software capabilities of hyperscalers to our innovative GPU neoclouds. Your responsibilities will encompass provisioning and monitoring bare metal servers with our virtualization orchestration software, as well as collaborating with our GPU marketplace to facilitate user configurations of VMs, networks, and storage.
Key tasks include creating and maintaining a Linux OS image tailored for our tools, ensuring consistent deployment across nodes with specific data-center adjustments, and designing the API protocols and servers for user interaction.
Our primary programming language is Rust, which enables us to write efficient code across all system layers, from web servers to kernel coordination. If you are familiar with manually memory-managed languages like C and possess experience in higher-level programming, we encourage you to apply.

Feb 27, 2026
Scale AI
Full-time|$138K/yr - $259.4K/yr|On-site|San Francisco, CA; St. Louis, MO; New York, NY; Washington, DC

Scale AI is on the lookout for an exceptionally talented and driven Software Engineer, Frontier AI Infrastructure to become an integral part of our innovative Public Sector Engineering team. In this role, you will take charge of the model inference layer, enabling cutting-edge AI models, troubleshooting the latest AI tools, managing networking tasks, addressing latency issues, and monitoring pricing and usage metrics for AI models. You will spearhead technical discussions with cloud vendors and clients to fulfill critical contracts and resolve platform challenges. Additionally, you will collaborate closely with Product teams to anticipate feature requirements, transitioning from reactive 'infra-only debugging' to proactive integration testing.
Your Responsibilities Include:
- Designing and implementing secure, scalable backend systems tailored for Public Sector clients, utilizing Scale's advanced cloud-native AI infrastructure.
- Owning services or systems while defining long-term health objectives and enhancing the health of related components.
- Redesigning the architecture to operate in compliant or restrictive environments, which entails creating swappable components (authentication, storage, logging) to adhere to government and security regulations without compromising product integrity.
- Collaborating with Product teams to develop integration tests that identify issues early, shifting focus from 'infra-only debugging' to preventing upstream failures.
- Actively participating in customer engagements, liaising with stakeholders to comprehend requirements and deliver innovative solutions.
- Contributing to the platform roadmap and product strategy for Scale AI's Public Sector division, playing a vital role in shaping the future trajectory of our offerings.

Mar 26, 2026
Lumafield
Full-time|On-site|San Francisco, CA

About Lumafield:
Established in 2019, Lumafield has pioneered the development of the world's first accessible X-Ray CT scanner specifically designed for engineers. Our intuitive scanner, combined with cloud-based software, empowers engineers to gain unparalleled insights into their projects at a remarkably affordable cost. Engineers face high-stakes decisions daily, necessitating tools that provide maximum visibility into their designs. By delivering exceptional product clarity and AI-enhanced tools that identify issues and produce quantitative insights, Lumafield is set to transform the creation, manufacturing, and application of complex products across various sectors. Our company thrives on impact and is dedicated to delivering the utmost value to our customers, ensuring their needs drive our development. Our talented team consists of leading researchers, industrial designers, PhD holders, innovators, and startup founders, all working collaboratively without egos. We proudly receive backing from prestigious venture capital firms, including Kleiner Perkins, Lux Capital, DCVC, and Spark Capital.
Headquartered in Cambridge, MA, with an additional office in San Francisco, CA, we are excited to grow our team.
About the Role:
As a Senior Systems Software Engineer at Lumafield, you will be instrumental in developing the software that drives our cutting-edge, in-line manufacturing CT scanning products. You will engage with state-of-the-art X-ray physics, high-speed detectors, image processing, and embedded systems. Collaborating within a small team focused on our latest hardware, you will harness your expertise to maximize system performance and achieve outstanding results for our clients. This position is perfect for those eager to take ownership of embedded systems, firmware, and software design in an early-stage product environment. This role is based in our San Francisco, CA office, with occasional travel required to our Cambridge, MA office.

Mar 18, 2026
Aurelius Systems
Full-time|On-site|San Francisco

About Us:
Aurelius Systems is a venture capital-backed startup at the forefront of defense technology, specializing in the development of autonomous, edge-deployed robotic systems utilizing directed energy for counter-unmanned aerial systems (UAS).
Our innovative approach involves creating laser systems designed to neutralize drones.
With a dedicated team of approximately 10 engineers, former U.S. military personnel, and industry experts, we are committed to advancing America's capabilities in directed energy technology, delivering the first cost-effective and reliable laser weapon systems.
Inspired by the philosophy of Marcus Aurelius, we emphasize consistent effort and accountability in our work, embodying a culture of high output without excuses. Following in the footsteps of pioneers like Henry Ford, we embrace innovation and action within our small but impactful team.
In addition to our San Francisco headquarters, we are proud to operate a manufacturing hub in Detroit and conduct field tests weekly on our expansive private range.
If you thrive on seeing your engineering contributions directly in action rather than being confined to a lab, we encourage you to explore this opportunity.
The Position & Your Contribution:
As a Robotics Software Systems Engineer, your primary responsibility will be to ensure that all subsystems function seamlessly and efficiently together.
Our system comprises a complex array of subsystems including sensing, computer vision, machine learning inference, control systems, power management, and mechanical actuation. Achieving minimal processing time and inter-process latency is crucial for successfully targeting our nimble and evasive UAS.
The key area we are looking to fill is real-time systems performance at the hardware interface. You should possess a deep understanding of how software execution impacts physical system behavior, how latency accumulates across CPU, GPU, memory, and I/O, and how bandwidth limitations influence sensor data processing. We need an engineer who is detail-oriented, considering microseconds, memory bandwidth, cache behavior, and system determinism.
In our tight-knit team of around 10 engineers, you will have the opportunity to take ownership of systems that are field-tested. The success of our tests is binary: it's either effective or it isn't, and your role will involve iterative improvement based on real-world outcomes.
Your Responsibilities:
- Manage the latency budget for the entire platform, from data sensing to actuation.
- Profile and mitigate latency across CPU, GPU, memory, and I/O interfaces.
- Develop and optimize kernels for high-throughput, low-latency operations.
- Adjust memory access patterns for optimal performance.

Mar 2, 2026
OpenAI
Full-time|On-site|San Francisco

About Our Team
The Platform Systems team at OpenAI is at the forefront of innovation, merging advanced AI technologies with large-scale distributed systems. We are tasked with creating the engineering and research infrastructure essential for training OpenAI's premier models on some of the most powerful, custom-built supercomputers globally. Our team is dedicated to developing the core software for model training, delving deep into the technological stack. This encompasses collective communication, compute efficiency, parallelism strategies, fault tolerance, failure detection, and observability. The systems we design are pivotal to enhancing OpenAI's research capabilities, facilitating reliable and efficient training at the leading edge of technology. We work in close partnership with researchers across the organization, continuously integrating insights from various OpenAI projects to advance our training platform.
About the Role
As a Software Engineer specializing in Platform Systems, you will architect and develop distributed systems that enhance visibility into large-scale training operations, ensuring their dependable operation at scale. Your responsibilities will include designing systems for failure detection, tracing, and observability that pinpoint slow or malfunctioning nodes, identify performance bottlenecks, and assist engineers in optimizing extensive distributed training tasks. This infrastructure is integral to the functionality of OpenAI's training stack and is continuously evolving to accommodate new use cases and increasingly intricate workloads. This position is central to our training infrastructure, merging systems engineering, performance analysis, and large-scale debugging.
Key Responsibilities
Design and develop distributed failure detection, tracing, and profiling systems tailored for large-scale AI training jobs.
Create tools to identify slow, faulty, or errant nodes and deliver actionable insights into system behavior.
Enhance observability, reliability, and performance across OpenAI's training platform.
Troubleshoot and resolve issues within complex, high-throughput distributed systems.
Collaborate effectively with systems, infrastructure, and research teams to advance platform capabilities.
Adapt and expand failure detection and tracing systems to support new training paradigms and workloads.
Ideal Candidate Profile
Possesses a deep passion for performance, stability, and observability in distributed systems.
Demonstrates proficiency in systems engineering and performance analysis.
Has experience in debugging high-throughput distributed systems.
Exhibits strong collaboration skills with a track record of working with cross-functional teams.
Shows adaptability and eagerness to embrace new technologies and methodologies.

Jan 23, 2026
OpenAI
Full-time|Hybrid|San Francisco

Location: San Francisco, CA (Hybrid: 4 days onsite/week). Relocation assistance available.
About Our Team: At OpenAI, we are at the forefront of technology, creating foundational platform software that ensures our consumer products are reliable, secure, and high-performing. Our team collaborates across various system layers, working closely with engineering partners to deliver exceptional capabilities from initial concept to final launch.
Role Overview: We are looking for a passionate Systems Software Engineer to lead the design, implementation, and debugging of critical platform components and the pipelines that build and update system images. Your focus will span across operating system layers, emphasizing performance optimization, security enhancements, and in-depth system debugging to deliver production-grade systems that exceed expectations.
Key Responsibilities:
Design and develop robust system-level components and services within both kernel and user spaces.
Configure and maintain essential OS platform services (init, services, networking, security policies) and related tools.
Build and manage image and update pipelines, ensuring their reliability, reproducibility, and rollback safety.
Instrument system performance through profiling and tracing; enhance CPU, memory, I/O, and energy efficiency.
Oversee platform observability and reliability, including logging, crash capture, watchdogs, and diagnostics.
Collaborate with cross-functional teams to define interfaces and deliver comprehensive end-to-end features.
Establish and promote strong engineering practices such as code reviews, continuous integration, reproducible builds, and effective release management.
Work alongside external vendors to support builds and deployments.
You Will Excel in This Role If You:
Have successfully launched production systems software on modern operating systems.
Possess proficiency in C/C++ and a scripting language, with a strong understanding of OS internals including concurrency, memory management, filesystems, networking, and power management.
Demonstrate exceptional systems debugging skills utilizing debuggers, tracers, profilers, and logs across kernel/user-space boundaries.
Comprehend the configuration of platform services and interfaces, effectively translating requirements into stable, well-documented APIs.
Are knowledgeable about user-space foundations including service management, IPC, networking, packaging, and automation.
Have experience collaborating with external partners to deliver high-quality software solutions.

Dec 16, 2025
Achira
Full-time|On-site|San Francisco Office

Why Join Achira?
Become part of an exceptional team comprised of scientists, ML researchers, and engineers dedicated to transforming the landscape of drug discovery.
Engage with cutting-edge machine learning infrastructure at an unprecedented scale, leveraging extensive computing resources, vast datasets, and ambitious goals.
Take ownership of significant projects from conception through to architecture and deployment on large-scale infrastructures.
Thrive in a culture that values thoroughness, speed, and a proactive, builder-oriented mindset.
About the Role
At Achira, we are developing state-of-the-art foundation models that address the most complex challenges in simulation for drug discovery and beyond. Our atomistic foundation simulation models (FSMs) serve as comprehensive representations of the physical microcosm, encompassing machine learning interaction potentials (MLIPs), neural network potentials (NNPs), and various generative model classes. We are looking for a Software Engineer who is enthusiastic about distributed computing and its applications in machine learning. You will play a pivotal role in designing and constructing the infrastructure for our ML data generation pipelines, model training, and fine-tuning workflows across large-scale distributed systems. Your expertise will be crucial in ensuring our compute clusters are efficient, observable, cost-effective, and dependable, enabling us to advance the frontiers of ML development. If you are passionate about distributed systems, performance optimization, and cloud cost efficiency, we encourage you to apply. You will be empowered to conceptualize and manage complex workloads across multiple vendors worldwide. Achira's mission revolves around computation, and providing seamless access to our uniquely tailored workloads at the lowest possible cost is critical to our success.

Oct 7, 2025
Specter
Full-time|On-site|San Francisco

Company Overview: Specter is revolutionizing how businesses perceive their physical environments by developing a software-defined control plane. Our mission is to enhance the security of American enterprises by providing them with comprehensive visibility over their physical assets. We are pioneering a connected hardware-software ecosystem that leverages multi-modal wireless mesh sensing technology, reducing the deployment costs and time for sensors by a factor of ten. Our platform aims to be the perception engine for a company’s physical presence, facilitating real-time visibility of perimeters and enabling autonomous operational management. Founded by passionate innovators from Anduril, Tesla, Uber, and the U.S. Special Forces, our co-founders, Xerxes and Philip, are dedicated to empowering our partners in the rapidly evolving landscape of physical AI and robotics.

Oct 3, 2025
