Qualifications
Key Responsibilities
Design and implement low-level control-plane software for the initialization, configuration, and management of large-scale AI compute clusters.
Create system services that directly interface with hardware, firmware, and the operating system.
Develop telemetry, logging, and tracing systems to diagnose failures and enhance performance.
Establish orchestration primitives for managing devices, nodes, and racks.
Analyze and optimize performance across PCIe, memory, networking, kernel, and runtime layers.
Collaborate with hardware, firmware, kernel, and runtime teams to design and refine system interfaces and functionality.
Must-Have Skills and Experience
Proficient in C/C++ or Rust for low-level systems programming.
In-depth knowledge of Linux internals, kernel/user-space interactions, and system-level debugging.
Experience working closely with hardware, including drivers, DMA, interrupts, memory management, or device control paths.
Excellent debugging skills utilizing logs, tracing, and analysis tools.
About the job
About Etched
Etched is pioneering the development of the world's first AI inference system meticulously designed for transformers, achieving over 10x the performance while significantly reducing costs and latency compared to traditional systems like the B200. Our innovative ASIC technology enables the creation of groundbreaking products that are unachievable with GPUs, including real-time video generation models and highly sophisticated reasoning agents. With substantial backing from leading investors and a team of top-tier engineers, Etched is transforming the foundational infrastructure for one of the fastest-growing industries of our time.
Job Summary
At Etched, we are constructing large-scale AI systems that will facilitate quicker, more efficient inference for billions of users. Our Supercomputing team is integral to this mission. We are on the lookout for a talented and driven Supercomputing Engineer to join our team, where you'll play a key role in developing the essential software that powers our cluster-scale AI compute deployments. This position involves the creation, integration, and troubleshooting of vital system components, focusing on control-plane software, system initialization, telemetry, orchestration primitives, and optimizing performance at the hardware-software interface.
About Etched
Etched is at the forefront of AI technology, specializing in the development of cutting-edge inference systems that leverage advanced ASICs for unprecedented performance and efficiency. Our mission is to reshape the AI landscape by enabling groundbreaking innovations that were previously unattainable.