About Etched
Etched is building the world's first AI inference system designed specifically for transformer architectures, delivering over 10 times the performance at significantly lower cost and latency than B200 GPU systems. Etched ASICs enable products that are unattainable with GPUs, such as real-time video generation models and highly advanced deep reasoning agents. With substantial backing from premier investors and a team of leading engineers, Etched is transforming the infrastructure landscape for the fastest-growing industry in history.
Job Summary
We are seeking a highly skilled and proactive Supercomputing Software Engineer to join our team. You will be integral to building the foundational software that drives our server infrastructure. Responsibilities include developing, integrating, and debugging core system software components, such as BIOS, BMC firmware, boot processes (including NetBoot), root of trust implementations, advanced system logging, and kernel-mode drivers. Your contributions will be vital to ensuring the reliability, security, and performance of our server platforms and to integrating data center orchestration technologies at the node level.
Key Responsibilities
Firmware Integration: Integrate and maintain BIOS and BMC firmware to ensure efficient server boot processes.
Performance Tuning: Analyze DRAM timings, PCIe configurations, and power state transitions to optimize performance and reliability.
Security Measures: Validate security features, including root of trust mechanisms, to safeguard system integrity and data security.
Advanced Diagnostics: Design and implement sophisticated logging and diagnostic capabilities for effective troubleshooting and performance evaluation.
Orchestration Integration: Incorporate and enhance node-level data center orchestration technologies like Kubernetes and Docker within the software stack.
Validation and Testing: Create and execute comprehensive test plans to ensure system software functionality, stability, and performance.
Team Collaboration: Work alongside hardware and software teams to diagnose and resolve complex system-level challenges.
