About Etched
Etched is pioneering the world’s first AI inference system designed specifically for transformers, delivering over 10x the performance of traditional GPUs at substantially lower cost and latency. Our ASICs enable groundbreaking products, such as real-time video generation models and advanced reasoning agents with deep, parallel processing capabilities. Backed by substantial investment from leading venture firms and a team of top-tier engineers, Etched is transforming the infrastructure layer for one of the fastest-growing industries in the world.
Key Responsibilities
Kernel-Mode Driver Development: Design, develop, and maintain kernel-mode drivers with an emphasis on reliability, debuggability, and peak performance.
Performance Optimization: Profile and optimize driver performance for demanding AI workloads to reduce latency and increase throughput.
Hardware Integration and Co-Design: Work closely with hardware engineers throughout the ASIC design lifecycle to ensure seamless hardware/software integration.
Virtualization Support: Develop driver support for virtualization technologies, including SR-IOV, VFIO, and paravirtualization.
Memory Management: Design and implement efficient memory management schemes covering kernel memory mapping, page tables, NUMA-aware caching of device data, and IOMMU configuration.
Security: Build kernel drivers with security as a core design principle, protecting host processes and physical memory and supporting device attestation.
Debugging and Troubleshooting: Identify and resolve complex driver issues using standard kernel debugging tools and techniques (e.g., ftrace, dmesg).
Synchronization and Concurrency: Develop synchronization strategies to manage concurrent access to multiple accelerators.
System Validation and Testing: Design and execute comprehensive test plans to validate driver functionality, stability, and performance in production and manufacturing environments.
Collaboration and Troubleshooting: Partner with software and hardware teams to diagnose and resolve complex system-level issues.

