About the job
Join Huawei Canada as an AI Computing Systems Researcher and contribute to groundbreaking advancements in AI technology.
Team Overview:
The Advanced Computing and Storage Lab at the Vancouver Research Centre is dedicated to pioneering adaptive computing architectures that effectively manage the complexities of future application workloads. Our mission is to ensure the robustness of training clusters through dynamic configuration strategies and precision control systems, improving both the stability and the efficiency of computing power. We focus on crucial AI industry applications, including large model training and inference, and use techniques such as low-precision training and reinforcement learning to analyze bottlenecks and develop optimization solutions.
Position Responsibilities:
Drive advancements in AI systems on the Ascend platform by enhancing performance, efficiency, and usability tailored for large model training and inference.
Design and develop solutions focused on FP8 optimization, reinforcement learning-driven training agents, and next-generation multi-modal understanding and generation.
Integrate AI algorithm requirements with system-level architectural enhancements in computing, I/O, scheduling, and precision control to boost overall system performance.
Establish stable and efficient AI training clusters using dynamic configuration and precision control to guarantee scalability and reliability.
Create software frameworks, operator libraries, and system-level optimizations for NPU platforms to accelerate large-model AI training.
Lead the development of strategies for optimizing large-model training and inference through low-precision training, parallel strategy tuning, and reinforcement learning techniques.

