Optimization Space Exploration for Edge ML Compilers
The proliferation of Artificial Intelligence and of pervasive IoT and mobile devices is pushing the industry towards running Machine Learning inference on edge devices. These platforms impose stringent constraints on power consumption, latency, and memory footprint; AI workloads must therefore be executed on highly efficient NN accelerators and be thoroughly optimized for the specific target device.
Specialized AI accelerators such as neural processing units (NPUs) and mobile GPUs are becoming a standard feature of microcontrollers (MCUs) and systems-on-chip (SoCs). Neural networks can be deployed on such accelerators by compiling them with ad-hoc neural network compilers that incorporate graph optimization stages. Due to the broad diversity of topologies found in modern NNs, different optimizations and transformations can be applied at compile-time to better adapt the workloads to the target hardware accelerator. Moreover, several variants of the target device can be considered at design time, possibly including novel hardware architectures such as In-Memory Computing, which overcome the limitations of traditional von Neumann architectures by maximizing data reuse. Each of these decisions can impact the performance and the power consumption of the final solution, but oftentimes the resulting design and optimization space is too vast to be explored exhaustively. The selection of optimal parameters therefore relies on automated exploration processes that attempt to minimize one or more objectives.
This PhD research addresses the problem of compile-time optimization exploration: automatically selecting the Pareto-optimal set of optimization parameters to efficiently run a NN on edge AI accelerators, and studying the impact of different target devices on application performance. Part of this research effort was carried out in collaboration with STMicroelectronics (SRA department in Cornaredo), which is developing a family of NPUs that combine traditional convolutional accelerators with In-Memory Computing.
Furthermore, this research aims to apply the aforementioned compilation techniques to deploy an Edge AI system in a real-world industrial scenario, to showcase and validate the approach.
In particular, the following contributions are proposed:
- Develop optimization techniques and mapping strategies for traditional and In-Memory-Computing Neural Processing Units.
- Develop novel exploration algorithms to automatically select at compile-time the Pareto-optimal set of optimization parameters to efficiently run a NN on an embedded NPU.
- Implement the exploration techniques in industrial and open-source compilers, and test them targeting different hardware devices (MCUs, NPUs, mobile GPUs, ?).
- Optimize and deploy Edge AI algorithms in a real-world industrial scenario using the aforementioned mapping techniques.
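To make the notion of a Pareto-optimal set of compile-time configurations concrete, the sketch below shows the standard non-dominated filtering step an exploration algorithm would apply to profiled candidates. All configuration names and objective values here are hypothetical placeholders, not results from the actual compiler; the dominance criterion itself (no worse in every objective, strictly better in at least one) is the general multi-objective definition, not a method specific to this research.

```python
# Minimal sketch: selecting the Pareto-optimal subset of candidate
# compile-time configurations, each profiled for two objectives to be
# minimized (e.g. latency in ms and energy in mJ). Names are illustrative.

def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate.

    Candidate b dominates candidate a if b is no worse in every
    objective and strictly better in at least one.
    """
    front = []
    for a in candidates:
        dominated = any(
            all(x <= y for x, y in zip(b["objectives"], a["objectives"]))
            and any(x < y for x, y in zip(b["objectives"], a["objectives"]))
            for b in candidates
            if b is not a
        )
        if not dominated:
            front.append(a)
    return front


# Hypothetical profiling results: objectives = (latency_ms, energy_mJ).
candidates = [
    {"config": "baseline",       "objectives": (12.0, 5.0)},
    {"config": "fused-conv",     "objectives": (9.0, 4.2)},
    {"config": "imc-offload",    "objectives": (10.5, 2.8)},
    {"config": "fused+low-freq", "objectives": (9.5, 4.5)},  # dominated by fused-conv
]

front = pareto_front(candidates)
print(sorted(c["config"] for c in front))  # → ['fused-conv', 'imc-offload']
```

This quadratic filter is only the final selection step; in practice the hard part the research targets is generating and profiling candidates efficiently, since the full design space is too large to enumerate.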