Energy-Efficient Multimodal Edge AI
Recent advances in deep learning have been driven primarily by GPUs, which excel at parallelizing computations for both training and inference. However, their high power consumption and large physical footprint limit their use in portable and resource-constrained devices. Neural Processing Units (NPUs), specialized AI processors, provide more energy-efficient hardware for neural network inference on such devices.

This research begins with a systematic benchmarking of commercial NPUs to evaluate their performance on vision-based models. To enable seamless deployment across heterogeneous hardware, the first year of research will target the development of an open-source toolkit of model-agnostic adapters, supporting pre- and post-processing, mixed-precision computation, and graph-level optimizations compatible with ONNX Runtime and TensorFlow Lite. The benchmark suite will cover representative models across tasks, starting from object and pose detection and extending to egocentric action recognition and lightweight video understanding.

Building on these findings, in the second and third years I will study Small Multimodal Language Models (SMLMs) to extend the framework developed during the first year to more complex multimodal scenarios. The objective is to design and develop a hardware-agnostic SMLM capable of fusing visual and textual information entirely on-device, using only NPUs. The design will incorporate advanced techniques such as streaming inference, early-exit strategies, model partitioning, keyframe selection, token and feature pruning, and memory-aware context management, enabling real-time scene summarization and query response under strict computational and power constraints.

Throughout the three years, the study will investigate energy-aware embedded optimization to identify the network components that consume the most power and to redesign models for maximal energy efficiency, providing a principled approach to balancing accuracy, latency, and energy use in next-generation edge AI systems.
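
As a minimal illustration of what balancing accuracy, latency, and energy can mean in practice, the sketch below assumes that each model variant has already been measured for accuracy, per-inference latency, and average power draw. It computes energy per inference and keeps only the Pareto-optimal variants. The names `Measurement`, `dominates`, and `pareto_front`, as well as all numbers, are hypothetical placeholders and not results or tooling from this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One benchmarked model variant (all values are illustrative)."""
    name: str
    accuracy: float     # task metric, higher is better
    latency_ms: float   # time per inference, lower is better
    avg_power_w: float  # mean power draw during inference

    @property
    def energy_mj(self) -> float:
        # Energy per inference: 1 W * 1 ms = 1 mJ.
        return self.avg_power_w * self.latency_ms


def dominates(a: Measurement, b: Measurement) -> bool:
    """True if a is no worse than b on every axis and better on at least one."""
    no_worse = (a.accuracy >= b.accuracy
                and a.latency_ms <= b.latency_ms
                and a.energy_mj <= b.energy_mj)
    better = (a.accuracy > b.accuracy
              or a.latency_ms < b.latency_ms
              or a.energy_mj < b.energy_mj)
    return no_worse and better


def pareto_front(items: list[Measurement]) -> list[Measurement]:
    """Keep the variants that no other variant dominates."""
    return [m for m in items if not any(dominates(o, m) for o in items)]


# Hypothetical measurements, for illustration only.
candidates = [
    Measurement("fp32", accuracy=0.80, latency_ms=40.0, avg_power_w=2.0),
    Measurement("int8", accuracy=0.78, latency_ms=15.0, avg_power_w=1.5),
    Measurement("int8-slow", accuracy=0.77, latency_ms=20.0, avg_power_w=1.5),
]

for m in pareto_front(candidates):
    print(f"{m.name}: acc={m.accuracy:.2f}, "
          f"latency={m.latency_ms:.1f} ms, energy={m.energy_mj:.1f} mJ")
```

A Pareto filter is only one possible way to frame the trade-off. It avoids committing to a single weighted score, which may be useful when the acceptable accuracy loss depends on the target application.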