ENERGY- AND LATENCY-EFFICIENT DNN COMPRESSION FOR EDGE IOT SYSTEMS
Keywords:
DNN Compression, IoT, Low-Latency Computing, Model Pruning, Quantization, Edge AI, Real-Time Processing

Abstract
The rapid growth of Internet of Things (IoT) systems has heightened the need for efficient deep neural network (DNN) inference under stringent latency and resource constraints. Conventional DNNs demand substantial compute and memory, making them unsuitable for real-time IoT deployments with limited memory, bandwidth, and processing capability. This paper proposes a low-latency DNN compression framework that combines structured pruning, quantization-aware training, and lightweight model reparameterization. The proposed method reduces computational complexity while maintaining competitive accuracy, enabling faster inference on edge IoT devices. Experimental evaluations demonstrate up to a 62% reduction in model size and a 48% improvement in inference speed. The approach provides a scalable and energy-efficient solution for real-time IoT applications.
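To make the two core compression steps concrete, the sketch below illustrates the general idea of structured (channel-level) pruning by L1 norm and uniform symmetric 8-bit quantization. This is a minimal, self-contained illustration of the standard techniques named in the abstract, not the paper's actual implementation; the function names, the 0.5 keep ratio, and the plain-list weight representation are assumptions for readability.

```python
def prune_channels(weight, keep_ratio=0.5):
    """Structured pruning sketch: keep the output channels (rows)
    with the largest L1 norms and drop the rest.
    `weight` is a list of rows, each row a list of floats.
    Illustrative only -- not the paper's exact criterion."""
    norms = [sum(abs(v) for v in row) for row in weight]
    k = max(1, int(len(weight) * keep_ratio))
    # indices of the k strongest channels, in original order
    keep = sorted(sorted(range(len(weight)), key=lambda i: norms[i])[-k:])
    return [weight[i] for i in keep], keep

def quantize_int8(weight):
    """Uniform symmetric 8-bit quantization: map floats to integers
    in [-127, 127] with a single per-tensor scale factor."""
    flat = [v for row in weight for v in row]
    scale = max(max(abs(v) for v in flat), 1e-8) / 127.0
    q = [[max(-127, min(127, round(v / scale))) for v in row]
         for row in weight]
    return q, scale

# Toy 4x3 weight matrix: prune half the channels, then quantize.
w = [[0.9, -1.2, 0.4], [0.01, 0.02, -0.01],
     [-0.8, 0.7, 1.1], [0.05, -0.03, 0.02]]
w_pruned, kept = prune_channels(w, keep_ratio=0.5)
q, scale = quantize_int8(w_pruned)
w_hat = [[qi * scale for qi in row] for row in q]  # dequantized view
```

Because the scale maps the largest magnitude to 127 and rounding is to the nearest level, the per-weight dequantization error is bounded by `scale / 2`, which is the basic trade-off that quantization-aware training learns to absorb.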