Optimized Lightweight CNN Architectures for Real-Time Inference on Edge and Embedded Devices
Keywords:
Lightweight CNN, Edge Computing, Real-Time Inference, Model Compression, Embedded Devices, Quantization, Knowledge Distillation

Abstract
Edge computing has become central to closing the distance between artificial intelligence (AI) applications and their data sources, enabling real-time, energy-efficient decision making. Deploying traditional deep learning models, particularly convolutional neural networks (CNNs), on edge and embedded devices nevertheless remains a major challenge because of their high computational and memory demands. This paper develops an efficient lightweight CNN architecture designed for real-time inference on resource-constrained devices such as the NVIDIA Jetson Nano and Raspberry Pi 4. Our approach combines a multi-pronged model-compression strategy (structured pruning, 8-bit quantization, and knowledge distillation) with recent architectural innovations such as depthwise separable convolutions and grouped convolutions. Through extensive experiments on benchmark datasets such as CIFAR-10 and Tiny ImageNet, we show that the proposed models strike a favorable trade-off between efficiency and accuracy: the optimized CNNs reach competitive classification accuracy (up to 90.1%) while reducing latency by up to 65% and energy consumption by up to 45% relative to their uncompressed counterparts. We also validate the models on actual devices, evaluating key metrics including model size, memory footprint, throughput, and power consumption. A case study on machine surveillance further demonstrates the practical applicability of our models to edge AI, sustaining real-time object detection while drawing less total power. This work not only establishes the viability of lightweight CNNs for edge inference but also contributes a scalable optimization pipeline applicable to a wide variety of deep learning architectures. The results open the door to robust intelligent systems in domains such as health monitoring, autonomous platforms, and Internet-of-Things (IoT) deployments, where performance, energy, and latency are critical. The proposed framework offers a compelling answer to the challenge of robust, low-power AI at the edge and paves the way for next-generation embedded intelligence.
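To make the architectural side of the pipeline concrete, the following PyTorch sketch illustrates a depthwise separable convolution, one of the building blocks the abstract mentions. The module name, channel sizes, and layer ordering are illustrative assumptions for exposition, not the paper's exact implementation.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    # Factorizes a dense k x k convolution into a depthwise convolution
    # (one filter per input channel) followed by a 1x1 pointwise convolution,
    # cutting parameters and multiply-accumulates roughly in proportion to
    # the kernel area, which is the main source of the latency savings on
    # devices like the Jetson Nano.
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size,
            stride=stride, padding=kernel_size // 2,
            groups=in_channels, bias=False)  # groups=in_channels makes it depthwise
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example: one block applied to a CIFAR-10-sized feature map.
x = torch.randn(1, 32, 32, 32)
block = DepthwiseSeparableConv(32, 64)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])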
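The 8-bit quantization step can likewise be prototyped with PyTorch's built-in tooling. The sketch below uses post-training dynamic quantization on a hypothetical stand-in model for brevity; a convolution-heavy network such as the one studied here would instead use static quantization with a calibration pass, since dynamic quantization targets linear and recurrent layers.

import torch
import torch.nn as nn

# Hypothetical stand-in model; the paper's actual architecture is not shown here.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Post-training dynamic quantization: weights of the listed layer types are
# stored as int8 and activations are quantized on the fly at inference time,
# shrinking the quantized layers roughly fourfold and speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

print(quantized)  # Linear layers replaced by their dynamically quantized variants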