Edge-Accelerated Deep Neural Networks on FPGA for Real-Time IoT Video Analytics
Keywords:
Edge AI, FPGA acceleration, Real-time video analytics, IoT, Deep neural networks, YOLOv4-Tiny, Vitis AI, Object detection, Low-latency inference, Embedded systems

Abstract
The growth of Internet of Things (IoT) devices and the increasing use of real-time video analytics have placed severe stress on conventional cloud-centric processing systems, particularly with respect to latency, bandwidth requirements, and data security. To overcome these drawbacks, this paper proposes a new FPGA-based edge computing architecture capable of delivering high-performance, real-time deep neural network (DNN)-based video analytics on resource-constrained IoT devices. The system deploys a quantized convolutional neural network (CNN), an optimized YOLOv4-Tiny model, on a Field-Programmable Gate Array (FPGA) for object detection and tracking on high-definition video streams, exploiting the inherent parallelism, reconfigurability, and energy efficiency of FPGAs. It is implemented on a Xilinx Zynq UltraScale+ MPSoC, whose Deep Learning Processing Unit (DPU) accelerates inference while the integrated ARM processor handles input/output, preprocessing, and postprocessing. The edge system analyzes video streams from IoT-enabled cameras and delivers analytics output in real time, minimizing reliance on cloud servers. Experimental evaluation on the AI City Challenge dataset shows significant improvements in throughput (32 FPS), inference latency (31 ms), and power consumption (4.5 W), with only a slight trade-off in detection accuracy compared to GPU-based platforms (e.g., NVIDIA Jetson TX2) and USB-based accelerators (e.g., Intel Movidius NCS2). Owing to its real-time capability and efficiency, the system is well suited to applications such as smart surveillance, intelligent transportation systems, and industrial monitoring.
The research confirms that complex DNNs can be deployed on FPGAs at the network edge and have the potential to reshape the landscape of edge AI deployment through scalable, low-latency, and power-aware solutions. Future improvements toward adaptive edge intelligence include fine-grained parallelism across multiple FPGAs, support for additional DNN architectures, and integration with federated learning frameworks.
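As a rough illustration of the CPU-side preprocessing that typically precedes DPU inference in such a pipeline, a captured frame is resized to the model's input resolution, normalized, and quantized to the int8 format the DPU consumes. This is a minimal sketch, not the paper's implementation: the 416x416 input size and the quantization scale of 64 are assumptions (in Vitis AI, the scale is read from the compiled model's input tensor).

```python
import numpy as np

def preprocess_frame(frame, input_size=(416, 416), scale=64.0):
    """Resize and int8-quantize a frame for DPU input (illustrative sketch).

    frame: HxWx3 uint8 RGB image.
    input_size: (height, width) expected by the model; YOLOv4-Tiny
        commonly uses 416x416, but that is an assumption here.
    scale: quantization scale of the DPU input tensor (assumed value).
    """
    h, w = frame.shape[:2]
    th, tw = input_size
    # Nearest-neighbor resize via index maps (avoids an OpenCV dependency).
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    resized = frame[rows][:, cols]
    # Normalize pixels to [0, 1], then quantize to the DPU's int8 range.
    normalized = resized.astype(np.float32) / 255.0
    quantized = np.clip(np.round(normalized * scale), -128, 127).astype(np.int8)
    return quantized
```

On the target MPSoC, the ARM cores would run this step per frame before handing the tensor to the DPU runner, with postprocessing (decoding detections, non-maximum suppression) likewise on the CPU.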