Design and Performance Evaluation of a Hardware-Accelerated VLSI Architecture for Deep Neural Network Inference
Keywords:
Deep Neural Network Inference, Hardware Acceleration, VLSI Architecture, Processing Element Array, Energy-Efficient Computing, Edge AI Systems
Abstract
Deep neural network (DNN) inference has become a dominant workload in edge and embedded computing systems, demanding high computational throughput under tight energy and area budgets. Conventional CPU- and GPU-based implementations suffer from pronounced memory bandwidth bottlenecks and poor power efficiency when deployed in resource-constrained systems. This paper introduces a hardware-accelerated VLSI architecture that enables scalable, low-latency, and energy-efficient DNN inference. The proposed design combines an array of parallel multiply-accumulate (MAC) processing elements (PEs) with pipelined computation and structured on-chip memory reuse to minimize off-chip data transfer. A computation and throughput model is developed to analytically characterize the architecture's scalability and performance limits. The design is synthesized and evaluated on representative convolutional neural network workloads, achieving significant improvements in latency and energy consumption relative to comparable baseline architectures. Experimental results demonstrate near-linear throughput scaling as the number of PEs increases, together with favorable area-performance trade-offs. The proposed architecture offers a practical and effective solution for real-time deep learning inference in edge and embedded VLSI systems.
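The analytical throughput model mentioned in the abstract can be illustrated with a simple roofline-style sketch: attainable throughput is the minimum of the compute roof set by the PE array and the memory roof set by off-chip bandwidth and data reuse. This is only an illustrative approximation under assumed parameters (PE count, clock rate, DRAM bandwidth, arithmetic intensity), not the paper's actual model.

```python
def pe_array_throughput(num_pes, clock_hz, macs_per_pe_per_cycle,
                        dram_bandwidth_bytes_s, macs_per_byte):
    """Roofline-style estimate of attainable throughput in MACs/s.

    compute_roof: peak MAC rate of the PE array.
    memory_roof:  MAC rate sustainable by off-chip bandwidth, given the
                  arithmetic intensity (MACs performed per byte fetched,
                  which rises with on-chip data reuse).
    All parameter values used by callers are hypothetical assumptions.
    """
    compute_roof = num_pes * macs_per_pe_per_cycle * clock_hz
    memory_roof = dram_bandwidth_bytes_s * macs_per_byte
    return min(compute_roof, memory_roof)


# With assumed parameters (1 GHz clock, 1 MAC/PE/cycle, 25.6 GB/s DRAM,
# 10 MACs/byte of reuse), throughput scales linearly with PE count until
# the memory roof is reached, matching the near-linear scaling regime
# described in the abstract.
small = pe_array_throughput(64, 1e9, 1, 25.6e9, 10)    # compute-bound
large = pe_array_throughput(1024, 1e9, 1, 25.6e9, 10)  # memory-bound
```

Doubling the PE count doubles throughput only while the design remains compute-bound; beyond the crossover, further PEs are wasted unless on-chip reuse (arithmetic intensity) also increases, which is why the architecture emphasizes minimizing off-chip data transfer.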
