EFFICIENT HIGH LEVEL SYNTHESIS IMPLEMENTATION FOR DEEP CONVOLUTIONAL NEURAL NETWORKS ON FPGA

Muhammad Nabil Muhammad Sarg

Abstract
Deep Convolutional Neural Networks (CNNs) play an important role in computer vision applications. The CNN is a deep learning architecture inspired by the visual perception mechanisms of living creatures. CNNs provide very good performance and achieve state-of-the-art results in many problems such as computer vision, natural language processing, and speech recognition. This outstanding performance is due to the rapid increase in the amount of credible data, in addition to the significant advancements in the computing power of Graphical Processing Units (GPUs).
However, CNNs are computationally intensive and resource consuming; millions of parameters and billions of operations are needed to classify a single image with a Deep CNN. Thus, it is hard to integrate them into embedded platforms that have strict constraints on power consumption and physical size. Under such constraints, general-purpose Central Processing Units (CPUs) and GPUs cannot achieve the required performance levels.
The FPGA is one of the most promising platforms for accelerating Deep CNNs due to its configurability, high performance, high degree of parallelism, low power consumption, and shorter development cycles. Moreover, the availability of High-Level Synthesis (HLS) tools lowers the programming barrier and shortens the development cycle of FPGA-based applications.
In this research, we introduce a versatile HLS C++ Deep CNN compiler that generates highly efficient implementations for accelerating the inference task of Deep CNNs on FPGAs. We developed HLS C++ implementations of the Deep CNN building blocks that efficiently utilize the available FPGA resources to achieve the maximum inference performance. The developed compiler automates the customization of the accelerator HLS implementation to best fit the selected Deep CNN model. Furthermore, it schedules the model layers and generates the weight files and the instruction stream that drive the model layers on the proposed FPGA accelerator architecture.
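To make the idea of a per-layer instruction stream concrete, the following is a purely illustrative sketch of what a layer descriptor handed to such an accelerator might contain. The field names, widths, and layout are assumptions for illustration and are not taken from the thesis.

    // layer_instruction.h -- hypothetical per-layer descriptor (illustrative only)
    #include <cstdint>

    struct LayerInstruction {
        uint16_t layer_id;        // position in the scheduled layer order
        uint8_t  layer_type;      // e.g. convolution, pooling, fully connected
        uint16_t in_channels;     // input feature-map depth
        uint16_t out_channels;    // output feature-map depth
        uint16_t in_height;       // input feature-map height
        uint16_t in_width;        // input feature-map width
        uint8_t  kernel_size;     // e.g. 1, 3, or 7 for ResNet-50 / VGG16 layers
        uint8_t  stride;          // convolution or pooling stride
        uint8_t  pad;             // zero-padding on each border
        uint32_t weight_offset;   // start of this layer's weights in the weight file
        uint32_t input_offset;    // DDR offset of the input feature map
        uint32_t output_offset;   // DDR offset of the output feature map
    };

A compiler of the kind described above would emit one such record per scheduled layer, and the accelerator would iterate over the stream, loading weights and feature maps from the listed offsets.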
Since convolutional operations dominate the computation in CNNs, this work focuses on accelerating the convolutional layers. Loop optimization techniques such as loop tiling and loop unrolling are employed, as sketched below. In addition, further optimization techniques are introduced to minimize the number of required operations, reduce data movement, and achieve higher computing efficiency.
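The following is a minimal HLS C++ sketch, not the thesis's actual kernel, of how loop tiling and unrolling are typically combined in an FPGA convolution engine: the spatial loops over one output tile are pipelined, while the channel-tile loops are fully unrolled into parallel multiply-accumulate units. The tile sizes TM, TN, TILE_H, and TILE_W are illustrative assumptions.

    // conv_tile.cpp -- illustrative tiled 3x3 convolution kernel (assumed sizes)
    #define TILE_H 14   // hypothetical output tile height
    #define TILE_W 14   // hypothetical output tile width
    #define TM 8        // hypothetical output-channel tile size (parallelism)
    #define TN 4        // hypothetical input-channel tile size (parallelism)
    #define K  3        // 3x3 convolution kernel

    // Computes one output tile from on-chip buffers; out_buf is assumed to hold
    // zeros or partial sums. The outer loops that tile the full feature map and
    // stream data from DDR are omitted for brevity.
    void conv_tile(float in_buf[TN][TILE_H + K - 1][TILE_W + K - 1],
                   float w_buf[TM][TN][K][K],
                   float out_buf[TM][TILE_H][TILE_W]) {
    #pragma HLS ARRAY_PARTITION variable=in_buf  dim=1 complete
    #pragma HLS ARRAY_PARTITION variable=w_buf   dim=1 complete
    #pragma HLS ARRAY_PARTITION variable=w_buf   dim=2 complete
    #pragma HLS ARRAY_PARTITION variable=out_buf dim=1 complete
        for (int h = 0; h < TILE_H; ++h) {
            for (int w = 0; w < TILE_W; ++w) {
                for (int kh = 0; kh < K; ++kh) {
                    for (int kw = 0; kw < K; ++kw) {
    #pragma HLS PIPELINE II=1
                        // The two innermost channel loops are fully unrolled so
                        // that TM x TN multiply-accumulates issue every cycle.
                        for (int tm = 0; tm < TM; ++tm) {
    #pragma HLS UNROLL
                            for (int tn = 0; tn < TN; ++tn) {
    #pragma HLS UNROLL
                                out_buf[tm][h][w] += w_buf[tm][tn][kh][kw]
                                                   * in_buf[tn][h + kh][w + kw];
                            }
                        }
                    }
                }
            }
        }
    }

Tiling bounds the on-chip buffer sizes so a tile fits in BRAM, while the unroll factors TM and TN set the number of parallel multipliers; choosing them is the kind of per-model customization an HLS compiler flow can automate.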
The proposed work is demonstrated with the implementation of two different Deep CNN models, ResNet-50 and VGG16, on a Xilinx Zynq UltraScale+ MPSoC using the Xilinx SDSoC development environment. The developed accelerator achieves up to a 339x speedup.


Other data

Title EFFICIENT HIGH LEVEL SYNTHESIS IMPLEMENTATION FOR DEEP CONVOLUTIONAL NEURAL NETWORKS ON FPGA
Other Titles Efficient high-level synthesis implementation for deep convolutional neural networks on FPGA (original title in Arabic)
Authors Muhammad Nabil Muhammad Sarg
Issue Date 2022

Attached Files

File: BB13767.pdf (720.57 kB, Adobe PDF)


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.