Multilayer perceptron (MLP) is a basic neural network model that is used in practical industrial applications, such as network intrusion detection (NID) systems. It is also used as a building block in newer models, such as gMLP. Currently, there is a demand for fast training in NID and other areas. However, in training with numerous GPUs, the problems of power consumption and long training times arise. Many of the latest deep neural network (DNN) models and MLPs are trained using a backpropagation algorithm which transmits an error gradient from the output layer to the input layer such that in the sequential computation, the next input cannot be processed until the weights of all layers are updated from the last layer. This is known as backward locking. In this study, a weight parameter update mechanism is proposed with time delays that can accommodate the weight update delay to allow simultaneous forward and backward computation. To this end, a one-dimensional systolic array structure was designed on a Xilinx U50 Alveo FPGA card in which each layer of the MLP is assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel, and transfers data between layers in a pipeline. Compared to the Intel Core i9 CPU and NVIDIA RTX 3090 GPU, it is 3 times faster than the CPU and 2.5 times faster than the GPU. The processing speed per power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. From these results, it is concluded that a training accelerator on an FPGA can achieve high speed and energy efficiency.
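As an illustration of the mechanism the abstract describes, the following is a minimal software sketch (NumPy) of pipelined, time-delay backpropagation; it is an assumed model for exposition, not the authors' HLS design. Each weight layer plays the role of one PE in the one-dimensional systolic array: forward activations and backward error gradients move through per-layer queues, each layer buffers its inputs until the matching gradient returns, and a new sample can enter the pipeline before older samples have finished updating the weights, which is what removes backward locking. The layer sizes, ReLU activation, and squared-error signal are placeholder assumptions.

# Illustrative software model of time-delay backpropagation on a 1-D systolic array.
# Not the authors' FPGA implementation; sizes and activation function are assumed.
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 4]                    # toy MLP: three weight layers (assumed sizes)
L = len(sizes) - 1
lr = 0.01

W = [rng.normal(0, 0.1, (sizes[i + 1], sizes[i])) for i in range(L)]
fwd_q = [deque() for _ in range(L + 1)]   # fwd_q[i]: activations waiting to enter layer i
bwd_q = [deque() for _ in range(L + 1)]   # bwd_q[i+1]: gradients waiting to re-enter layer i
pending = [dict() for _ in range(L)]      # layer i: sample id -> buffered (input, pre-activation)
targets = {}                              # sample id -> training target

samples = [(rng.normal(size=sizes[0]), rng.normal(size=sizes[-1])) for _ in range(32)]

for t in range(200):                      # simulated clock ticks
    if t < len(samples):                  # keep injecting samples; the pipeline never stalls
        x, y = samples[t]
        fwd_q[0].append((t, x))
        targets[t] = y

    # Each PE does one forward and one backward step per tick. Python runs
    # them sequentially; on the FPGA all layers would work concurrently.
    for i in range(L):
        if fwd_q[i]:                                  # forward step of layer i
            sid, x = fwd_q[i].popleft()
            pre = W[i] @ x
            pending[i][sid] = (x, pre)                # keep input until its gradient returns
            fwd_q[i + 1].append((sid, np.maximum(pre, 0.0)))   # ReLU (assumed)
        if bwd_q[i + 1]:                              # backward step of layer i
            sid, g_out = bwd_q[i + 1].popleft()
            x, pre = pending[i].pop(sid)
            g_pre = g_out * (pre > 0)                 # ReLU derivative
            bwd_q[i].append((sid, W[i].T @ g_pre))    # pass error toward the input layer
            W[i] -= lr * np.outer(g_pre, x)           # time-delayed weight update

    if fwd_q[L]:                                      # an output emerged: form the error signal
        sid, out = fwd_q[L].popleft()
        bwd_q[L].append((sid, out - targets.pop(sid)))

Note that in this sequential simulation the forward pass of a sample cascades through all layers within one tick; a cycle-accurate model of the systolic array would latch the queue contents at tick boundaries. The sketch is only meant to show the bookkeeping that keeps a delayed weight update consistent with the activation it belongs to.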
Takeshi SENOO
Tokyo Institute of Technology
Akira JINGUJI
Tokyo Institute of Technology
Ryosuke KURAMOCHI
Tokyo Institute of Technology
Hiroki NAKAHARA
Tokyo Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takeshi SENOO, Akira JINGUJI, Ryosuke KURAMOCHI, Hiroki NAKAHARA, "Multilayer Perceptron Training Accelerator Using Systolic Array" in IEICE TRANSACTIONS on Information and Systems,
vol. E105-D, no. 12, pp. 2048-2056, December 2022, doi: 10.1587/transinf.2022PAP0003.
Abstract: Multilayer perceptron (MLP) is a basic neural network model that is used in practical industrial applications, such as network intrusion detection (NID) systems. It is also used as a building block in newer models, such as gMLP. Currently, there is a demand for fast training in NID and other areas. However, in training with numerous GPUs, the problems of power consumption and long training times arise. Many of the latest deep neural network (DNN) models and MLPs are trained using a backpropagation algorithm which transmits an error gradient from the output layer to the input layer such that in the sequential computation, the next input cannot be processed until the weights of all layers are updated from the last layer. This is known as backward locking. In this study, a weight parameter update mechanism is proposed with time delays that can accommodate the weight update delay to allow simultaneous forward and backward computation. To this end, a one-dimensional systolic array structure was designed on a Xilinx U50 Alveo FPGA card in which each layer of the MLP is assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel, and transfers data between layers in a pipeline. Compared to the Intel Core i9 CPU and NVIDIA RTX 3090 GPU, it is 3 times faster than the CPU and 2.5 times faster than the GPU. The processing speed per power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. From these results, it is concluded that a training accelerator on an FPGA can achieve high speed and energy efficiency.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022PAP0003/_p
@ARTICLE{e105-d_12_2048,
author={Takeshi SENOO and Akira JINGUJI and Ryosuke KURAMOCHI and Hiroki NAKAHARA},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Multilayer Perceptron Training Accelerator Using Systolic Array},
year={2022},
volume={E105-D},
number={12},
pages={2048-2056},
abstract={Multilayer perceptron (MLP) is a basic neural network model that is used in practical industrial applications, such as network intrusion detection (NID) systems. It is also used as a building block in newer models, such as gMLP. Currently, there is a demand for fast training in NID and other areas. However, in training with numerous GPUs, the problems of power consumption and long training times arise. Many of the latest deep neural network (DNN) models and MLPs are trained using a backpropagation algorithm which transmits an error gradient from the output layer to the input layer such that in the sequential computation, the next input cannot be processed until the weights of all layers are updated from the last layer. This is known as backward locking. In this study, a weight parameter update mechanism is proposed with time delays that can accommodate the weight update delay to allow simultaneous forward and backward computation. To this end, a one-dimensional systolic array structure was designed on a Xilinx U50 Alveo FPGA card in which each layer of the MLP is assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel, and transfers data between layers in a pipeline. Compared to the Intel Core i9 CPU and NVIDIA RTX 3090 GPU, it is 3 times faster than the CPU and 2.5 times faster than the GPU. The processing speed per power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. From these results, it is concluded that a training accelerator on an FPGA can achieve high speed and energy efficiency.},
keywords={},
doi={10.1587/transinf.2022PAP0003},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - Multilayer Perceptron Training Accelerator Using Systolic Array
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 2048
EP - 2056
AU - Takeshi SENOO
AU - Akira JINGUJI
AU - Ryosuke KURAMOCHI
AU - Hiroki NAKAHARA
PY - 2022
DO - 10.1587/transinf.2022PAP0003
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E105-D
IS - 12
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - December 2022
AB - Multilayer perceptron (MLP) is a basic neural network model that is used in practical industrial applications, such as network intrusion detection (NID) systems. It is also used as a building block in newer models, such as gMLP. Currently, there is a demand for fast training in NID and other areas. However, in training with numerous GPUs, the problems of power consumption and long training times arise. Many of the latest deep neural network (DNN) models and MLPs are trained using a backpropagation algorithm which transmits an error gradient from the output layer to the input layer such that in the sequential computation, the next input cannot be processed until the weights of all layers are updated from the last layer. This is known as backward locking. In this study, a weight parameter update mechanism is proposed with time delays that can accommodate the weight update delay to allow simultaneous forward and backward computation. To this end, a one-dimensional systolic array structure was designed on a Xilinx U50 Alveo FPGA card in which each layer of the MLP is assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel, and transfers data between layers in a pipeline. Compared to the Intel Core i9 CPU and NVIDIA RTX 3090 GPU, it is 3 times faster than the CPU and 2.5 times faster than the GPU. The processing speed per power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. From these results, it is concluded that a training accelerator on an FPGA can achieve high speed and energy efficiency.
ER -