The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
GUINNESS (GUI-based binarized neural network synthesizer) is an open-source, GUI-based tool flow for implementing binarized deep neural networks on FPGAs. It covers both training on a GPU and inference on an FPGA. Because every operation is performed through the GUI, the software designer does not need to write any scripts to define the network structure or the training behavior, and only specifies values for the hyperparameters. After training finishes, the tool automatically generates C++ code from which the bitstream is synthesized using the Xilinx SDSoC system design tool flow. The tool flow is therefore suitable for software programmers who are not familiar with FPGA design. In the tool flow, we modify both the training and the inference algorithms for binarized CNN hardware. Because the hardware has limited bit precision, it lacks minimal bias in training. In addition, the conventional batch normalization technique requires additional hardware for inference. Our modifications solve these problems. We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power was 5.1 times better, the performance per area was 8.0 times better, and the performance per memory was 8.2 times better. We also compared the proposed FPGA design with CPU and GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, dissipated 3.0 times less power, and its performance per power was 5706.3 times better. Compared with the Maxwell GPU, it was 11.5 times faster, dissipated 7.3 times less power, and its performance per power was 83.0 times better.
The disadvantage of our FPGA-based design is that it requires additional time to synthesize the FPGA executable code. In the experiment, synthesis took more than three hours, and the total FPGA design took 75 hours. Since the training of the CNN is dominant, it is considerable.
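The abstract notes that conventional batch normalization needs extra hardware at inference time. One widely used way to avoid this in binarized networks is to fold batch normalization and the sign activation into a single integer threshold comparison, and to compute the binary dot product with XNOR and popcount. The Python sketch below illustrates that general idea only. It is not taken from GUINNESS, and the function names (`xnor_popcount_dot`, `fold_bn`, `threshold_sign`, `bn_sign`) and the bit encoding (+1 as bit 1, -1 as bit 0) are assumptions made for illustration; the paper's own modification may differ.

```python
import math


def xnor_popcount_dot(w_bits, x_bits, n):
    """Dot product of two {-1,+1} vectors of length n packed into integers.

    Assumed encoding: bit i is 1 for +1 and 0 for -1.
    Positions where the bits agree contribute +1, the others -1.
    """
    mask = (1 << n) - 1
    agree = ~(w_bits ^ x_bits) & mask          # XNOR restricted to n bits
    return 2 * bin(agree).count("1") - n       # agreements minus disagreements


def bn_sign(s, gamma, beta, mean, var, eps=1e-5):
    """Reference path: floating-point batch normalization followed by sign."""
    y = gamma * (s - mean) / math.sqrt(var + eps) + beta
    return 1 if y >= 0 else -1


def fold_bn(gamma, beta, mean, var, eps=1e-5):
    """Fold batch normalization + sign into (integer threshold, direction).

    y >= 0 is equivalent to s >= tau when gamma > 0 and to s <= tau when
    gamma < 0, where tau = mean - beta * sqrt(var + eps) / gamma.
    Because s is an integer, tau can be rounded to an integer threshold.
    """
    if gamma == 0:
        raise ValueError("gamma == 0: the output is constant, handle separately")
    tau = mean - beta * math.sqrt(var + eps) / gamma
    if gamma > 0:
        return math.ceil(tau), True            # output +1 when s >= threshold
    return math.floor(tau), False              # output +1 when s <= threshold


def threshold_sign(s, threshold, fire_when_ge):
    """Hardware-friendly path: a single integer comparison."""
    fires = s >= threshold if fire_when_ge else s <= threshold
    return 1 if fires else -1
```

With this folding, inference needs only an integer comparator per output channel instead of a multiplier, divider, and square root. The threshold and direction are computed once, offline, from the trained batch normalization parameters.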
Hiroki NAKAHARA
Tokyo Institute of Technology
Haruyoshi YONEKAWA
Tokyo Institute of Technology
Tomoya FUJII
Tokyo Institute of Technology
Masayuki SHIMODA
Tokyo Institute of Technology
Shimpei SATO
Tokyo Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hiroki NAKAHARA, Haruyoshi YONEKAWA, Tomoya FUJII, Masayuki SHIMODA, Shimpei SATO, "GUINNESS: A GUI Based Binarized Deep Neural Network Framework for Software Programmers" in IEICE TRANSACTIONS on Information, vol. E102-D, no. 5, pp. 1003-1011, May 2019, doi: 10.1587/transinf.2018RCP0002.
Abstract: GUINNESS (GUI-based binarized neural network synthesizer) is an open-source, GUI-based tool flow for implementing binarized deep neural networks on FPGAs. It covers both training on a GPU and inference on an FPGA. Because every operation is performed through the GUI, the software designer does not need to write any scripts to define the network structure or the training behavior, and only specifies values for the hyperparameters. After training finishes, the tool automatically generates C++ code from which the bitstream is synthesized using the Xilinx SDSoC system design tool flow. The tool flow is therefore suitable for software programmers who are not familiar with FPGA design. In the tool flow, we modify both the training and the inference algorithms for binarized CNN hardware. Because the hardware has limited bit precision, it lacks minimal bias in training. In addition, the conventional batch normalization technique requires additional hardware for inference. Our modifications solve these problems. We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power was 5.1 times better, the performance per area was 8.0 times better, and the performance per memory was 8.2 times better. We also compared the proposed FPGA design with CPU and GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, dissipated 3.0 times less power, and its performance per power was 5706.3 times better. Compared with the Maxwell GPU, it was 11.5 times faster, dissipated 7.3 times less power, and its performance per power was 83.0 times better. The disadvantage of our FPGA-based design is that it requires additional time to synthesize the FPGA executable code. In the experiment, synthesis took more than three hours, and the total FPGA design took 75 hours. Since the training of the CNN is dominant, it is considerable.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018RCP0002/_p
@ARTICLE{e102-d_5_1003,
author={Hiroki NAKAHARA and Haruyoshi YONEKAWA and Tomoya FUJII and Masayuki SHIMODA and Shimpei SATO},
journal={IEICE TRANSACTIONS on Information},
title={GUINNESS: A GUI Based Binarized Deep Neural Network Framework for Software Programmers},
year={2019},
volume={E102-D},
number={5},
pages={1003-1011},
abstract={GUINNESS (GUI-based binarized neural network synthesizer) is an open-source, GUI-based tool flow for implementing binarized deep neural networks on FPGAs. It covers both training on a GPU and inference on an FPGA. Because every operation is performed through the GUI, the software designer does not need to write any scripts to define the network structure or the training behavior, and only specifies values for the hyperparameters. After training finishes, the tool automatically generates C++ code from which the bitstream is synthesized using the Xilinx SDSoC system design tool flow. The tool flow is therefore suitable for software programmers who are not familiar with FPGA design. In the tool flow, we modify both the training and the inference algorithms for binarized CNN hardware. Because the hardware has limited bit precision, it lacks minimal bias in training. In addition, the conventional batch normalization technique requires additional hardware for inference. Our modifications solve these problems. We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power was 5.1 times better, the performance per area was 8.0 times better, and the performance per memory was 8.2 times better. We also compared the proposed FPGA design with CPU and GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, dissipated 3.0 times less power, and its performance per power was 5706.3 times better. Compared with the Maxwell GPU, it was 11.5 times faster, dissipated 7.3 times less power, and its performance per power was 83.0 times better. The disadvantage of our FPGA-based design is that it requires additional time to synthesize the FPGA executable code. In the experiment, synthesis took more than three hours, and the total FPGA design took 75 hours. Since the training of the CNN is dominant, it is considerable.},
keywords={},
doi={10.1587/transinf.2018RCP0002},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - GUINNESS: A GUI Based Binarized Deep Neural Network Framework for Software Programmers
T2 - IEICE TRANSACTIONS on Information
SP - 1003
EP - 1011
AU - Hiroki NAKAHARA
AU - Haruyoshi YONEKAWA
AU - Tomoya FUJII
AU - Masayuki SHIMODA
AU - Shimpei SATO
PY - 2019
DO - 10.1587/transinf.2018RCP0002
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2019
AB - GUINNESS (GUI-based binarized neural network synthesizer) is an open-source, GUI-based tool flow for implementing binarized deep neural networks on FPGAs. It covers both training on a GPU and inference on an FPGA. Because every operation is performed through the GUI, the software designer does not need to write any scripts to define the network structure or the training behavior, and only specifies values for the hyperparameters. After training finishes, the tool automatically generates C++ code from which the bitstream is synthesized using the Xilinx SDSoC system design tool flow. The tool flow is therefore suitable for software programmers who are not familiar with FPGA design. In the tool flow, we modify both the training and the inference algorithms for binarized CNN hardware. Because the hardware has limited bit precision, it lacks minimal bias in training. In addition, the conventional batch normalization technique requires additional hardware for inference. Our modifications solve these problems. We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power was 5.1 times better, the performance per area was 8.0 times better, and the performance per memory was 8.2 times better. We also compared the proposed FPGA design with CPU and GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, dissipated 3.0 times less power, and its performance per power was 5706.3 times better. Compared with the Maxwell GPU, it was 11.5 times faster, dissipated 7.3 times less power, and its performance per power was 83.0 times better. The disadvantage of our FPGA-based design is that it requires additional time to synthesize the FPGA executable code. In the experiment, synthesis took more than three hours, and the total FPGA design took 75 hours. Since the training of the CNN is dominant, it is considerable.
ER -