The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
We developed a PYNQ cluster consisting of economical Zynq boards, called M-KUBOS, interconnected through low-cost, high-performance GTH serial links. For the software environment, we used the open-source PYNQ software platform. The PYNQ cluster is expected to serve as a multi-access edge computing (MEC) server for 5G mobile networks. We implemented a ResNet-50 inference accelerator on the PYNQ cluster for image recognition in MEC applications. By estimating the execution time of each ResNet-50 layer, we distributed the layers across multiple boards so that the execution time per board is as equal as possible, enabling efficient pipeline processing. Because the FPGAs in the PYNQ cluster are directly connected by high-speed serial links, stream processing without network bottlenecks and pipeline processing between boards were readily realized. The implementation on 4 boards achieved a performance of 292 GOPS, a throughput of 75.1 FPS, and a power efficiency of 7.81 GOPS/W. It is 17 times faster and 130 times more power-efficient than the CPU implementation, and 5.8 times more power-efficient than the GPU implementation.
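The load-balancing step in the abstract (assigning consecutive layers to boards so that per-board execution time is as equal as possible) can be illustrated with a small sketch. This is not the authors' method: it is a standard contiguous-partition dynamic program, the function name `partition_layers` is hypothetical, and the layer times in the usage example are made-up placeholders rather than measured ResNet-50 figures. It also ignores inter-board transfer time, which a real partitioning would need to account for.

```python
from itertools import accumulate


def partition_layers(layer_times, num_boards):
    """Split consecutive layers into `num_boards` contiguous pipeline stages,
    minimizing the execution time of the slowest stage (the bottleneck).

    Returns (stages, bottleneck), where `stages` is a list of lists of
    layer indices, one list per board, in pipeline order.
    """
    n = len(layer_times)
    if not 1 <= num_boards <= n:
        raise ValueError("need between 1 and len(layer_times) boards")

    # prefix[i] is the total time of layers 0..i-1.
    prefix = [0.0] + list(accumulate(layer_times))
    inf = float("inf")

    # best[k][i]: smallest achievable bottleneck when the first i layers
    # are placed on k boards. cut[k][i]: where the last stage starts.
    best = [[inf] * (n + 1) for _ in range(num_boards + 1)]
    cut = [[0] * (n + 1) for _ in range(num_boards + 1)]
    best[0][0] = 0.0

    for k in range(1, num_boards + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                # Layers j..i-1 form the k-th stage.
                cost = max(best[k - 1][j], prefix[i] - prefix[j])
                if cost < best[k][i]:
                    best[k][i] = cost
                    cut[k][i] = j

    # Walk the cut table backwards to recover the stage boundaries.
    stages = []
    i = n
    for k in range(num_boards, 0, -1):
        j = cut[k][i]
        stages.append(list(range(j, i)))
        i = j
    stages.reverse()
    return stages, best[num_boards][n]


# Hypothetical per-layer time estimates in milliseconds (illustrative only).
example_times = [4, 2, 3, 1, 5, 3]
example_stages, example_bottleneck = partition_layers(example_times, 3)
```

In a pipeline, steady-state throughput is bounded by the slowest stage, so minimizing the bottleneck is what keeps every board busy. With per-stage times in seconds, the attainable frame rate is roughly the reciprocal of the bottleneck.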
Yasuyu FUKUSHIMA
Keio University
Kensuke IIZUKA
Keio University
Hideharu AMANO
Keio University
FPGA, multi-FPGA, MEC, CNN
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yasuyu FUKUSHIMA, Kensuke IIZUKA, Hideharu AMANO, "Parallel Implementation of CNN on Multi-FPGA Cluster" in IEICE TRANSACTIONS on Information,
vol. E106-D, no. 7, pp. 1198-1208, July 2023, doi: 10.1587/transinf.2022EDP7175.
Abstract: We developed a PYNQ cluster that consists of economical Zynq boards, called M-KUBOS, that are interconnected through low-cost high-performance GTH serial links. For the software environment, we employed the PYNQ open-source software platform. The PYNQ cluster is anticipated to be a multi-access edge computing (MEC) server for 5G mobile networks. We implemented the ResNet-50 inference accelerator on the PYNQ cluster for image recognition of MEC applications. By estimating the execution time of each ResNet-50 layer, layers of ResNet-50 were divided into multiple boards so that the execution time of each board would be as equal as possible for efficient pipeline processing. Owing to the PYNQ cluster in which FPGAs were directly connected by high-speed serial links, stream processing without network bottlenecks and pipeline processing between boards were readily realized. The implementation on 4 boards achieved 292 GOPS performance, 75.1 FPS throughput, and 7.81 GOPS/W power efficiency. It achieved 17 times faster speed and 130 times more power efficiency compared to the implementation on the CPU, and 5.8 times more power efficiency compared to the implementation on the GPU.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDP7175/_p
@ARTICLE{e106-d_7_1198,
author={Yasuyu FUKUSHIMA and Kensuke IIZUKA and Hideharu AMANO},
journal={IEICE TRANSACTIONS on Information},
title={Parallel Implementation of CNN on Multi-FPGA Cluster},
year={2023},
volume={E106-D},
number={7},
pages={1198-1208},
abstract={We developed a PYNQ cluster that consists of economical Zynq boards, called M-KUBOS, that are interconnected through low-cost high-performance GTH serial links. For the software environment, we employed the PYNQ open-source software platform. The PYNQ cluster is anticipated to be a multi-access edge computing (MEC) server for 5G mobile networks. We implemented the ResNet-50 inference accelerator on the PYNQ cluster for image recognition of MEC applications. By estimating the execution time of each ResNet-50 layer, layers of ResNet-50 were divided into multiple boards so that the execution time of each board would be as equal as possible for efficient pipeline processing. Owing to the PYNQ cluster in which FPGAs were directly connected by high-speed serial links, stream processing without network bottlenecks and pipeline processing between boards were readily realized. The implementation on 4 boards achieved 292 GOPS performance, 75.1 FPS throughput, and 7.81 GOPS/W power efficiency. It achieved 17 times faster speed and 130 times more power efficiency compared to the implementation on the CPU, and 5.8 times more power efficiency compared to the implementation on the GPU.},
keywords={FPGA, multi-FPGA, MEC, CNN},
doi={10.1587/transinf.2022EDP7175},
ISSN={1745-1361},
month={July},}
TY - JOUR
TI - Parallel Implementation of CNN on Multi-FPGA Cluster
T2 - IEICE TRANSACTIONS on Information
SP - 1198
EP - 1208
AU - Yasuyu FUKUSHIMA
AU - Kensuke IIZUKA
AU - Hideharu AMANO
PY - 2023
DO - 10.1587/transinf.2022EDP7175
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2023
AB - We developed a PYNQ cluster that consists of economical Zynq boards, called M-KUBOS, that are interconnected through low-cost high-performance GTH serial links. For the software environment, we employed the PYNQ open-source software platform. The PYNQ cluster is anticipated to be a multi-access edge computing (MEC) server for 5G mobile networks. We implemented the ResNet-50 inference accelerator on the PYNQ cluster for image recognition of MEC applications. By estimating the execution time of each ResNet-50 layer, layers of ResNet-50 were divided into multiple boards so that the execution time of each board would be as equal as possible for efficient pipeline processing. Owing to the PYNQ cluster in which FPGAs were directly connected by high-speed serial links, stream processing without network bottlenecks and pipeline processing between boards were readily realized. The implementation on 4 boards achieved 292 GOPS performance, 75.1 FPS throughput, and 7.81 GOPS/W power efficiency. It achieved 17 times faster speed and 130 times more power efficiency compared to the implementation on the CPU, and 5.8 times more power efficiency compared to the implementation on the GPU.
ER -