The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations (e.g., some numerals may appear as "XNUMX").
Copyright notice
In the realization of convolutional neural networks (CNNs) in resource-constrained embedded hardware, the memory footprint of weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights is highly skewed, which makes it more difficult to utilize the underlying parallelism. To address this problem, we present SENTEI*, filter-wise pruning with distillation, to realize hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight skipping inter-layer pipelined accelerator on an FPGA. The equalization enables inter-filter parallelism, where a processing block for a layer executes filters concurrently with straightforward architecture. Our evaluation of semantic-segmentation tasks indicates that the resulting mIoU only decreased by 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of the mobile GPU. Therefore, our technique realizes hardware-aware network with comparable accuracy.
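The core idea of the filter-wise pruning described above can be sketched as follows: for each convolutional filter, keep only the k largest-magnitude weights and zero the rest, so every filter ends up with the same nonzero count. This is a minimal illustrative sketch, not the authors' implementation; the function name `filter_wise_prune` and the flat-list representation of each filter are assumptions for clarity.

```python
def filter_wise_prune(filters, k):
    """Keep only the k largest-magnitude weights in each filter and zero
    the rest, so every filter has exactly k nonzero weights.  Equalizing
    per-filter sparsity is what enables inter-filter parallelism in the
    accelerator: each processing block does the same amount of work.

    `filters` is a list of filters, each given as a flat list of weights
    (illustrative representation, not the paper's data layout)."""
    pruned = []
    for flat in filters:
        # indices of the k largest-magnitude weights in this filter
        keep = set(sorted(range(len(flat)),
                          key=lambda i: abs(flat[i]),
                          reverse=True)[:k])
        pruned.append([w if i in keep else 0.0 for i, w in enumerate(flat)])
    return pruned

# Example: two 4-weight filters, each pruned to exactly 2 nonzero weights.
filters = [[0.9, -0.1, 0.4, 0.05], [-0.7, 0.2, -0.3, 0.6]]
result = filter_wise_prune(filters, k=2)
```

In the paper, a retraining phase with distillation follows this pruning step to recover the accuracy lost by zeroing weights.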
Masayuki SHIMODA
Tokyo Institute of Technology
Youki SADA
Tokyo Institute of Technology
Ryosuke KURAMOCHI
Tokyo Institute of Technology
Shimpei SATO
Tokyo Institute of Technology
Hiroki NAKAHARA
Tokyo Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Masayuki SHIMODA, Youki SADA, Ryosuke KURAMOCHI, Shimpei SATO, Hiroki NAKAHARA, "SENTEI: Filter-Wise Pruning with Distillation towards Efficient Sparse Convolutional Neural Network Accelerators" in IEICE TRANSACTIONS on Information,
vol. E103-D, no. 12, pp. 2463-2470, December 2020, doi: 10.1587/transinf.2020PAP0013.
Abstract: In the realization of convolutional neural networks (CNNs) in resource-constrained embedded hardware, the memory footprint of weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights is highly skewed, which makes it more difficult to utilize the underlying parallelism. To address this problem, we present SENTEI*, filter-wise pruning with distillation, to realize hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight skipping inter-layer pipelined accelerator on an FPGA. The equalization enables inter-filter parallelism, where a processing block for a layer executes filters concurrently with straightforward architecture. Our evaluation of semantic-segmentation tasks indicates that the resulting mIoU only decreased by 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of the mobile GPU. Therefore, our technique realizes hardware-aware network with comparable accuracy.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020PAP0013/_p
@ARTICLE{e103-d_12_2463,
author={Masayuki SHIMODA and Youki SADA and Ryosuke KURAMOCHI and Shimpei SATO and Hiroki NAKAHARA},
journal={IEICE TRANSACTIONS on Information},
title={SENTEI: Filter-Wise Pruning with Distillation towards Efficient Sparse Convolutional Neural Network Accelerators},
year={2020},
volume={E103-D},
number={12},
pages={2463-2470},
abstract={In the realization of convolutional neural networks (CNNs) in resource-constrained embedded hardware, the memory footprint of weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights is highly skewed, which makes it more difficult to utilize the underlying parallelism. To address this problem, we present SENTEI*, filter-wise pruning with distillation, to realize hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight skipping inter-layer pipelined accelerator on an FPGA. The equalization enables inter-filter parallelism, where a processing block for a layer executes filters concurrently with straightforward architecture. Our evaluation of semantic-segmentation tasks indicates that the resulting mIoU only decreased by 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of the mobile GPU. Therefore, our technique realizes hardware-aware network with comparable accuracy.},
keywords={},
doi={10.1587/transinf.2020PAP0013},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - SENTEI: Filter-Wise Pruning with Distillation towards Efficient Sparse Convolutional Neural Network Accelerators
T2 - IEICE TRANSACTIONS on Information
SP - 2463
EP - 2470
AU - Masayuki SHIMODA
AU - Youki SADA
AU - Ryosuke KURAMOCHI
AU - Shimpei SATO
AU - Hiroki NAKAHARA
PY - 2020
DO - 10.1587/transinf.2020PAP0013
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - 2020/12//
AB - In the realization of convolutional neural networks (CNNs) in resource-constrained embedded hardware, the memory footprint of weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights is highly skewed, which makes it more difficult to utilize the underlying parallelism. To address this problem, we present SENTEI*, filter-wise pruning with distillation, to realize hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight skipping inter-layer pipelined accelerator on an FPGA. The equalization enables inter-filter parallelism, where a processing block for a layer executes filters concurrently with straightforward architecture. Our evaluation of semantic-segmentation tasks indicates that the resulting mIoU only decreased by 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of the mobile GPU. Therefore, our technique realizes hardware-aware network with comparable accuracy.
ER -