The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations; for example, some numerals are rendered as "XNUMX".
Copyrights notice
Yibo FAN
Fudan University
Leilei HUANG
Fudan University
Kewei CHEN
Fudan University
Xiaoyang ZENG
Fudan University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yibo FAN, Leilei HUANG, Kewei CHEN, Xiaoyang ZENG, "A Highly Configurable 7.62GOP/s Hardware Implementation for LSTM" in IEICE TRANSACTIONS on Electronics,
vol. E103-C, no. 5, pp. 263-273, May 2020, doi: 10.1587/transele.2019ECP5008.
Abstract: The neural network has become one of the most useful techniques in the areas of speech recognition, language translation and image analysis in recent years. Long Short-Term Memory (LSTM), a popular type of recurrent neural network (RNN), has been widely implemented on CPUs and GPUs. However, those software implementations offer poor parallelism, while the existing hardware implementations lack configurability. To bridge this gap, a highly configurable 7.62 GOP/s hardware implementation for LSTM is proposed in this paper. To achieve this goal, the work flow is carefully arranged to make the design compact and high-throughput; the structure is carefully organized to make the design configurable; the data buffering and compression strategy is carefully chosen to lower the bandwidth without increasing the structural complexity; and the data type, logistic sigmoid (σ) function and hyperbolic tangent (tanh) function are carefully optimized to balance hardware cost and accuracy. This work achieves a performance of 7.62 GOP/s @ 238 MHz on an XCZU6EG FPGA while taking only 3K look-up tables (LUTs). Compared with an implementation on an Intel Xeon E5-2620 CPU @ 2.10 GHz, this work achieves about a 90× speed-up for small networks and a 25× speed-up for large ones. Its resource consumption is also much lower than that of state-of-the-art works.
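The abstract refers to the standard LSTM cell, whose data path is what such a hardware design must implement: four gate pre-activations per time step, passed through σ and tanh non-linearities. A minimal NumPy sketch of one LSTM time step is shown below as background for the abstract's terminology; it is the textbook formulation, not the paper's actual hardware data path.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid, the σ function named in the abstract.
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W (4n x m), U (4n x n) and b (4n,) stack the parameters of the
    four gates in the order: input i, forget f, cell candidate g,
    output o. x is the input vector, h_prev/c_prev the previous
    hidden and cell states (length n each).
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # all four gate pre-activations at once
    i = sigmoid(z[0*n:1*n])         # input gate   (σ)
    f = sigmoid(z[1*n:2*n])         # forget gate  (σ)
    g = np.tanh(z[2*n:3*n])         # cell candidate (tanh)
    o = sigmoid(z[3*n:4*n])         # output gate  (σ)
    c = f * c_prev + i * g          # new cell state
    h = o * np.tanh(c)              # new hidden state
    return h, c
```

The matrix-vector products dominate the operation count (hence the GOP/s figure), while σ and tanh dominate the non-linear hardware cost.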
URL: https://global.ieice.org/en_transactions/electronics/10.1587/transele.2019ECP5008/_p
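The abstract states that the σ and tanh functions are optimized to balance hardware cost and accuracy. One common hardware-friendly technique for this (an assumption here for illustration, not necessarily the paper's exact scheme) is a piecewise-linear approximation driven by a small breakpoint table, exploiting the odd symmetry of tanh to halve the table size:

```python
import numpy as np

# Breakpoints on [0, 2] and exact tanh values at those points (this
# particular table is a hypothetical choice for illustration). Between
# breakpoints the function is linearly interpolated; beyond the last
# breakpoint it saturates toward ±1.
XS = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
YS = np.tanh(XS)

def tanh_pwl(x):
    """Piecewise-linear tanh approximation.

    Odd symmetry (tanh(-x) = -tanh(x)) lets the table cover only
    x >= 0; np.interp clamps to YS[-1] past the last breakpoint,
    giving the saturation behavior.
    """
    return np.sign(x) * np.interp(np.abs(x), XS, YS)
```

With this five-entry table the absolute error stays within a few hundredths over a typical pre-activation range; a real design would tune the breakpoint count and fixed-point widths against the accuracy budget.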
@ARTICLE{e103-c_5_263,
author={Yibo FAN and Leilei HUANG and Kewei CHEN and Xiaoyang ZENG},
journal={IEICE TRANSACTIONS on Electronics},
title={A Highly Configurable 7.62GOP/s Hardware Implementation for LSTM},
year={2020},
volume={E103-C},
number={5},
pages={263-273},
abstract={The neural network has become one of the most useful techniques in the areas of speech recognition, language translation and image analysis in recent years. Long Short-Term Memory (LSTM), a popular type of recurrent neural network (RNN), has been widely implemented on CPUs and GPUs. However, those software implementations offer poor parallelism, while the existing hardware implementations lack configurability. To bridge this gap, a highly configurable 7.62 GOP/s hardware implementation for LSTM is proposed in this paper. To achieve this goal, the work flow is carefully arranged to make the design compact and high-throughput; the structure is carefully organized to make the design configurable; the data buffering and compression strategy is carefully chosen to lower the bandwidth without increasing the structural complexity; and the data type, logistic sigmoid (σ) function and hyperbolic tangent (tanh) function are carefully optimized to balance hardware cost and accuracy. This work achieves a performance of 7.62 GOP/s @ 238 MHz on an XCZU6EG FPGA while taking only 3K look-up tables (LUTs). Compared with an implementation on an Intel Xeon E5-2620 CPU @ 2.10 GHz, this work achieves about a 90× speed-up for small networks and a 25× speed-up for large ones. Its resource consumption is also much lower than that of state-of-the-art works.},
keywords={},
doi={10.1587/transele.2019ECP5008},
ISSN={1745-1353},
month={May},}
TY - JOUR
TI - A Highly Configurable 7.62GOP/s Hardware Implementation for LSTM
T2 - IEICE TRANSACTIONS on Electronics
SP - 263
EP - 273
AU - Yibo FAN
AU - Leilei HUANG
AU - Kewei CHEN
AU - Xiaoyang ZENG
PY - 2020
DO - 10.1587/transele.2019ECP5008
JO - IEICE TRANSACTIONS on Electronics
SN - 1745-1353
VL - E103-C
IS - 5
JA - IEICE TRANSACTIONS on Electronics
Y1 - May 2020
AB - The neural network has become one of the most useful techniques in the areas of speech recognition, language translation and image analysis in recent years. Long Short-Term Memory (LSTM), a popular type of recurrent neural network (RNN), has been widely implemented on CPUs and GPUs. However, those software implementations offer poor parallelism, while the existing hardware implementations lack configurability. To bridge this gap, a highly configurable 7.62 GOP/s hardware implementation for LSTM is proposed in this paper. To achieve this goal, the work flow is carefully arranged to make the design compact and high-throughput; the structure is carefully organized to make the design configurable; the data buffering and compression strategy is carefully chosen to lower the bandwidth without increasing the structural complexity; and the data type, logistic sigmoid (σ) function and hyperbolic tangent (tanh) function are carefully optimized to balance hardware cost and accuracy. This work achieves a performance of 7.62 GOP/s @ 238 MHz on an XCZU6EG FPGA while taking only 3K look-up tables (LUTs). Compared with an implementation on an Intel Xeon E5-2620 CPU @ 2.10 GHz, this work achieves about a 90× speed-up for small networks and a 25× speed-up for large ones. Its resource consumption is also much lower than that of state-of-the-art works.
ER -