JianFeng WU
Hangzhou Dianzi University
HuiBin QIN
Hangzhou Dianzi University
YongZhu HUA
Hangzhou Dianzi University
LiHuan SHAO
Hangzhou Dianzi University
Ji HU
Hangzhou Dianzi University
ShengYing YANG
Hangzhou Dianzi University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
JianFeng WU, HuiBin QIN, YongZhu HUA, LiHuan SHAO, Ji HU, ShengYing YANG, "Vector Quantization of High-Dimensional Speech Spectra Using Deep Neural Network" in IEICE TRANSACTIONS on Information and Systems,
vol. E102-D, no. 10, pp. 2047-2050, October 2019, doi: 10.1587/transinf.2019EDL8023.
Abstract: This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge of applying DNN to VQ is how to reduce the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods have been adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, 3) adding a non-binary penalty term to the loss function. These fine-tuning methods have been extensively evaluated on quantizing speech magnitude spectra. The results demonstrated that each of the methods is useful for improving the coding performance. When implemented for quantizing 968-dimensional speech spectra using only 18-bit, the DNN-based VQ framework achieved an averaged PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.
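The three fine-tuning methods listed in the abstract can be illustrated with a minimal NumPy sketch. The function names, the sigmoid coding layer, and the particular penalty form (the sum of c·(1−c) over coding units, which vanishes only for exactly binary codes) are illustrative assumptions for a generic binary auto-encoder bottleneck, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coding_layer(pre_activation, noise_std=0.0, binarize=False, rng=rng):
    """Hypothetical 18-unit coding layer showing tricks 1) and 2).

    1) Gaussian noise added to the layer input during training pushes
       the sigmoid toward its saturated, near-binary regions.
    2) Hard thresholding at 0.5 produces the binary code used for
       quantization at test time.
    """
    if noise_std > 0.0:
        pre_activation = pre_activation + rng.normal(
            0.0, noise_std, pre_activation.shape)
    units = sigmoid(pre_activation)
    if binarize:
        units = (units >= 0.5).astype(float)
    return units

def non_binary_penalty(units):
    """Trick 3): an extra loss term that is zero only when every
    coding unit is exactly 0 or 1, discouraging mid-range codes."""
    return float(np.sum(units * (1.0 - units)))

# Example: an 18-bit code for one input frame (pre-activations assumed).
z = rng.normal(0.0, 3.0, size=18)
code = coding_layer(z, binarize=True)          # binary code, trick 2)
soft = coding_layer(z, noise_std=1.0)          # noisy training pass, trick 1)
penalty = non_binary_penalty(soft)             # added to the loss, trick 3)
```

In this sketch the total training loss would be the reconstruction error plus a weighted `non_binary_penalty`; the weight and the noise standard deviation are tuning choices the abstract does not specify.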
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDL8023/_p
@ARTICLE{e102-d_10_2047,
author={JianFeng WU and HuiBin QIN and YongZhu HUA and LiHuan SHAO and Ji HU and ShengYing YANG},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Vector Quantization of High-Dimensional Speech Spectra Using Deep Neural Network},
year={2019},
volume={E102-D},
number={10},
pages={2047-2050},
abstract={This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge of applying DNN to VQ is how to reduce the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods have been adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, 3) adding a non-binary penalty term to the loss function. These fine-tuning methods have been extensively evaluated on quantizing speech magnitude spectra. The results demonstrated that each of the methods is useful for improving the coding performance. When implemented for quantizing 968-dimensional speech spectra using only 18-bit, the DNN-based VQ framework achieved an averaged PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.},
keywords={},
doi={10.1587/transinf.2019EDL8023},
ISSN={1745-1361},
month={October}
}
TY - JOUR
TI - Vector Quantization of High-Dimensional Speech Spectra Using Deep Neural Network
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 2047
EP - 2050
AU - JianFeng WU
AU - HuiBin QIN
AU - YongZhu HUA
AU - LiHuan SHAO
AU - Ji HU
AU - ShengYing YANG
PY - 2019
DO - 10.1587/transinf.2019EDL8023
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E102-D
IS - 10
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - October 2019
AB - This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge of applying DNN to VQ is how to reduce the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods have been adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, 3) adding a non-binary penalty term to the loss function. These fine-tuning methods have been extensively evaluated on quantizing speech magnitude spectra. The results demonstrated that each of the methods is useful for improving the coding performance. When implemented for quantizing 968-dimensional speech spectra using only 18-bit, the DNN-based VQ framework achieved an averaged PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.
ER -