This paper demonstrates latent words recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs can capture long-range context information and offer strong performance, while LW-LMs are robust to out-of-domain tasks thanks to their latent word space modeling. However, RNN-LMs cannot explicitly capture the hidden relationships behind observed words because they have no concept of a latent variable space. In addition, LW-LMs cannot take long-range relationships between latent words into account. Our idea is to combine the RNN-LM and the LW-LM so as to compensate for their individual disadvantages. LW-RNN-LMs can simultaneously support latent variable space modeling, as LW-LMs do, and long-range relationship modeling, as RNN-LMs do. From the viewpoint of RNN-LMs, an LW-RNN-LM can be regarded as a soft-class RNN-LM with a vast latent variable space. Conversely, from the viewpoint of LW-LMs, an LW-RNN-LM can be regarded as an LW-LM that uses an RNN structure instead of an n-gram structure for latent variable modeling. This paper also details a parameter inference method and two implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-RNN-LM to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.
Ryo MASUMURA
NTT Corporation
Taichi ASAMI
NTT Corporation
Takanobu OBA
NTT Corporation
Sumitaka SAKAUCHI
NTT Corporation
Akinori ITO
Tohoku University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Ryo MASUMURA, Taichi ASAMI, Takanobu OBA, Sumitaka SAKAUCHI, Akinori ITO, "Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition" in IEICE TRANSACTIONS on Information,
vol. E102-D, no. 12, pp. 2557-2567, December 2019, doi: 10.1587/transinf.2018EDP7242.
Abstract: This paper demonstrates latent words recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs can capture long-range context information and offer strong performance, while LW-LMs are robust to out-of-domain tasks thanks to their latent word space modeling. However, RNN-LMs cannot explicitly capture the hidden relationships behind observed words because they have no concept of a latent variable space. In addition, LW-LMs cannot take long-range relationships between latent words into account. Our idea is to combine the RNN-LM and the LW-LM so as to compensate for their individual disadvantages. LW-RNN-LMs can simultaneously support latent variable space modeling, as LW-LMs do, and long-range relationship modeling, as RNN-LMs do. From the viewpoint of RNN-LMs, an LW-RNN-LM can be regarded as a soft-class RNN-LM with a vast latent variable space. Conversely, from the viewpoint of LW-LMs, an LW-RNN-LM can be regarded as an LW-LM that uses an RNN structure instead of an n-gram structure for latent variable modeling. This paper also details a parameter inference method and two implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-RNN-LM to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDP7242/_p
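As a rough aid to reading the abstract, the sketch below writes out the factorization it describes. The notation (w_t for observed words, l_t for latent words, h_t for the RNN state) is our own assumed rendering of the model, not taken from the paper itself.

```latex
% Requires amsmath. A minimal sketch under assumed notation, not the paper's own formulation.
\begin{align}
  % Both LW-LMs and LW-RNN-LMs emit each observed word w_t from a latent word l_t
  % and marginalize over all possible latent word sequences:
  P(w_{1:T}) &= \sum_{l_{1:T}} \prod_{t=1}^{T} P(w_t \mid l_t)\, P(l_t \mid l_{1:t-1}) \\
  % LW-LM: the latent-word transition uses an n-gram structure (short context only):
  P_{\text{LW-LM}}(l_t \mid l_{1:t-1}) &\approx P(l_t \mid l_{t-n+1:t-1}) \\
  % LW-RNN-LM: the transition is driven by an RNN state summarizing all previous
  % latent words, giving long-range context over the latent variable space:
  \mathbf{h}_t = \mathrm{RNN}(\mathbf{h}_{t-1}, l_{t-1}), \qquad
  P_{\text{LW-RNN-LM}}(l_t \mid l_{1:t-1}) &= \operatorname{softmax}(\mathbf{W}\mathbf{h}_t + \mathbf{b})_{l_t}
\end{align}
```

Read this way, the "n-gram approximation" and "Viterbi approximation" mentioned in the abstract are, as we understand them from related LW-LM work, ways of avoiding the intractable sum over latent word sequences during ASR decoding (for example, approximating the model with an n-gram LM trained on text sampled from it, or keeping only the single most probable latent word assignment); the paper itself should be consulted for the exact procedures.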
@ARTICLE{e102-d_12_2557,
author={Ryo MASUMURA and Taichi ASAMI and Takanobu OBA and Sumitaka SAKAUCHI and Akinori ITO},
journal={IEICE TRANSACTIONS on Information},
title={Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition},
year={2019},
volume={E102-D},
number={12},
pages={2557-2567},
abstract={This paper demonstrates latent words recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs can capture long-range context information and offer strong performance, while LW-LMs are robust to out-of-domain tasks thanks to their latent word space modeling. However, RNN-LMs cannot explicitly capture the hidden relationships behind observed words because they have no concept of a latent variable space. In addition, LW-LMs cannot take long-range relationships between latent words into account. Our idea is to combine the RNN-LM and the LW-LM so as to compensate for their individual disadvantages. LW-RNN-LMs can simultaneously support latent variable space modeling, as LW-LMs do, and long-range relationship modeling, as RNN-LMs do. From the viewpoint of RNN-LMs, an LW-RNN-LM can be regarded as a soft-class RNN-LM with a vast latent variable space. Conversely, from the viewpoint of LW-LMs, an LW-RNN-LM can be regarded as an LW-LM that uses an RNN structure instead of an n-gram structure for latent variable modeling. This paper also details a parameter inference method and two implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-RNN-LM to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.},
keywords={},
doi={10.1587/transinf.2018EDP7242},
ISSN={1745-1361},
month={December},
}
TY - JOUR
TI - Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 2557
EP - 2567
AU - Ryo MASUMURA
AU - Taichi ASAMI
AU - Takanobu OBA
AU - Sumitaka SAKAUCHI
AU - Akinori ITO
PY - 2019
DO - 10.1587/transinf.2018EDP7242
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2019
AB - This paper demonstrates latent words recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs can capture long-range context information and offer strong performance, while LW-LMs are robust to out-of-domain tasks thanks to their latent word space modeling. However, RNN-LMs cannot explicitly capture the hidden relationships behind observed words because they have no concept of a latent variable space. In addition, LW-LMs cannot take long-range relationships between latent words into account. Our idea is to combine the RNN-LM and the LW-LM so as to compensate for their individual disadvantages. LW-RNN-LMs can simultaneously support latent variable space modeling, as LW-LMs do, and long-range relationship modeling, as RNN-LMs do. From the viewpoint of RNN-LMs, an LW-RNN-LM can be regarded as a soft-class RNN-LM with a vast latent variable space. Conversely, from the viewpoint of LW-LMs, an LW-RNN-LM can be regarded as an LW-LM that uses an RNN structure instead of an n-gram structure for latent variable modeling. This paper also details a parameter inference method and two implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-RNN-LM to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.
ER -