The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
This paper proposes an utterance verification system that uses a state-level log-likelihood ratio with frame and state selection. Hidden Markov models serve as the acoustic models and anti-phone models for speech recognition and utterance verification. Each hidden Markov model has three states, and each state represents different characteristics of a phone. We therefore propose an algorithm that computes a state-level log-likelihood ratio and weights the states to obtain a more reliable confidence measure for recognized phones. In addition, we propose a frame selection algorithm that computes the confidence measure only on frames of the input that contain proper speech. In general, phone segmentation information obtained from a speaker-independent speech recognition system is inaccurate, because triphone-based acoustic models are difficult to train effectively to cover diverse pronunciations and coarticulation effects. Finding the correctly matched states is therefore even more difficult when obtaining state segmentation information, so a state selection algorithm is proposed to find valid states. The proposed method, which uses the state-level log-likelihood ratio with frame and state selection, achieves an 18.1% relative reduction in equal error rate compared with a baseline system that uses simple phone-level log-likelihood ratios.
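The abstract gives no formulas, so the following is only a minimal sketch of how a weighted state-level log-likelihood ratio with frame and state selection might be computed for one recognized phone. Everything specific here is an assumption made for illustration and not the authors' method: the function name `state_level_confidence`, the use of a per-state mean of frame-level ratios, the boolean `frame_mask` standing in for frame selection, and the `min_frames` threshold standing in for state selection.

```python
import numpy as np


def state_level_confidence(target_ll, anti_ll, state_ids, state_weights,
                           frame_mask=None, min_frames=1):
    """Weighted state-level log-likelihood ratio for one recognized phone.

    target_ll, anti_ll: per-frame log-likelihoods under the phone model and
        the anti-phone model (hypothetical inputs from an HMM decoder).
    state_ids: per-frame HMM state index (0, 1, 2) from the state alignment.
    state_weights: one weight per state (assumed to be given).
    frame_mask: True for frames kept by frame selection (assumed rule).
    min_frames: a state with fewer kept frames is treated as invalid
        (assumed stand-in for state selection); must be at least 1.
    """
    target_ll = np.asarray(target_ll, dtype=float)
    anti_ll = np.asarray(anti_ll, dtype=float)
    state_ids = np.asarray(state_ids)
    if frame_mask is None:
        frame_mask = np.ones(target_ll.shape, dtype=bool)
    else:
        frame_mask = np.asarray(frame_mask, dtype=bool)

    # Frame-level log-likelihood ratio: phone model versus anti-phone model.
    frame_llr = target_ll - anti_ll

    weighted_sum = 0.0
    weight_total = 0.0
    for state, weight in enumerate(state_weights):
        selected = frame_mask & (state_ids == state)
        if selected.sum() < min_frames:
            continue  # state selection: skip states without enough frames
        # State-level ratio: average over the frames kept for this state.
        weighted_sum += weight * frame_llr[selected].mean()
        weight_total += weight

    if weight_total == 0.0:
        return None  # no valid state, so no confidence can be computed
    return weighted_sum / weight_total


# Toy values, chosen only to exercise the function.
target = [-1.0, -1.0, -2.0, -2.0, -3.0, -3.0]
anti = [-2.0, -2.0, -2.0, -2.0, -1.0, -1.0]
states = [0, 0, 1, 1, 2, 2]
weights = [1.0, 2.0, 1.0]

all_frames = state_level_confidence(target, anti, states, weights)
speech_only = state_level_confidence(
    target, anti, states, weights,
    frame_mask=[True, True, True, True, False, False])
```

In this sketch a phone would be accepted when the returned score exceeds a threshold, and the equal error rate quoted in the abstract is the operating point at which the false acceptance and false rejection rates coincide as that threshold is varied.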
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Suk-Bong KWON, Hoirin KIM, "Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection," in IEICE TRANSACTIONS on Information and Systems, vol. E93-D, no. 3, pp. 647-650, March 2010, doi: 10.1587/transinf.E93.D.647.
Abstract: This paper suggests utterance verification system using state-level log-likelihood ratio with frame and state selection. We use hidden Markov models for speech recognition and utterance verification as acoustic models and anti-phone models. The hidden Markov models have three states and each state represents different characteristics of a phone. Thus we propose an algorithm to compute state-level log-likelihood ratio and give weights on states for obtaining more reliable confidence measure of recognized phones. Additionally, we propose a frame selection algorithm to compute confidence measure on frames including proper speech in the input speech. In general, phone segmentation information obtained from speaker-independent speech recognition system is not accurate because triphone-based acoustic models are difficult to effectively train for covering diverse pronunciation and coarticulation effect. So, it is more difficult to find the right matched states when obtaining state segmentation information. A state selection algorithm is suggested for finding valid states. The proposed method using state-level log-likelihood ratio with frame and state selection shows that the relative reduction in equal error rate is 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.647/_p
@ARTICLE{e93-d_3_647,
author={Suk-Bong KWON and Hoirin KIM},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection},
year={2010},
volume={E93-D},
number={3},
pages={647-650},
abstract={This paper suggests utterance verification system using state-level log-likelihood ratio with frame and state selection. We use hidden Markov models for speech recognition and utterance verification as acoustic models and anti-phone models. The hidden Markov models have three states and each state represents different characteristics of a phone. Thus we propose an algorithm to compute state-level log-likelihood ratio and give weights on states for obtaining more reliable confidence measure of recognized phones. Additionally, we propose a frame selection algorithm to compute confidence measure on frames including proper speech in the input speech. In general, phone segmentation information obtained from speaker-independent speech recognition system is not accurate because triphone-based acoustic models are difficult to effectively train for covering diverse pronunciation and coarticulation effect. So, it is more difficult to find the right matched states when obtaining state segmentation information. A state selection algorithm is suggested for finding valid states. The proposed method using state-level log-likelihood ratio with frame and state selection shows that the relative reduction in equal error rate is 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.},
keywords={},
doi={10.1587/transinf.E93.D.647},
ISSN={1745-1361},
month={March},
}
TY - JOUR
TI - Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 647
EP - 650
AU - Suk-Bong KWON
AU - Hoirin KIM
PY - 2010
DO - 10.1587/transinf.E93.D.647
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E93-D
IS - 3
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - March 2010
AB - This paper suggests utterance verification system using state-level log-likelihood ratio with frame and state selection. We use hidden Markov models for speech recognition and utterance verification as acoustic models and anti-phone models. The hidden Markov models have three states and each state represents different characteristics of a phone. Thus we propose an algorithm to compute state-level log-likelihood ratio and give weights on states for obtaining more reliable confidence measure of recognized phones. Additionally, we propose a frame selection algorithm to compute confidence measure on frames including proper speech in the input speech. In general, phone segmentation information obtained from speaker-independent speech recognition system is not accurate because triphone-based acoustic models are difficult to effectively train for covering diverse pronunciation and coarticulation effect. So, it is more difficult to find the right matched states when obtaining state segmentation information. A state selection algorithm is suggested for finding valid states. The proposed method using state-level log-likelihood ratio with frame and state selection shows that the relative reduction in equal error rate is 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.
ER -