Copyright notice
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yusuke IJIMA, Takashi NOSE, Makoto TACHIBANA, Takao KOBAYASHI, "A Rapid Model Adaptation Technique for Emotional Speech Recognition with Style Estimation Based on Multiple-Regression HMM" in IEICE TRANSACTIONS on Information and Systems, vol. E93-D, no. 1, pp. 107-115, January 2010, doi: 10.1587/transinf.E93.D.107.
Abstract: In this paper, we propose a rapid model adaptation technique for emotional speech recognition which enables us to extract paralinguistic information as well as linguistic information contained in speech signals. This technique is based on style estimation and style adaptation using a multiple-regression HMM (MRHMM). In the MRHMM, the mean parameters of the output probability density function are controlled by a low-dimensional parameter vector, called a style vector, which corresponds to a set of the explanatory variables of the multiple regression. The recognition process consists of two stages. In the first stage, the style vector that represents the emotional expression category and the intensity of its expressiveness for the input speech is estimated on a sentence-by-sentence basis. Next, the acoustic models are adapted using the estimated style vector, and then standard HMM-based speech recognition is performed in the second stage. We assess the performance of the proposed technique in the recognition of simulated emotional speech uttered by both professional narrators and non-professional speakers.
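The page gives no detail beyond the abstract, so the following is a minimal NumPy sketch of the two-stage procedure it describes, under assumptions that are not taken from the paper: single-Gaussian output densities per state, a known frame-to-state alignment, and the common MRHMM parameterization in which each state mean is an affine function of the style vector (mu_i = A_i v + b_i). All names here (estimate_style_vector, adapt_means) are illustrative.

import numpy as np

def estimate_style_vector(obs, gamma, A, b, Sigma_inv):
    # Stage 1: closed-form ML estimate of the style vector v for one sentence,
    # i.e. the weighted least-squares solution of
    #   sum_{t,i} gamma[t,i] * (obs[t] - A[i] v - b[i])^T Sigma_inv[i] (obs[t] - A[i] v - b[i]) -> min
    # obs: (T, D), gamma: (T, S), A: (S, D, K), b: (S, D), Sigma_inv: (S, D, D)
    K = A.shape[2]
    G = np.zeros((K, K))  # accumulated normal-equation matrix
    k = np.zeros(K)       # accumulated right-hand side
    for t in range(obs.shape[0]):
        for i in range(A.shape[0]):
            w = gamma[t, i]
            if w == 0.0:
                continue
            AtS = A[i].T @ Sigma_inv[i]        # (K, D)
            G += w * (AtS @ A[i])
            k += w * (AtS @ (obs[t] - b[i]))
    return np.linalg.solve(G, k)

def adapt_means(A, b, v):
    # Stage 2: plug the estimated style vector into the regression to get the
    # adapted state means; decoding then proceeds as in a standard HMM.
    return A @ v + b   # (S, D, K) @ (K,) -> (S, D)

# Toy check: 2 states, 3-dim features, 2-dim style vector (e.g. joy/sadness axes).
rng = np.random.default_rng(0)
S, D, K, T = 2, 3, 2, 50
A = rng.normal(size=(S, D, K))
b = rng.normal(size=(S, D))
Sigma_inv = np.stack([np.eye(D)] * S)
v_true = np.array([0.8, -0.3])
states = rng.integers(0, S, size=T)    # stand-in for a forced alignment
gamma = np.eye(S)[states]              # one-hot state occupancies
obs = A[states] @ v_true + b[states]   # noiseless emissions from the true style
v_hat = estimate_style_vector(obs, gamma, A, b, Sigma_inv)
print(v_hat)                           # recovers v_true exactly in this noiseless toy
mu_adapted = adapt_means(A, b, v_hat)  # means used for second-stage recognition

In the actual system the occupancies would come from an EM-style forward-backward pass and the adapted means would be handed to a standard HMM decoder; the sketch isolates only the closed-form style estimate (stage 1) and the mean regression (stage 2).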
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.107/_p
@ARTICLE{e93-d_1_107,
author={Yusuke IJIMA and Takashi NOSE and Makoto TACHIBANA and Takao KOBAYASHI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={A Rapid Model Adaptation Technique for Emotional Speech Recognition with Style Estimation Based on Multiple-Regression HMM},
year={2010},
volume={E93-D},
number={1},
pages={107-115},
abstract={In this paper, we propose a rapid model adaptation technique for emotional speech recognition which enables us to extract paralinguistic information as well as linguistic information contained in speech signals. This technique is based on style estimation and style adaptation using a multiple-regression HMM (MRHMM). In the MRHMM, the mean parameters of the output probability density function are controlled by a low-dimensional parameter vector, called a style vector, which corresponds to a set of the explanatory variables of the multiple regression. The recognition process consists of two stages. In the first stage, the style vector that represents the emotional expression category and the intensity of its expressiveness for the input speech is estimated on a sentence-by-sentence basis. Next, the acoustic models are adapted using the estimated style vector, and then standard HMM-based speech recognition is performed in the second stage. We assess the performance of the proposed technique in the recognition of simulated emotional speech uttered by both professional narrators and non-professional speakers.},
keywords={},
doi={10.1587/transinf.E93.D.107},
ISSN={1745-1361},
month={January},
}
TY - JOUR
TI - A Rapid Model Adaptation Technique for Emotional Speech Recognition with Style Estimation Based on Multiple-Regression HMM
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 107
EP - 115
AU - Yusuke IJIMA
AU - Takashi NOSE
AU - Makoto TACHIBANA
AU - Takao KOBAYASHI
PY - 2010
DO - 10.1587/transinf.E93.D.107
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E93-D
IS - 1
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - January 2010
AB - In this paper, we propose a rapid model adaptation technique for emotional speech recognition which enables us to extract paralinguistic information as well as linguistic information contained in speech signals. This technique is based on style estimation and style adaptation using a multiple-regression HMM (MRHMM). In the MRHMM, the mean parameters of the output probability density function are controlled by a low-dimensional parameter vector, called a style vector, which corresponds to a set of the explanatory variables of the multiple regression. The recognition process consists of two stages. In the first stage, the style vector that represents the emotional expression category and the intensity of its expressiveness for the input speech is estimated on a sentence-by-sentence basis. Next, the acoustic models are adapted using the estimated style vector, and then standard HMM-based speech recognition is performed in the second stage. We assess the performance of the proposed technique in the recognition of simulated emotional speech uttered by both professional narrators and non-professional speakers.
ER -