Emotional speech recognition is generally considered more difficult than non-emotional speech recognition. The acoustic characteristics of emotional speech differ from those of non-emotional speech. Additionally, acoustic characteristics vary significantly depending on the type and intensity of emotions. Regarding linguistic features, emotional and colloquial expressions are also observed in the utterances. To solve these problems, we aim to improve recognition performance by adapting acoustic and language models to emotional speech. We used Japanese Twitter-based Emotional Speech (JTES) as an emotional speech corpus. This corpus consists of tweets and has an emotional label assigned to each utterance. Corpus adaptation is possible using the utterances contained in this corpus. However, regarding the language model, the amount of adaptation data is insufficient. To solve this problem, we propose an adaptation of the language model by using online tweet data downloaded from the internet. The sentences used for adaptation were extracted from the tweet data based on certain rules. We extracted 25.86 M words of data and used them for adaptation. In the recognition experiments, the baseline word error rate was 36.11%, whereas that with the acoustic and language model adaptation was 17.77%. The results demonstrated the effectiveness of the proposed method.
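The abstract states that adaptation sentences were selected from downloaded tweet data "based on certain rules", without detailing the rules themselves. The sketch below is a minimal, hypothetical illustration of that kind of rule-based filtering; the patterns, length thresholds, and the function name extract_adaptation_sentences are assumptions for illustration, not the authors' actual extraction procedure.

import re

# Hypothetical filtering rules; the paper's actual extraction rules are not
# specified in this abstract, so these patterns are illustrative only.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\S+")
JAPANESE_RE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")  # hiragana, katakana, kanji

def extract_adaptation_sentences(tweets, min_chars=5, max_chars=140):
    """Select and clean tweet texts for language-model adaptation (assumed rules)."""
    sentences = []
    for text in tweets:
        # Skip retweets, which duplicate other users' wording.
        if text.startswith("RT "):
            continue
        # Strip URLs, @mentions, and hashtags, which rarely occur in speech.
        cleaned = URL_RE.sub("", text)
        cleaned = MENTION_RE.sub("", cleaned)
        cleaned = HASHTAG_RE.sub("", cleaned)
        cleaned = cleaned.strip()
        # Keep only sentences that contain Japanese characters and fall
        # within a plausible utterance length.
        if not JAPANESE_RE.search(cleaned):
            continue
        if not (min_chars <= len(cleaned) <= max_chars):
            continue
        sentences.append(cleaned)
    return sentences

if __name__ == "__main__":
    sample = [
        "RT @user: これは引用です",
        "今日はとても嬉しい！ https://t.co/xxxx #うれしい",
        "ok",
    ]
    for s in extract_adaptation_sentences(sample):
        print(s)

Under these assumed rules, the retweet and the non-Japanese fragment are discarded, while the emotional tweet is kept with its URL and hashtag removed; the surviving sentences would then form the adaptation text for the language model.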
Tetsuo KOSAKA
Yamagata University
Kazuya SAEKI
Yamagata University
Yoshitaka AIZAWA
Yamagata University
Masaharu KATO
Yamagata University
Takashi NOSE
Tohoku University
Tetsuo KOSAKA, Kazuya SAEKI, Yoshitaka AIZAWA, Masaharu KATO, Takashi NOSE, "Simultaneous Adaptation of Acoustic and Language Models for Emotional Speech Recognition Using Tweet Data" in IEICE TRANSACTIONS on Information,
vol. E107-D, no. 3, pp. 363-373, March 2024, doi: 10.1587/transinf.2023HCP0010.
Abstract: Emotional speech recognition is generally considered more difficult than non-emotional speech recognition. The acoustic characteristics of emotional speech differ from those of non-emotional speech. Additionally, acoustic characteristics vary significantly depending on the type and intensity of emotions. Regarding linguistic features, emotional and colloquial expressions are also observed in their utterances. To solve these problems, we aim to improve recognition performance by adapting acoustic and language models to emotional speech. We used Japanese Twitter-based Emotional Speech (JTES) as an emotional speech corpus. This corpus consisted of tweets and had an emotional label assigned to each utterance. Corpus adaptation is possible using the utterances contained in this corpus. However, regarding the language model, the amount of adaptation data is insufficient. To solve this problem, we propose an adaptation of the language model by using online tweet data downloaded from the internet. The sentences used for adaptation were extracted from the tweet data based on certain rules. We extracted the data of 25.86 M words and used them for adaptation. In the recognition experiments, the baseline word error rate was 36.11%, whereas that with the acoustic and language model adaptation was 17.77%. The results demonstrated the effectiveness of the proposed method.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023HCP0010/_p
@ARTICLE{e107-d_3_363,
author={Tetsuo KOSAKA and Kazuya SAEKI and Yoshitaka AIZAWA and Masaharu KATO and Takashi NOSE},
journal={IEICE TRANSACTIONS on Information},
title={Simultaneous Adaptation of Acoustic and Language Models for Emotional Speech Recognition Using Tweet Data},
year={2024},
volume={E107-D},
number={3},
pages={363-373},
abstract={Emotional speech recognition is generally considered more difficult than non-emotional speech recognition. The acoustic characteristics of emotional speech differ from those of non-emotional speech. Additionally, acoustic characteristics vary significantly depending on the type and intensity of emotions. Regarding linguistic features, emotional and colloquial expressions are also observed in their utterances. To solve these problems, we aim to improve recognition performance by adapting acoustic and language models to emotional speech. We used Japanese Twitter-based Emotional Speech (JTES) as an emotional speech corpus. This corpus consisted of tweets and had an emotional label assigned to each utterance. Corpus adaptation is possible using the utterances contained in this corpus. However, regarding the language model, the amount of adaptation data is insufficient. To solve this problem, we propose an adaptation of the language model by using online tweet data downloaded from the internet. The sentences used for adaptation were extracted from the tweet data based on certain rules. We extracted the data of 25.86 M words and used them for adaptation. In the recognition experiments, the baseline word error rate was 36.11%, whereas that with the acoustic and language model adaptation was 17.77%. The results demonstrated the effectiveness of the proposed method.},
keywords={},
doi={10.1587/transinf.2023HCP0010},
ISSN={1745-1361},
month={March},}
TY - JOUR
TI - Simultaneous Adaptation of Acoustic and Language Models for Emotional Speech Recognition Using Tweet Data
T2 - IEICE TRANSACTIONS on Information
SP - 363
EP - 373
AU - Tetsuo KOSAKA
AU - Kazuya SAEKI
AU - Yoshitaka AIZAWA
AU - Masaharu KATO
AU - Takashi NOSE
PY - 2024
DO - 10.1587/transinf.2023HCP0010
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2024
AB - Emotional speech recognition is generally considered more difficult than non-emotional speech recognition. The acoustic characteristics of emotional speech differ from those of non-emotional speech. Additionally, acoustic characteristics vary significantly depending on the type and intensity of emotions. Regarding linguistic features, emotional and colloquial expressions are also observed in their utterances. To solve these problems, we aim to improve recognition performance by adapting acoustic and language models to emotional speech. We used Japanese Twitter-based Emotional Speech (JTES) as an emotional speech corpus. This corpus consisted of tweets and had an emotional label assigned to each utterance. Corpus adaptation is possible using the utterances contained in this corpus. However, regarding the language model, the amount of adaptation data is insufficient. To solve this problem, we propose an adaptation of the language model by using online tweet data downloaded from the internet. The sentences used for adaptation were extracted from the tweet data based on certain rules. We extracted the data of 25.86 M words and used them for adaptation. In the recognition experiments, the baseline word error rate was 36.11%, whereas that with the acoustic and language model adaptation was 17.77%. The results demonstrated the effectiveness of the proposed method.
ER -