The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
For voice-enabled car navigation systems that use a multi-purpose cloud speech recognition service (cloud ASR), utterance classification that is robust against speech recognition errors is needed to realize a user-friendly voice interface. The purpose of this study is to improve the accuracy of utterance classification for voice-enabled car navigation systems when the inputs to the classifier are error-prone speech recognition results obtained from a cloud ASR. The role of utterance classification is to predict, from a spontaneous utterance, which car navigation function the user wants to execute. A cloud ASR makes speech recognition errors because of the noise that occurs while traveling in a car, and these errors degrade the accuracy of utterance classification. There are many methods for reducing the number of speech recognition errors by modifying the internals of a speech recognizer. However, application developers cannot apply these methods to cloud ASRs because they cannot customize the ASRs. In this paper, we propose a system that improves the accuracy of utterance classification by modifying both the speech-signal inputs to a cloud ASR and the recognized-sentence outputs from the ASR. First, our system performs speech enhancement on a user's utterance and then sends both the enhanced and the non-enhanced speech signals to a cloud ASR. The speech recognition results from both speech signals are merged to reduce the number of recognition errors. Second, to reduce the number of utterance classification errors, we propose a data augmentation method, which we call "optimal doping," in which not only accurate transcriptions but also error-prone recognized sentences are added to the training data. An evaluation with real user utterances spoken to car navigation products showed that our system reduces the number of utterance classification errors by 54% from a baseline condition. Finally, we propose a semi-automatic upgrading approach that lets classifiers benefit from the improved performance of cloud ASRs.
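The data augmentation idea in the abstract, adding error-prone recognized sentences to the training data alongside accurate transcriptions, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `dope_training_data`, the `ratio` parameter, and the random sampling strategy are all assumptions made for illustration, and the abstract does not say how the "optimal" amount of recognized sentences is chosen.

```python
import random
from typing import List, Tuple

# An example is a (sentence, intent label) pair.
Example = Tuple[str, str]


def dope_training_data(
    transcribed: List[Example],
    recognized: List[Example],
    ratio: float,
    seed: int = 0,
) -> List[Example]:
    """Mix ASR-recognized sentences into a set of accurate transcriptions.

    `ratio` is a hypothetical knob: the number of recognized examples to add
    per transcribed example. The abstract does not specify how this amount
    is determined; here it is simply supplied by the caller.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    rng = random.Random(seed)  # fixed seed keeps the sampling reproducible
    # Never ask for more recognized examples than are available.
    n_extra = min(len(recognized), int(ratio * len(transcribed)))
    extra = rng.sample(recognized, n_extra)
    # Transcriptions come first, followed by the sampled recognized sentences.
    return list(transcribed) + extra


if __name__ == "__main__":
    # Made-up utterances and labels, for illustration only.
    clean = [
        ("go to the nearest gas station", "set_destination"),
        ("turn up the volume", "audio_volume"),
    ]
    noisy = [
        ("go to the nearest gust station", "set_destination"),
        ("turn of the volume", "audio_volume"),
    ]
    for sentence, label in dope_training_data(clean, noisy, ratio=0.5):
        print(label, "|", sentence)
```

In practice one would presumably tune the mixing amount on held-out recognized utterances; the sketch only shows the mixing step itself.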
Takeshi HOMMA
Hitachi, Ltd.
Yasunari OBUCHI
Tokyo University of Technology
Kazuaki SHIMA
Clarion Co., Ltd.
Rintaro IKESHITA
Hitachi, Ltd.
Hiroaki KOKUBO
Hitachi, Ltd.
Takuya MATSUMOTO
Hitachi Automotive Systems Ltd.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takeshi HOMMA, Yasunari OBUCHI, Kazuaki SHIMA, Rintaro IKESHITA, Hiroaki KOKUBO, Takuya MATSUMOTO, "In-Vehicle Voice Interface with Improved Utterance Classification Accuracy Using Off-the-Shelf Cloud Speech Recognizer" in IEICE TRANSACTIONS on Information, vol. E101-D, no. 12, pp. 3123-3137, December 2018, doi: 10.1587/transinf.2018EDK0001.
Abstract: For voice-enabled car navigation systems that use a multi-purpose cloud speech recognition service (cloud ASR), utterance classification that is robust against speech recognition errors is needed to realize a user-friendly voice interface. The purpose of this study is to improve the accuracy of utterance classification for voice-enabled car navigation systems when inputs to a classifier are error-prone speech recognition results obtained from a cloud ASR. The role of utterance classification is to predict which car navigation function a user wants to execute from a spontaneous utterance. A cloud ASR causes speech recognition errors due to the noises that occur when traveling in a car, and the errors degrade the accuracy of utterance classification. There are many methods for reducing the number of speech recognition errors by modifying the inside of a speech recognizer. However, application developers cannot apply these methods to cloud ASRs because they cannot customize the ASRs. In this paper, we propose a system for improving the accuracy of utterance classification by modifying both speech-signal inputs to a cloud ASR and recognized-sentence outputs from an ASR. First, our system performs speech enhancement on a user's utterance and then sends both enhanced and non-enhanced speech signals to a cloud ASR. Speech recognition results from both speech signals are merged to reduce the number of recognition errors. Second, to reduce that of utterance classification errors, we propose a data augmentation method, which we call “optimal doping,” where not only accurate transcriptions but also error-prone recognized sentences are added to training data. An evaluation with real user utterances spoken to car navigation products showed that our system reduces the number of utterance classification errors by 54% from a baseline condition. Finally, we propose a semi-automatic upgrading approach for classifiers to benefit from the improved performance of cloud ASRs.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDK0001/_p
@ARTICLE{e101-d_12_3123,
author={Takeshi HOMMA and Yasunari OBUCHI and Kazuaki SHIMA and Rintaro IKESHITA and Hiroaki KOKUBO and Takuya MATSUMOTO},
journal={IEICE TRANSACTIONS on Information},
title={In-Vehicle Voice Interface with Improved Utterance Classification Accuracy Using Off-the-Shelf Cloud Speech Recognizer},
year={2018},
volume={E101-D},
number={12},
pages={3123-3137},
abstract={For voice-enabled car navigation systems that use a multi-purpose cloud speech recognition service (cloud ASR), utterance classification that is robust against speech recognition errors is needed to realize a user-friendly voice interface. The purpose of this study is to improve the accuracy of utterance classification for voice-enabled car navigation systems when inputs to a classifier are error-prone speech recognition results obtained from a cloud ASR. The role of utterance classification is to predict which car navigation function a user wants to execute from a spontaneous utterance. A cloud ASR causes speech recognition errors due to the noises that occur when traveling in a car, and the errors degrade the accuracy of utterance classification. There are many methods for reducing the number of speech recognition errors by modifying the inside of a speech recognizer. However, application developers cannot apply these methods to cloud ASRs because they cannot customize the ASRs. In this paper, we propose a system for improving the accuracy of utterance classification by modifying both speech-signal inputs to a cloud ASR and recognized-sentence outputs from an ASR. First, our system performs speech enhancement on a user's utterance and then sends both enhanced and non-enhanced speech signals to a cloud ASR. Speech recognition results from both speech signals are merged to reduce the number of recognition errors. Second, to reduce that of utterance classification errors, we propose a data augmentation method, which we call “optimal doping,” where not only accurate transcriptions but also error-prone recognized sentences are added to training data. An evaluation with real user utterances spoken to car navigation products showed that our system reduces the number of utterance classification errors by 54% from a baseline condition. Finally, we propose a semi-automatic upgrading approach for classifiers to benefit from the improved performance of cloud ASRs.},
keywords={},
doi={10.1587/transinf.2018EDK0001},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - In-Vehicle Voice Interface with Improved Utterance Classification Accuracy Using Off-the-Shelf Cloud Speech Recognizer
T2 - IEICE TRANSACTIONS on Information
SP - 3123
EP - 3137
AU - Takeshi HOMMA
AU - Yasunari OBUCHI
AU - Kazuaki SHIMA
AU - Rintaro IKESHITA
AU - Hiroaki KOKUBO
AU - Takuya MATSUMOTO
PY - 2018
DO - 10.1587/transinf.2018EDK0001
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2018
AB - For voice-enabled car navigation systems that use a multi-purpose cloud speech recognition service (cloud ASR), utterance classification that is robust against speech recognition errors is needed to realize a user-friendly voice interface. The purpose of this study is to improve the accuracy of utterance classification for voice-enabled car navigation systems when inputs to a classifier are error-prone speech recognition results obtained from a cloud ASR. The role of utterance classification is to predict which car navigation function a user wants to execute from a spontaneous utterance. A cloud ASR causes speech recognition errors due to the noises that occur when traveling in a car, and the errors degrade the accuracy of utterance classification. There are many methods for reducing the number of speech recognition errors by modifying the inside of a speech recognizer. However, application developers cannot apply these methods to cloud ASRs because they cannot customize the ASRs. In this paper, we propose a system for improving the accuracy of utterance classification by modifying both speech-signal inputs to a cloud ASR and recognized-sentence outputs from an ASR. First, our system performs speech enhancement on a user's utterance and then sends both enhanced and non-enhanced speech signals to a cloud ASR. Speech recognition results from both speech signals are merged to reduce the number of recognition errors. Second, to reduce that of utterance classification errors, we propose a data augmentation method, which we call “optimal doping,” where not only accurate transcriptions but also error-prone recognized sentences are added to training data. An evaluation with real user utterances spoken to car navigation products showed that our system reduces the number of utterance classification errors by 54% from a baseline condition. Finally, we propose a semi-automatic upgrading approach for classifiers to benefit from the improved performance of cloud ASRs.
ER -