The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. For example, some numerals are expressed as "XNUMX".
To improve the noise robustness of automatic speaker recognition, many speech/feature enhancement techniques based on deep neural networks (DNNs) have been explored. In this work, a DNN multi-level enhancement (DNN-ME) scheme, consisting of signal enhancement, cepstrum enhancement and i-vector enhancement stages, is proposed for text-independent speaker recognition. Because these enhancement methods are applied at different stages of the speaker recognition pipeline, it is worth exploring their complementary roles, which helps clarify the strengths and weaknesses of enhancement at each stage. To exploit the capabilities of DNN-ME as fully as possible, two approaches, called Cascaded DNN-ME and joint input of DNNs, are studied. The weighted Gaussian mixture models (WGMMs) proposed in our previous work are also applied to further improve performance. Experiments on the Speakers in the Wild (SITW) database show that DNN-ME significantly outperforms systems with only a single enhancement stage for noise-robust speaker recognition. Compared with the i-vector baseline, the equal error rate (EER) was reduced from 5.75 to 4.01.
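The headline result is stated as an equal error rate (EER): the operating point at which the false-acceptance rate (FAR) equals the false-rejection rate (FRR). The sketch below is not taken from the paper; it shows one simple way to approximate EER from verification scores by sweeping every observed score as a decision threshold, and it computes the relative improvement implied by the reported figures (5.75 to 4.01, roughly a 30% relative reduction). The function names and toy scores are hypothetical, and evaluation toolkits often interpolate between operating points, so their EER values can differ slightly from this step-wise estimate.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate the EER (as a fraction) with a simple threshold sweep.

    Hypothetical helper: a trial is accepted when score >= threshold.
    FRR = fraction of target (same-speaker) trials rejected.
    FAR = fraction of non-target (different-speaker) trials accepted.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best_gap, best_eer = float("inf"), 1.0
    # Every observed score is a candidate threshold.
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_reduction(baseline: float, system: float) -> float:
    """Relative error reduction of `system` with respect to `baseline`."""
    return (baseline - system) / baseline


if __name__ == "__main__":
    # Toy scores (made up for illustration, not from SITW).
    print(equal_error_rate([0.6, 0.8, 0.9, 0.4], [0.1, 0.5, 0.3, 0.7]))  # 0.25
    # Reported baseline vs. best system EER from the abstract.
    print(f"{relative_reduction(5.75, 4.01):.1%}")  # 30.3%
```

Because the relative reduction is a ratio, it does not depend on the unit in which the EER values are reported.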
Xingyu ZHANG
Army Engineering University
Xia ZOU
Army Engineering University
Meng SUN
Army Engineering University
Penglong WU
Army Engineering University
Yimin WANG
Army Engineering University
Jun HE
National University of Defense Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Xingyu ZHANG, Xia ZOU, Meng SUN, Penglong WU, Yimin WANG, Jun HE, "On the Complementary Role of DNN Multi-Level Enhancement for Noisy Robust Speaker Recognition in an I-Vector Framework" in IEICE TRANSACTIONS on Fundamentals, vol. E103-A, no. 1, pp. 356-360, January 2020, doi: 10.1587/transfun.2019EAL2104.
Abstract: In order to improve the noise robustness of automatic speaker recognition, many techniques on speech/feature enhancement have been explored by using deep neural networks (DNN). In this work, a DNN multi-level enhancement (DNN-ME), which consists of the stages of signal enhancement, cepstrum enhancement and i-vector enhancement, is proposed for text-independent speaker recognition. Given the fact that these enhancement methods are applied in different stages of the speaker recognition pipeline, it is worth exploring the complementary role of these methods, which benefits the understanding of the pros and cons of the enhancements of different stages. In order to use the capabilities of DNN-ME as much as possible, two kinds of methods called Cascaded DNN-ME and joint input of DNNs are studied. Weighted Gaussian mixture models (WGMMs) proposed in our previous work is also applied to further improve the model's performance. Experiments conducted on the Speakers in the Wild (SITW) database have shown that DNN-ME demonstrated significant superiority over the systems with only a single enhancement for noise robust speaker recognition. Compared with the i-vector baseline, the equal error rate (EER) was reduced from 5.75 to 4.01.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2019EAL2104/_p
@ARTICLE{e103-a_1_356,
author={Xingyu ZHANG and Xia ZOU and Meng SUN and Penglong WU and Yimin WANG and Jun HE},
journal={IEICE TRANSACTIONS on Fundamentals},
title={On the Complementary Role of DNN Multi-Level Enhancement for Noisy Robust Speaker Recognition in an I-Vector Framework},
year={2020},
volume={E103-A},
number={1},
pages={356-360},
abstract={In order to improve the noise robustness of automatic speaker recognition, many techniques on speech/feature enhancement have been explored by using deep neural networks (DNN). In this work, a DNN multi-level enhancement (DNN-ME), which consists of the stages of signal enhancement, cepstrum enhancement and i-vector enhancement, is proposed for text-independent speaker recognition. Given the fact that these enhancement methods are applied in different stages of the speaker recognition pipeline, it is worth exploring the complementary role of these methods, which benefits the understanding of the pros and cons of the enhancements of different stages. In order to use the capabilities of DNN-ME as much as possible, two kinds of methods called Cascaded DNN-ME and joint input of DNNs are studied. Weighted Gaussian mixture models (WGMMs) proposed in our previous work is also applied to further improve the model's performance. Experiments conducted on the Speakers in the Wild (SITW) database have shown that DNN-ME demonstrated significant superiority over the systems with only a single enhancement for noise robust speaker recognition. Compared with the i-vector baseline, the equal error rate (EER) was reduced from 5.75 to 4.01.},
keywords={},
doi={10.1587/transfun.2019EAL2104},
ISSN={1745-1337},
month={January},}
TY  - JOUR
TI  - On the Complementary Role of DNN Multi-Level Enhancement for Noisy Robust Speaker Recognition in an I-Vector Framework
T2  - IEICE TRANSACTIONS on Fundamentals
SP  - 356
EP  - 360
AU  - Xingyu ZHANG
AU  - Xia ZOU
AU  - Meng SUN
AU  - Penglong WU
AU  - Yimin WANG
AU  - Jun HE
PY  - 2020
DO  - 10.1587/transfun.2019EAL2104
JO  - IEICE TRANSACTIONS on Fundamentals
SN  - 1745-1337
VL  - E103-A
IS  - 1
JA  - IEICE TRANSACTIONS on Fundamentals
Y1  - January 2020
AB  - In order to improve the noise robustness of automatic speaker recognition, many techniques on speech/feature enhancement have been explored by using deep neural networks (DNN). In this work, a DNN multi-level enhancement (DNN-ME), which consists of the stages of signal enhancement, cepstrum enhancement and i-vector enhancement, is proposed for text-independent speaker recognition. Given the fact that these enhancement methods are applied in different stages of the speaker recognition pipeline, it is worth exploring the complementary role of these methods, which benefits the understanding of the pros and cons of the enhancements of different stages. In order to use the capabilities of DNN-ME as much as possible, two kinds of methods called Cascaded DNN-ME and joint input of DNNs are studied. Weighted Gaussian mixture models (WGMMs) proposed in our previous work is also applied to further improve the model's performance. Experiments conducted on the Speakers in the Wild (SITW) database have shown that DNN-ME demonstrated significant superiority over the systems with only a single enhancement for noise robust speaker recognition. Compared with the i-vector baseline, the equal error rate (EER) was reduced from 5.75 to 4.01.
ER  -