The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
A new online, unsupervised voice activity detection (VAD) method is proposed. The method is based on a feature derived from high-order statistics (HOS), enhanced by a second metric based on normalized autocorrelation peaks to improve its robustness to non-Gaussian noise. This feature is also designed to discriminate between close-talk and far-field speech, thus providing a VAD method for human-to-human interaction that is independent of the energy level. Classification is performed by an online variant of the Expectation-Maximization (EM) algorithm, which tracks and adapts to noise variations in the speech signal. The performance of the proposed method is evaluated on in-house data and on CENSREC-1-C, a publicly available database used for VAD in the context of automatic speech recognition (ASR). On both test sets, the proposed method outperforms a simple energy-based algorithm and is shown to be more robust to changes in speech sparsity, SNR variability, and noise type.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
David COURNAPEAU, Tatsuya KAWAHARA, "Voice Activity Detection Based on High Order Statistics and Online EM Algorithm," in IEICE TRANSACTIONS on Information, vol. E91-D, no. 12, pp. 2854-2861, December 2008, doi: 10.1093/ietisy/e91-d.12.2854.
Abstract: A new online, unsupervised voice activity detection (VAD) method is proposed. The method is based on a feature derived from high-order statistics (HOS), enhanced by a second metric based on normalized autocorrelation peaks to improve its robustness to non-Gaussian noise. This feature is also designed to discriminate between close-talk and far-field speech, thus providing a VAD method for human-to-human interaction that is independent of the energy level. Classification is performed by an online variant of the Expectation-Maximization (EM) algorithm, which tracks and adapts to noise variations in the speech signal. The performance of the proposed method is evaluated on in-house data and on CENSREC-1-C, a publicly available database used for VAD in the context of automatic speech recognition (ASR). On both test sets, the proposed method outperforms a simple energy-based algorithm and is shown to be more robust to changes in speech sparsity, SNR variability, and noise type.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.12.2854/_p
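The abstract names two per-frame measurements: a feature derived from high-order statistics and a normalized autocorrelation peak. The record does not give the paper's exact definitions, so the following is only an illustrative sketch, not the authors' method. It assumes excess kurtosis as a stand-in for the HOS feature, and the helper names, frame length, hop, lag range, and synthetic test signals are all hypothetical choices made for the example.

```python
import random

def frames_of(x, frame_len, hop):
    """Split a list of samples into overlapping frames."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]

def excess_kurtosis(frame):
    """Fourth-order statistic; close to 0 for Gaussian data (assumed HOS stand-in)."""
    n = len(frame)
    mean = sum(frame) / n
    centered = [v - mean for v in frame]
    var = sum(c * c for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    return m4 / max(var * var, 1e-12) - 3.0

def autocorr_peak(frame, min_lag, max_lag):
    """Largest autocorrelation, normalized by frame energy, over the lag range."""
    n = len(frame)
    mean = sum(frame) / n
    centered = [v - mean for v in frame]
    energy = max(sum(c * c for c in centered), 1e-12)
    return max(
        sum(a * b for a, b in zip(centered, centered[lag:])) / energy
        for lag in range(min_lag, max_lag + 1)
    )

# Synthetic signals (assumptions for the demo, not data from the paper):
# white Gaussian noise, and a sparse periodic pulse train as a crude voiced-like signal.
rng = random.Random(0)
fs = 8000                      # hypothetical sampling rate in Hz
n = fs // 4                    # a quarter of a second of samples
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
pulses = [(1.0 if i % 80 == 0 else 0.0) + rng.gauss(0.0, 0.01) for i in range(n)]

for name, signal in (("noise", noise), ("pulses", pulses)):
    fr = frames_of(signal, frame_len=400, hop=200)
    kurt = sum(excess_kurtosis(f) for f in fr) / len(fr)
    peak = sum(autocorr_peak(f, 20, 160) for f in fr) / len(fr)
    print(f"{name}: mean excess kurtosis {kurt:.2f}, mean autocorrelation peak {peak:.2f}")
```

On these synthetic inputs the pulse train should score far higher than the Gaussian noise on both measurements. Both values are normalized by the frame's own variance or energy, so rescaling the signal leaves them unchanged, which is in the spirit of the energy-level independence the abstract describes.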
@ARTICLE{e91-d_12_2854,
author={David COURNAPEAU and Tatsuya KAWAHARA},
journal={IEICE TRANSACTIONS on Information},
title={Voice Activity Detection Based on High Order Statistics and Online EM Algorithm},
year={2008},
volume={E91-D},
number={12},
pages={2854-2861},
abstract={A new online, unsupervised voice activity detection (VAD) method is proposed. The method is based on a feature derived from high-order statistics (HOS), enhanced by a second metric based on normalized autocorrelation peaks to improve its robustness to non-Gaussian noise. This feature is also designed to discriminate between close-talk and far-field speech, thus providing a VAD method for human-to-human interaction that is independent of the energy level. Classification is performed by an online variant of the Expectation-Maximization (EM) algorithm, which tracks and adapts to noise variations in the speech signal. The performance of the proposed method is evaluated on in-house data and on CENSREC-1-C, a publicly available database used for VAD in the context of automatic speech recognition (ASR). On both test sets, the proposed method outperforms a simple energy-based algorithm and is shown to be more robust to changes in speech sparsity, SNR variability, and noise type.},
keywords={},
doi={10.1093/ietisy/e91-d.12.2854},
ISSN={1745-1361},
month={December},
}
TY  - JOUR
TI  - Voice Activity Detection Based on High Order Statistics and Online EM Algorithm
T2  - IEICE TRANSACTIONS on Information
SP  - 2854
EP  - 2861
AU  - David COURNAPEAU
AU  - Tatsuya KAWAHARA
PY  - 2008
DO  - 10.1093/ietisy/e91-d.12.2854
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E91-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - December 2008
AB  - A new online, unsupervised voice activity detection (VAD) method is proposed. The method is based on a feature derived from high-order statistics (HOS), enhanced by a second metric based on normalized autocorrelation peaks to improve its robustness to non-Gaussian noise. This feature is also designed to discriminate between close-talk and far-field speech, thus providing a VAD method for human-to-human interaction that is independent of the energy level. Classification is performed by an online variant of the Expectation-Maximization (EM) algorithm, which tracks and adapts to noise variations in the speech signal. The performance of the proposed method is evaluated on in-house data and on CENSREC-1-C, a publicly available database used for VAD in the context of automatic speech recognition (ASR). On both test sets, the proposed method outperforms a simple energy-based algorithm and is shown to be more robust to changes in speech sparsity, SNR variability, and noise type.
ER  - 