The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Simultaneous utterances affect the abilities of both hearing-impaired people and automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works only estimate the speech magnitude and use the mixture phase for speech reconstruction. The use of the mixture phase has become a critical limitation on separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation that jointly recovers the magnitude and the phase. For phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is used because of its effectiveness and simplicity. The study implements a mask-based MISI algorithm and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, as it introduces less phase distortion. To compensate for phase-recovery errors and minimize signal distortion, an advanced mask is proposed for magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two neural network frameworks are evaluated for magnitude estimation in the second stage, demonstrating the effectiveness and flexibility of the proposed approach. Experimental results show that the proposed approach significantly reduces the distortion of the separated speech.
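The abstract does not spell out the MISI iteration, so the following is a minimal Python/NumPy sketch of a generic mask-based MISI loop, not the authors' implementation. Each source magnitude is fixed to a mask times the mixture magnitude; the phases start from the mixture phase; each iteration resynthesizes every source, adds an equal share of the mixture-consistency error to each, and re-reads the phases from the corrected signals. Oracle ideal amplitude masks, IAM = |S_c| / |X|, stand in for the network estimates used in the paper. The helper names (`stft`, `istft`, `ideal_amplitude_mask`, `misi`), the 512-point periodic Hann window with a hop of 128, and the iteration count are illustrative assumptions; the paper's advanced stage-two mask is not reproduced.

```python
import numpy as np

# Illustrative STFT settings (assumptions, not taken from the paper).
N_FFT, HOP = 512, 128
WIN = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window


def stft(x):
    """Hann-windowed STFT, shape (frames, bins); the signal is zero-padded at both ends."""
    pad = N_FFT // 2
    xp = np.pad(x, (pad, pad + N_FFT))
    n_frames = 1 + (len(xp) - N_FFT) // HOP
    frames = np.stack([xp[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Weighted overlap-add inverse of stft(), trimmed to `length` samples."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WIN
    n_out = (spec.shape[0] - 1) * HOP + N_FFT
    y, norm = np.zeros(n_out), np.zeros(n_out)
    for i, frame in enumerate(frames):
        y[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2
    y /= np.maximum(norm, 1e-8)  # undo the analysis/synthesis window weighting
    pad = N_FFT // 2
    return y[pad:pad + length]


def ideal_amplitude_mask(source, mixture, eps=1e-8):
    """Oracle IAM = |S_c| / |X| (in the paper, a network estimates the mask)."""
    return np.abs(stft(source)) / np.maximum(np.abs(stft(mixture)), eps)


def misi(mixture, masks, n_iter=30):
    """Generic mask-based MISI phase recovery (sketch; the paper's variant may differ).

    mixture: time-domain mixture, shape (T,)
    masks:   one mask per source, shape (C, frames, bins)
    Returns (mixture-consistent time-domain estimates (C, T), phases (C, frames, bins)).
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    n_src, length = masks.shape[0], len(mixture)
    mix_spec = stft(mixture)
    mags = masks * np.abs(mix_spec)  # masked magnitudes stay fixed throughout
    phases = np.repeat(np.angle(mix_spec)[None], n_src, axis=0)  # start from the mixture phase
    for _ in range(n_iter):
        est = np.stack([istft(m * np.exp(1j * p), length) for m, p in zip(mags, phases)])
        est += (mixture - est.sum(axis=0)) / n_src  # spread the mixture error evenly
        phases = np.stack([np.angle(stft(s)) for s in est])  # re-read the phases
    return est, phases


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s1, s2 = rng.standard_normal(16000), rng.standard_normal(16000)
    x = s1 + s2
    masks = np.stack([ideal_amplitude_mask(s, x) for s in (s1, s2)])
    est, phases = misi(x, masks)
    for ref, e in zip((s1, s2), est):
        snr = 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - e) ** 2))
        print(f"SNR after MISI: {snr:.1f} dB")
```

Returning the phases separately mirrors the two-stage split described in the abstract: phases recovered with the IAM in stage one can be paired with a magnitude from a different estimator, for example `istft(mag_stage2[c] * np.exp(1j * phases[c]), len(x))`, where `mag_stage2` is a hypothetical stage-two magnitude estimate.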
Lu YIN
University of Chinese Academy of Sciences, Chinese Academy of Sciences
Junfeng LI
University of Chinese Academy of Sciences, Chinese Academy of Sciences
Yonghong YAN
University of Chinese Academy of Sciences, Chinese Academy of Sciences
Masato AKAGI
Japan Advanced Institute of Science and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Lu YIN, Junfeng LI, Yonghong YAN, Masato AKAGI, "A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation," in IEICE TRANSACTIONS on Information and Systems, vol. E103-D, no. 7, pp. 1732-1743, July 2020, doi: 10.1587/transinf.2019EDP7259.
Abstract: The simultaneous utterances impact the ability of both the hearing-impaired persons and automatic speech recognition systems. Recently, deep neural networks have dramatically improved the speech separation performance. However, most previous works only estimate the speech magnitude and use the mixture phase for speech reconstruction. The use of the mixture phase has become a critical limitation for separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation, which integrally recovers the magnitude as well as the phase. For the phase recovery, Multiple Input Spectrogram Inversion (MISI) algorithm is utilized due to its effectiveness and simplicity. The study implements the MISI algorithm based on the mask and gives that the ideal amplitude mask (IAM) is the optimal mask for the mask-based MISI phase recovery, which brings less phase distortion. To compensate for the error of phase recovery and minimize the signal distortion, an advanced mask is proposed for the magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two frameworks of neural network are evaluated for the magnitude estimation on the second stage, demonstrating the effectiveness and flexibility of the proposed approach. The experimental results demonstrate that the proposed approach significantly minimizes the distortions of the separated speech.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7259/_p
@ARTICLE{e103-d_7_1732,
author={Lu YIN and Junfeng LI and Yonghong YAN and Masato AKAGI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation},
year={2020},
volume={E103-D},
number={7},
pages={1732-1743},
abstract={The simultaneous utterances impact the ability of both the hearing-impaired persons and automatic speech recognition systems. Recently, deep neural networks have dramatically improved the speech separation performance. However, most previous works only estimate the speech magnitude and use the mixture phase for speech reconstruction. The use of the mixture phase has become a critical limitation for separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation, which integrally recovers the magnitude as well as the phase. For the phase recovery, Multiple Input Spectrogram Inversion (MISI) algorithm is utilized due to its effectiveness and simplicity. The study implements the MISI algorithm based on the mask and gives that the ideal amplitude mask (IAM) is the optimal mask for the mask-based MISI phase recovery, which brings less phase distortion. To compensate for the error of phase recovery and minimize the signal distortion, an advanced mask is proposed for the magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two frameworks of neural network are evaluated for the magnitude estimation on the second stage, demonstrating the effectiveness and flexibility of the proposed approach. The experimental results demonstrate that the proposed approach significantly minimizes the distortions of the separated speech.},
keywords={},
doi={10.1587/transinf.2019EDP7259},
ISSN={1745-1361},
month={July},}
TY - JOUR
TI - A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1732
EP - 1743
AU - Lu YIN
AU - Junfeng LI
AU - Yonghong YAN
AU - Masato AKAGI
PY - 2020
DO - 10.1587/transinf.2019EDP7259
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E103-D
IS - 7
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - 2020/07//
AB - The simultaneous utterances impact the ability of both the hearing-impaired persons and automatic speech recognition systems. Recently, deep neural networks have dramatically improved the speech separation performance. However, most previous works only estimate the speech magnitude and use the mixture phase for speech reconstruction. The use of the mixture phase has become a critical limitation for separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation, which integrally recovers the magnitude as well as the phase. For the phase recovery, Multiple Input Spectrogram Inversion (MISI) algorithm is utilized due to its effectiveness and simplicity. The study implements the MISI algorithm based on the mask and gives that the ideal amplitude mask (IAM) is the optimal mask for the mask-based MISI phase recovery, which brings less phase distortion. To compensate for the error of phase recovery and minimize the signal distortion, an advanced mask is proposed for the magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two frameworks of neural network are evaluated for the magnitude estimation on the second stage, demonstrating the effectiveness and flexibility of the proposed approach. The experimental results demonstrate that the proposed approach significantly minimizes the distortions of the separated speech.
ER -