The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
In this article, we propose a method called "continuous noise masking (cNM)" that eliminates residual buzziness in a continuous vocoder, i.e. one in which all parameters are continuous and which offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allow a proper reconstruction of noise characteristics, and better model the creaky voice segments that may occur in natural speech. To this end, the cNM is designed to keep only voice components under a condition of the cNM threshold while discarding others. We evaluate the proposed approach and compare it with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches such as STRAIGHT and the log domain pulse model (PML).
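The thresholding step mentioned in the abstract (keep the components that satisfy the cNM threshold, discard the rest) can be illustrated with a minimal sketch. The function name `apply_threshold_mask`, the per-bin mask values, the threshold of 0.5, and the convention that values at or below the threshold are kept are all assumptions made for illustration. They are not taken from the paper, which derives its mask from the phase distortion deviation.

```python
from typing import List


def apply_threshold_mask(components: List[float],
                         mask_values: List[float],
                         threshold: float) -> List[float]:
    """Keep components[i] where mask_values[i] <= threshold, else 0.0.

    Hypothetical helper: the comparison direction and the zero-fill are
    illustrative assumptions, not the paper's published procedure.
    """
    if len(components) != len(mask_values):
        raise ValueError("components and mask_values must have equal length")
    return [c if m <= threshold else 0.0
            for c, m in zip(components, mask_values)]


if __name__ == "__main__":
    # Toy per-bin amplitudes and made-up mask values (not real speech data).
    amplitudes = [1.0, 2.0, 3.0, 4.0]
    mask = [0.1, 0.9, 0.5, 0.7]
    print(apply_threshold_mask(amplitudes, mask, threshold=0.5))
```

In a real vocoder the mask would be computed per frame and per frequency bin, and the discarded bins would typically be handled by a noise model rather than set to zero; the sketch only shows the keep-or-discard decision.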
Mohammed Salah AL-RADHI
Budapest University of Technology and Economics
Tamás Gábor CSAPÓ
Budapest University of Technology and Economics, MTA-ELTE Lendület Lingual Articulation Research Group
Géza NÉMETH
Budapest University of Technology and Economics
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Mohammed Salah AL-RADHI, Tamás Gábor CSAPÓ, Géza NÉMETH, "Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis," in IEICE TRANSACTIONS on Information and Systems, vol. E103-D, no. 5, pp. 1099-1107, May 2020, doi: 10.1587/transinf.2019EDP7167.
Abstract: In this article, we propose a method called “continuous noise masking (cNM)” that allows eliminating residual buzziness in a continuous vocoder, i.e. of which all parameters are continuous and offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, an inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allowing a proper reconstruction of noise characteristics, and model better the creaky voice segments that may happen in natural speech. To this end, the cNM is designed to keep only voice components under a condition of the cNM threshold while discarding others. We evaluate the proposed approach and compare with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches like STRAIGHT and log domain pulse model (PML).
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7167/_p
@ARTICLE{e103-d_5_1099,
author={Mohammed Salah AL-RADHI and Tamás Gábor CSAPÓ and Géza NÉMETH},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis},
year={2020},
volume={E103-D},
number={5},
pages={1099-1107},
abstract={In this article, we propose a method called “continuous noise masking (cNM)” that allows eliminating residual buzziness in a continuous vocoder, i.e. of which all parameters are continuous and offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, an inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allowing a proper reconstruction of noise characteristics, and model better the creaky voice segments that may happen in natural speech. To this end, the cNM is designed to keep only voice components under a condition of the cNM threshold while discarding others. We evaluate the proposed approach and compare with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches like STRAIGHT and log domain pulse model (PML).},
keywords={},
doi={10.1587/transinf.2019EDP7167},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1099
EP - 1107
AU - Mohammed Salah AL-RADHI
AU - Tamás Gábor CSAPÓ
AU - Géza NÉMETH
PY - 2020
DO - 10.1587/transinf.2019EDP7167
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E103-D
IS - 5
JA - IEICE Trans. Inf. & Syst.
Y1 - May 2020
AB - In this article, we propose a method called “continuous noise masking (cNM)” that allows eliminating residual buzziness in a continuous vocoder, i.e. of which all parameters are continuous and offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, an inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allowing a proper reconstruction of noise characteristics, and model better the creaky voice segments that may happen in natural speech. To this end, the cNM is designed to keep only voice components under a condition of the cNM threshold while discarding others. We evaluate the proposed approach and compare with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches like STRAIGHT and log domain pulse model (PML).
ER -