Copyright notice
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takashi NOSE, Yuhei OTA, Takao KOBAYASHI, "HMM-Based Voice Conversion Using Quantized F0 Context" in IEICE TRANSACTIONS on Information,
vol. E93-D, no. 9, pp. 2483-2490, September 2010, doi: 10.1587/transinf.E93.D.2483.
Abstract: We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.2483/_p
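The abstract describes extracting a quantized F0 contour from the source speaker's speech and using the resulting discrete symbols as prosodic context for label generation. A minimal sketch of such quantization is shown below; the number of levels, the uniform binning over the voiced log-F0 range, and the unvoiced-frame convention are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize_f0(log_f0, n_levels=4, lo=None, hi=None):
    """Map a log-F0 contour to discrete symbol indices 0..n_levels-1.

    Unvoiced frames (log_f0 <= 0) receive the symbol -1.
    The quantization range defaults to the voiced min/max of the input;
    uniform binning is an assumption made for this sketch.
    """
    log_f0 = np.asarray(log_f0, dtype=float)
    voiced = log_f0 > 0
    symbols = np.full(log_f0.shape, -1, dtype=int)
    if not voiced.any():
        return symbols
    lo = log_f0[voiced].min() if lo is None else lo
    hi = log_f0[voiced].max() if hi is None else hi
    # Interior bin edges over [lo, hi]; values outside map to the edge bins.
    edges = np.linspace(lo, hi, n_levels + 1)[1:-1]
    symbols[voiced] = np.digitize(log_f0[voiced], edges)
    return symbols

# Example: a short contour with unvoiced frames at both ends.
print(quantize_f0([0.0, 4.0, 4.5, 5.0, 0.0], n_levels=4))  # [-1  0  2  3 -1]
```

These symbols would then be attached, alongside phonetic context, to each label in the context-dependent label sequence consumed by the target speaker's HMMs.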
@ARTICLE{e93-d_9_2483,
author={Takashi NOSE and Yuhei OTA and Takao KOBAYASHI},
journal={IEICE TRANSACTIONS on Information},
title={HMM-Based Voice Conversion Using Quantized F0 Context},
year={2010},
volume={E93-D},
number={9},
pages={2483-2490},
abstract={We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.},
keywords={},
doi={10.1587/transinf.E93.D.2483},
ISSN={1745-1361},
month={September},}
TY - JOUR
TI - HMM-Based Voice Conversion Using Quantized F0 Context
T2 - IEICE TRANSACTIONS on Information
SP - 2483
EP - 2490
AU - Takashi NOSE
AU - Yuhei OTA
AU - Takao KOBAYASHI
PY - 2010
DO - 10.1587/transinf.E93.D.2483
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E93-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2010
AB - We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.
ER -