The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
This paper proposes a semi-supervised source separation method for stereophonic music signals that contain multiple recorded or processed signals, focusing on synthesized stereophonic music. Because synthesized music signals are often generated as linear combinations of many individual source signals weighted by their respective mixing gains, the inter-channel phase or phase-difference information that represents the spatial characteristics of a recording environment cannot be used as an acoustic cue for source separation. Non-negative Tensor Factorization (NTF) is an effective technique for this problem: it decomposes the amplitude spectrograms of the stereo channels into basis vectors and activations of the individual music source signals, together with their corresponding mixing gains. However, it is difficult to achieve sufficient separation performance with NTF alone, because the acoustic cues available for separation are limited. To address this issue, this paper proposes a Cepstral Distance Regularization (CDR) method for NTF-based stereo channel separation, which makes the cepstra of the separated source signals follow Gaussian Mixture Models (GMMs) of the corresponding music source signals. These GMMs are trained in advance using available samples. Experimental evaluations separating three and four sound sources are conducted to investigate the effectiveness of the proposed method in both supervised and semi-supervised separation frameworks, and its performance is compared with that of a conventional NTF method. The experimental results demonstrate that the proposed method yields significant improvements in both separation frameworks, and that cepstral distance regularization provides better separation parameters.
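As an illustration of the factorization described above, the following is a minimal NumPy sketch of plain NTF, not the authors' implementation. It assumes a squared Euclidean cost with standard multiplicative updates and one set of channel gains per component; the paper's actual cost function, its grouping of components into sources, and its GMM-based regularizer are not given in this abstract. The names `reconstruct`, `ntf`, `real_cepstrum`, and `cepstral_distance` are hypothetical. The last helper only measures the distance between two low-order cepstra, which is the kind of quantity a cepstral regularizer would act on.

```python
import numpy as np


def reconstruct(G, W, H):
    """Model spectrogram: X_hat[c, f, t] = sum_k G[c, k] * W[f, k] * H[k, t]."""
    return np.einsum("ck,fk,kt->cft", G, W, H)


def ntf(X, n_components, n_iter=200, eps=1e-12, seed=0):
    """Factorize a non-negative (channels, freqs, frames) tensor.

    G holds mixing gains, W basis vectors, H activations.
    Assumption: squared Euclidean cost with multiplicative updates.
    Returns (G, W, H, errors); errors[i] is the cost before iteration i,
    and the last entry is the cost after the final iteration.
    """
    rng = np.random.default_rng(seed)
    C, F, T = X.shape
    G = rng.random((C, n_components)) + eps
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, T)) + eps
    errors = []
    for _ in range(n_iter):
        X_hat = reconstruct(G, W, H)
        errors.append(float(np.sum((X - X_hat) ** 2)))
        # Each factor is scaled by (data correlation) / (model correlation),
        # which keeps every entry non-negative.
        G *= np.einsum("cft,fk,kt->ck", X, W, H) / (
            np.einsum("cft,fk,kt->ck", X_hat, W, H) + eps)
        X_hat = reconstruct(G, W, H)
        W *= np.einsum("cft,ck,kt->fk", X, G, H) / (
            np.einsum("cft,ck,kt->fk", X_hat, G, H) + eps)
        X_hat = reconstruct(G, W, H)
        H *= np.einsum("cft,ck,fk->kt", X, G, W) / (
            np.einsum("cft,ck,fk->kt", X_hat, G, W) + eps)
    errors.append(float(np.sum((X - reconstruct(G, W, H)) ** 2)))
    return G, W, H, errors


def real_cepstrum(amplitude, eps=1e-12):
    """Real cepstrum of a one-sided amplitude spectrum (inverse FFT of its log)."""
    return np.fft.irfft(np.log(amplitude + eps))


def cepstral_distance(amp_a, amp_b, order=20):
    """Euclidean distance between low-order cepstra, skipping the 0th (gain) term."""
    ca = real_cepstrum(amp_a)[1:order + 1]
    cb = real_cepstrum(amp_b)[1:order + 1]
    return float(np.linalg.norm(ca - cb))


if __name__ == "__main__":
    # Synthetic stereo "spectrogram": 2 channels, 64 frequency bins, 100 frames.
    rng = np.random.default_rng(42)
    X = reconstruct(rng.random((2, 3)), rng.random((64, 3)), rng.random((3, 100)))
    G, W, H, errors = ntf(X, n_components=3)
    print("cost before / after:", errors[0], errors[-1])
    # Compare the spectral shape of one learned basis vector with another.
    print("cepstral distance:", cepstral_distance(W[:, 0], W[:, 1]))
```

Skipping the 0th cepstral coefficient makes the distance insensitive to overall level, so it compares spectral shape only; this is a design choice for the sketch rather than something stated in the abstract.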
Shogo SEKI
Nagoya University
Tomoki TODA
Nagoya University
Kazuya TAKEDA
Nagoya University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Shogo SEKI, Tomoki TODA, Kazuya TAKEDA, "Stereophonic Music Separation Based on Non-Negative Tensor Factorization with Cepstral Distance Regularization" in IEICE TRANSACTIONS on Fundamentals,
vol. E101-A, no. 7, pp. 1057-1064, July 2018, doi: 10.1587/transfun.E101.A.1057.
Abstract: This paper proposes a semi-supervised source separation method for stereophonic music signals containing multiple recorded or processed signals, where synthesized music is focused on the stereophonic music. As the synthesized music signals are often generated as linear combinations of many individual source signals and their respective mixing gains, phase or phase difference information between inter-channel signals, which represent spatial characteristics of recording environments, cannot be utilized as acoustic clues for source separation. Non-negative Tensor Factorization (NTF) is an effective technique which can be used to resolve this problem by decomposing amplitude spectrograms of stereo channel music signals into basis vectors and activations of individual music source signals, along with their corresponding mixing gains. However, it is difficult to achieve sufficient separation performance using this method alone, as the acoustic clues available for separation are limited. To address this issue, this paper proposes a Cepstral Distance Regularization (CDR) method for NTF-based stereo channel separation, which involves making the cepstrum of the separated source signals follow Gaussian Mixture Models (GMMs) of the corresponding music source signal. These GMMs are trained in advance using available samples. Experimental evaluations separating three and four sound sources are conducted to investigate the effectiveness of the proposed method in both supervised and semi-supervised separation frameworks, and performance is also compared with that of a conventional NTF method. Experimental results demonstrate that the proposed method yields significant improvements within both separation frameworks, and that cepstral distance regularization provides better separation parameters.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E101.A.1057/_p
@ARTICLE{e101-a_7_1057,
author={Shogo SEKI and Tomoki TODA and Kazuya TAKEDA},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Stereophonic Music Separation Based on Non-Negative Tensor Factorization with Cepstral Distance Regularization},
year={2018},
volume={E101-A},
number={7},
pages={1057-1064},
abstract={This paper proposes a semi-supervised source separation method for stereophonic music signals containing multiple recorded or processed signals, where synthesized music is focused on the stereophonic music. As the synthesized music signals are often generated as linear combinations of many individual source signals and their respective mixing gains, phase or phase difference information between inter-channel signals, which represent spatial characteristics of recording environments, cannot be utilized as acoustic clues for source separation. Non-negative Tensor Factorization (NTF) is an effective technique which can be used to resolve this problem by decomposing amplitude spectrograms of stereo channel music signals into basis vectors and activations of individual music source signals, along with their corresponding mixing gains. However, it is difficult to achieve sufficient separation performance using this method alone, as the acoustic clues available for separation are limited. To address this issue, this paper proposes a Cepstral Distance Regularization (CDR) method for NTF-based stereo channel separation, which involves making the cepstrum of the separated source signals follow Gaussian Mixture Models (GMMs) of the corresponding music source signal. These GMMs are trained in advance using available samples. Experimental evaluations separating three and four sound sources are conducted to investigate the effectiveness of the proposed method in both supervised and semi-supervised separation frameworks, and performance is also compared with that of a conventional NTF method. Experimental results demonstrate that the proposed method yields significant improvements within both separation frameworks, and that cepstral distance regularization provides better separation parameters.},
keywords={},
doi={10.1587/transfun.E101.A.1057},
ISSN={1745-1337},
month={July},
}
TY - JOUR
TI - Stereophonic Music Separation Based on Non-Negative Tensor Factorization with Cepstral Distance Regularization
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1057
EP - 1064
AU - Shogo SEKI
AU - Tomoki TODA
AU - Kazuya TAKEDA
PY - 2018
DO - 10.1587/transfun.E101.A.1057
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E101-A
IS - 7
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - July 2018
AB - This paper proposes a semi-supervised source separation method for stereophonic music signals containing multiple recorded or processed signals, where synthesized music is focused on the stereophonic music. As the synthesized music signals are often generated as linear combinations of many individual source signals and their respective mixing gains, phase or phase difference information between inter-channel signals, which represent spatial characteristics of recording environments, cannot be utilized as acoustic clues for source separation. Non-negative Tensor Factorization (NTF) is an effective technique which can be used to resolve this problem by decomposing amplitude spectrograms of stereo channel music signals into basis vectors and activations of individual music source signals, along with their corresponding mixing gains. However, it is difficult to achieve sufficient separation performance using this method alone, as the acoustic clues available for separation are limited. To address this issue, this paper proposes a Cepstral Distance Regularization (CDR) method for NTF-based stereo channel separation, which involves making the cepstrum of the separated source signals follow Gaussian Mixture Models (GMMs) of the corresponding music source signal. These GMMs are trained in advance using available samples. Experimental evaluations separating three and four sound sources are conducted to investigate the effectiveness of the proposed method in both supervised and semi-supervised separation frameworks, and performance is also compared with that of a conventional NTF method. Experimental results demonstrate that the proposed method yields significant improvements within both separation frameworks, and that cepstral distance regularization provides better separation parameters.
ER -