In this paper, we propose topic extraction models based on statistical relevance scores between topic words and words in articles, and report results obtained in topic extraction experiments using continuous speech recognition for Japanese broadcast news utterances. We attempt to represent a topic of news speech using a combination of multiple topic words, which are important words in the news article or words relevant to the news. We assume a topic of news is represented by a combination of words. We statistically model the mapping from words in an article to topic words. Using the mapping, the topic extraction model can extract topic words even if they do not appear in the article. We train a topic extraction model capable of computing the degree of relevance between a topic word and a word in an article by using newspaper text covering a five-year period. The degree of relevance between those words is calculated based on measures such as mutual information or the χ² method. In experiments extracting five topic words using a χ²-based model, we achieve 72% precision and 12% recall for speech recognition results. Speech recognition results generally include a number of recognition errors, which degrade topic extraction performance. To avoid this, we employ N-best candidates and the likelihoods given by acoustic and language models. In experiments, we find that extracting five topic words using N-best candidates and likelihood values achieves significantly improved precision.
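The relevance-based mapping described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the χ² statistic is estimated or how per-word relevance values are combined into a score for a topic word. The sketch assumes a 2x2 contingency table counted over training articles, keeps only positively associated pairs, and scores a topic word by summing its relevance to each word of the input article. The names `chi_square`, `train`, and `extract_topics` and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import product


def chi_square(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def train(corpus):
    """Estimate relevance between topic words and article words.

    corpus: list of (article_words, topic_words) pairs (assumed format).
    Returns {(topic_word, article_word): chi-square relevance}.
    """
    n_docs = len(corpus)
    topic_df, word_df, joint_df = Counter(), Counter(), Counter()
    for words, topics in corpus:
        words, topics = set(words), set(topics)
        topic_df.update(topics)
        word_df.update(words)
        joint_df.update(product(topics, words))

    relevance = {}
    for (t, w), n11 in joint_df.items():
        n10 = topic_df[t] - n11  # topic word present, article word absent
        n01 = word_df[w] - n11   # article word present, topic word absent
        n00 = n_docs - n11 - n10 - n01
        # Assumption: keep only positively associated pairs.
        if n11 * n00 > n10 * n01:
            relevance[(t, w)] = chi_square(n11, n10, n01, n00)
    return relevance


def extract_topics(relevance, article_words, k=5):
    """Return the k topic words with the highest summed relevance."""
    present = set(article_words)
    scores = Counter()
    for (t, w), r in relevance.items():
        if w in present:
            scores[t] += r  # assumption: additive combination
    return [t for t, _ in scores.most_common(k)]


# Toy training data: (words in article, topic words assigned to it).
corpus = [
    (["yen", "dollar", "rate"], ["economy"]),
    (["yen", "bank", "rate"], ["economy"]),
    (["match", "goal", "team"], ["sports"]),
    (["team", "coach", "goal"], ["sports"]),
]
relevance = train(corpus)
topics = extract_topics(relevance, ["yen", "rate", "market"], k=1)
```

In this toy run the extracted topic word does not itself occur in the input words, which is the property the abstract highlights. With recognizer output, the same scoring could be applied to the words of each N-best hypothesis, but how the hypotheses and their likelihoods are weighted is not specified in the abstract.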
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Katsutoshi OHTSUKI, Tatsuo MATSUOKA, Shoichi MATSUNAGA, Sadaoki FURUI, "Topic Extraction based on Continuous Speech Recognition in Broadcast News Speech" in IEICE TRANSACTIONS on Information,
vol. E85-D, no. 7, pp. 1138-1144, July 2002.
Abstract: In this paper, we propose topic extraction models based on statistical relevance scores between topic words and words in articles, and report results obtained in topic extraction experiments using continuous speech recognition for Japanese broadcast news utterances. We attempt to represent a topic of news speech using a combination of multiple topic words, which are important words in the news article or words relevant to the news. We assume a topic of news is represented by a combination of words. We statistically model mapping from words in an article to topic words. Using the mapping, the topic extraction model can extract topic words even if they do not appear in the article. We train a topic extraction model capable of computing the degree of relevance between a topic word and a word in an article by using newspaper text covering a five-year period. The degree of relevance between those words is calculated based on measures such as mutual information or the χ2-method. In experiments extracting five topic words using a χ2-based model, we achieve 72% precision and 12% recall for speech recognition results. Speech recognition results generally include a number of recognition errors, which degrades topic extraction performance. To avoid this, we employ N-best candidates and likelihood given by acoustic and language models. In experiments, we find that extracting five topic words using N-best candidate and likelihood values achieves significantly improved precision.
URL: https://global.ieice.org/en_transactions/information/10.1587/e85-d_7_1138/_p
@ARTICLE{e85-d_7_1138,
author={Katsutoshi OHTSUKI and Tatsuo MATSUOKA and Shoichi MATSUNAGA and Sadaoki FURUI},
journal={IEICE TRANSACTIONS on Information},
title={Topic Extraction based on Continuous Speech Recognition in Broadcast News Speech},
year={2002},
volume={E85-D},
number={7},
pages={1138--1144},
abstract={In this paper, we propose topic extraction models based on statistical relevance scores between topic words and words in articles, and report results obtained in topic extraction experiments using continuous speech recognition for Japanese broadcast news utterances. We attempt to represent a topic of news speech using a combination of multiple topic words, which are important words in the news article or words relevant to the news. We assume a topic of news is represented by a combination of words. We statistically model mapping from words in an article to topic words. Using the mapping, the topic extraction model can extract topic words even if they do not appear in the article. We train a topic extraction model capable of computing the degree of relevance between a topic word and a word in an article by using newspaper text covering a five-year period. The degree of relevance between those words is calculated based on measures such as mutual information or the χ2-method. In experiments extracting five topic words using a χ2-based model, we achieve 72% precision and 12% recall for speech recognition results. Speech recognition results generally include a number of recognition errors, which degrades topic extraction performance. To avoid this, we employ N-best candidates and likelihood given by acoustic and language models. In experiments, we find that extracting five topic words using N-best candidate and likelihood values achieves significantly improved precision.},
month={July},
}
TY - JOUR
TI - Topic Extraction based on Continuous Speech Recognition in Broadcast News Speech
T2 - IEICE TRANSACTIONS on Information
SP - 1138
EP - 1144
AU - Katsutoshi OHTSUKI
AU - Tatsuo MATSUOKA
AU - Shoichi MATSUNAGA
AU - Sadaoki FURUI
PY - 2002
JO - IEICE TRANSACTIONS on Information
VL - E85-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2002
AB - In this paper, we propose topic extraction models based on statistical relevance scores between topic words and words in articles, and report results obtained in topic extraction experiments using continuous speech recognition for Japanese broadcast news utterances. We attempt to represent a topic of news speech using a combination of multiple topic words, which are important words in the news article or words relevant to the news. We assume a topic of news is represented by a combination of words. We statistically model mapping from words in an article to topic words. Using the mapping, the topic extraction model can extract topic words even if they do not appear in the article. We train a topic extraction model capable of computing the degree of relevance between a topic word and a word in an article by using newspaper text covering a five-year period. The degree of relevance between those words is calculated based on measures such as mutual information or the χ2-method. In experiments extracting five topic words using a χ2-based model, we achieve 72% precision and 12% recall for speech recognition results. Speech recognition results generally include a number of recognition errors, which degrades topic extraction performance. To avoid this, we employ N-best candidates and likelihood given by acoustic and language models. In experiments, we find that extracting five topic words using N-best candidate and likelihood values achieves significantly improved precision.
ER -