Hiroyoshi NAGAO
Doshisha University
Koshiro TAMURA
Doshisha University
Marie KATSURAI
Doshisha University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hiroyoshi NAGAO, Koshiro TAMURA, Marie KATSURAI, "Effective Language Representations for Danmaku Comment Classification in Nicovideo" in IEICE TRANSACTIONS on Information and Systems, vol. E106-D, no. 5, pp. 838-846, May 2023, doi: 10.1587/transinf.2022DAP0010.
Abstract: Danmaku commenting has become popular for co-viewing on video-sharing platforms such as Nicovideo. However, many irrelevant comments degrade the quality of the information a video provides. This information-pollution problem can be addressed by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve performance on this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that may appear in Nicovideo content, to pre-train a Bidirectional Encoder Representations from Transformers (BERT) model. The resulting model, named Nicopedia BERT, is then fine-tuned to determine whether a given comment falls into any of the predefined categories. Experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained on Wikipedia or tweets. We also evaluated each model on an additional sentiment classification task, and the results suggest that Nicopedia BERT is applicable as a feature extractor for other social media text.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022DAP0010/_p
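To make the abstract's two-step recipe concrete, the following sketches show, in Python with Hugging Face Transformers, (1) domain pre-training on Nicopedia sentences and (2) comment classification with an abstention option. These are minimal illustrations, not the authors' released code: the file path, tokenizer choice, label count, and confidence threshold are hypothetical stand-ins, and the paper's exact vocabulary, hyperparameters, and abstention criterion may differ.

# A minimal pre-training sketch, assuming a plain-text file of Nicopedia
# sentences, one per line; "nicopedia_sentences.txt" is a hypothetical path,
# and a multilingual tokenizer stands in for the paper's Japanese vocabulary.
from transformers import (
    BertConfig, BertForMaskedLM, BertTokenizerFast,
    DataCollatorForLanguageModeling, LineByLineTextDataset,
    Trainer, TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))  # randomly initialized

# Masked language modeling: mask 15% of tokens and train the model to recover them.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer, file_path="nicopedia_sentences.txt", block_size=128
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="nicopedia-bert"),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("nicopedia-bert")
tokenizer.save_pretrained("nicopedia-bert")

Fine-tuning the saved checkpoint on labeled comments then attaches a classification head. One common way to realize an abstention option, assumed here rather than taken from the paper, is to reject any prediction whose softmax confidence falls below a threshold:

# A minimal abstention sketch: emit a category only when the softmax confidence
# clears a threshold, otherwise return None ("category unclear"). NUM_CATEGORIES
# and ABSTAIN_THRESHOLD are illustrative values, and the classification head
# below still needs fine-tuning on labeled Nicovideo comments.
from typing import Optional

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_CATEGORIES = 10      # hypothetical number of Nicovideo video categories
ABSTAIN_THRESHOLD = 0.7  # hypothetical confidence cut-off

tokenizer = AutoTokenizer.from_pretrained("nicopedia-bert")
model = AutoModelForSequenceClassification.from_pretrained(
    "nicopedia-bert", num_labels=NUM_CATEGORIES
)
model.eval()

def classify_comment(comment: str) -> Optional[int]:
    """Return a category index, or None to abstain on an unclear comment."""
    inputs = tokenizer(comment, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze(0)
    confidence, label = probs.max(dim=-1)
    return int(label) if confidence.item() >= ABSTAIN_THRESHOLD else None

With this setup, lowering ABSTAIN_THRESHOLD trades precision for coverage: fewer comments are rejected as unclear, but more off-category comments slip through.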
@ARTICLE{e106-d_5_838,
author={Hiroyoshi NAGAO and Koshiro TAMURA and Marie KATSURAI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Effective Language Representations for Danmaku Comment Classification in Nicovideo},
year={2023},
volume={E106-D},
number={5},
pages={838-846},
abstract={Danmaku commenting has become popular for co-viewing on video-sharing platforms such as Nicovideo. However, many irrelevant comments degrade the quality of the information a video provides. This information-pollution problem can be addressed by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve performance on this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that may appear in Nicovideo content, to pre-train a Bidirectional Encoder Representations from Transformers (BERT) model. The resulting model, named Nicopedia BERT, is then fine-tuned to determine whether a given comment falls into any of the predefined categories. Experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained on Wikipedia or tweets. We also evaluated each model on an additional sentiment classification task, and the results suggest that Nicopedia BERT is applicable as a feature extractor for other social media text.},
keywords={},
doi={10.1587/transinf.2022DAP0010},
ISSN={1745-1361},
month={May},
}
TY - JOUR
TI - Effective Language Representations for Danmaku Comment Classification in Nicovideo
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 838
EP - 846
AU - Hiroyoshi NAGAO
AU - Koshiro TAMURA
AU - Marie KATSURAI
PY - 2023
DO - 10.1587/transinf.2022DAP0010
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E106-D
IS - 5
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - 2023/05
AB - Danmaku commenting has become popular for co-viewing on video-sharing platforms such as Nicovideo. However, many irrelevant comments degrade the quality of the information a video provides. This information-pollution problem can be addressed by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve performance on this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that may appear in Nicovideo content, to pre-train a Bidirectional Encoder Representations from Transformers (BERT) model. The resulting model, named Nicopedia BERT, is then fine-tuned to determine whether a given comment falls into any of the predefined categories. Experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained on Wikipedia or tweets. We also evaluated each model on an additional sentiment classification task, and the results suggest that Nicopedia BERT is applicable as a feature extractor for other social media text.
ER -