The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations (e.g., some numerals may appear as "XNUMX").
Copyright notice
Text classification is a fundamental task in natural language processing, which finds extensive applications in various domains, such as spam detection and sentiment analysis. Syntactic information can be effectively utilized to improve the performance of neural network models in understanding the semantics of text. Chinese text exhibits a high degree of syntactic complexity, with individual words often possessing multiple parts of speech. In this paper, we propose BRsyn-caps, a capsule network-based Chinese text classification model that leverages both Bert and dependency syntax. Our proposed approach integrates semantic information through the Bert pre-training model for obtaining word representations, extracts contextual information through a Long Short-term memory neural network (LSTM), encodes syntactic dependency trees through a graph attention neural network, and utilizes a capsule network to effectively integrate features for text classification. Additionally, we propose a character-level syntactic dependency tree adjacency matrix construction algorithm, which can introduce syntactic information into character-level representations. Experiments on five datasets demonstrate that BRsyn-caps can effectively integrate semantic, sequential, and syntactic information in text, proving the effectiveness of our proposed method for Chinese text classification.
Jie LUO
Wuhan Institute of Technology
Chengwan HE
Wuhan Institute of Technology
Hongwei LUO
Wuhan Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jie LUO, Chengwan HE, Hongwei LUO, "BRsyn-Caps: Chinese Text Classification Using Capsule Network Based on Bert and Dependency Syntax" in IEICE TRANSACTIONS on Information,
vol. E107-D, no. 2, pp. 212-219, February 2024, doi: 10.1587/transinf.2023EDP7119.
Abstract: Text classification is a fundamental task in natural language processing, which finds extensive applications in various domains, such as spam detection and sentiment analysis. Syntactic information can be effectively utilized to improve the performance of neural network models in understanding the semantics of text. The Chinese text exhibits a high degree of syntactic complexity, with individual words often possessing multiple parts of speech. In this paper, we propose BRsyn-caps, a capsule network-based Chinese text classification model that leverages both Bert and dependency syntax. Our proposed approach integrates semantic information through Bert pre-training model for obtaining word representations, extracts contextual information through Long Short-term memory neural network (LSTM), encodes syntactic dependency trees through graph attention neural network, and utilizes capsule network to effectively integrate features for text classification. Additionally, we propose a character-level syntactic dependency tree adjacency matrix construction algorithm, which can introduce syntactic information into character-level representation. Experiments on five datasets demonstrate that BRsyn-caps can effectively integrate semantic, sequential, and syntactic information in text, proving the effectiveness of our proposed method for Chinese text classification.
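The abstract mentions a character-level syntactic dependency tree adjacency matrix, which lets a character-based Chinese encoder (such as Bert) use a word-level dependency parse. The paper's exact algorithm is not reproduced here; the sketch below shows one plausible construction under the common assumption that characters within a word are fully connected and each word-level dependency edge is projected onto all character pairs of the two words. The function name and the toy sentence are illustrative only.

```python
import numpy as np

def char_level_adjacency(words, heads):
    """Illustrative sketch: build a character-level adjacency matrix
    from a word-level dependency parse.

    words    -- list of word strings (Chinese words span 1+ characters)
    heads[i] -- index of the head word of word i, or -1 for the root
    """
    # Map each word to its span of character indices in the sentence.
    spans, start = [], 0
    for w in words:
        spans.append(range(start, start + len(w)))
        start += len(w)

    n = start  # total number of characters
    adj = np.zeros((n, n), dtype=int)

    for i, span in enumerate(spans):
        # Fully connect characters within the same word (incl. self-loops).
        for a in span:
            for b in span:
                adj[a, b] = 1
        # Project the word-level dependency edge onto character pairs.
        h = heads[i]
        if h >= 0:
            for a in span:
                for b in spans[h]:
                    adj[a, b] = 1
                    adj[b, a] = 1
    return adj

# Toy example: "我 爱 中国" with 爱 as root; 我 and 中国 depend on 爱.
adj = char_level_adjacency(["我", "爱", "中国"], [1, -1, 1])
```

The resulting symmetric matrix could then serve as the graph structure consumed by a graph attention network over character representations, as described in the abstract.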
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDP7119/_p
@ARTICLE{e107-d_2_212,
author={Jie LUO and Chengwan HE and Hongwei LUO},
journal={IEICE TRANSACTIONS on Information},
title={BRsyn-Caps: Chinese Text Classification Using Capsule Network Based on Bert and Dependency Syntax},
year={2024},
volume={E107-D},
number={2},
pages={212-219},
abstract={Text classification is a fundamental task in natural language processing, which finds extensive applications in various domains, such as spam detection and sentiment analysis. Syntactic information can be effectively utilized to improve the performance of neural network models in understanding the semantics of text. The Chinese text exhibits a high degree of syntactic complexity, with individual words often possessing multiple parts of speech. In this paper, we propose BRsyn-caps, a capsule network-based Chinese text classification model that leverages both Bert and dependency syntax. Our proposed approach integrates semantic information through Bert pre-training model for obtaining word representations, extracts contextual information through Long Short-term memory neural network (LSTM), encodes syntactic dependency trees through graph attention neural network, and utilizes capsule network to effectively integrate features for text classification. Additionally, we propose a character-level syntactic dependency tree adjacency matrix construction algorithm, which can introduce syntactic information into character-level representation. Experiments on five datasets demonstrate that BRsyn-caps can effectively integrate semantic, sequential, and syntactic information in text, proving the effectiveness of our proposed method for Chinese text classification.},
keywords={},
doi={10.1587/transinf.2023EDP7119},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - BRsyn-Caps: Chinese Text Classification Using Capsule Network Based on Bert and Dependency Syntax
T2 - IEICE TRANSACTIONS on Information
SP - 212
EP - 219
AU - Jie LUO
AU - Chengwan HE
AU - Hongwei LUO
PY - 2024
DO - 10.1587/transinf.2023EDP7119
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2024
AB - Text classification is a fundamental task in natural language processing, which finds extensive applications in various domains, such as spam detection and sentiment analysis. Syntactic information can be effectively utilized to improve the performance of neural network models in understanding the semantics of text. The Chinese text exhibits a high degree of syntactic complexity, with individual words often possessing multiple parts of speech. In this paper, we propose BRsyn-caps, a capsule network-based Chinese text classification model that leverages both Bert and dependency syntax. Our proposed approach integrates semantic information through Bert pre-training model for obtaining word representations, extracts contextual information through Long Short-term memory neural network (LSTM), encodes syntactic dependency trees through graph attention neural network, and utilizes capsule network to effectively integrate features for text classification. Additionally, we propose a character-level syntactic dependency tree adjacency matrix construction algorithm, which can introduce syntactic information into character-level representation. Experiments on five datasets demonstrate that BRsyn-caps can effectively integrate semantic, sequential, and syntactic information in text, proving the effectiveness of our proposed method for Chinese text classification.
ER -