The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations.
Depression, as a mental disorder, endangers people's health and disrupts the social order. Because automatic depression detection offers an efficient route to diagnosis, it has attracted considerable research interest. This study presents an attention-based Long Short-Term Memory (LSTM) model for depression detection that fully exploits the differences between depressed and non-depressed speech across time frames. The proposed model uses frame-level features, which capture the temporal information of depressive speech, in place of traditional statistical features as the input of the LSTM layers. To obtain richer multi-dimensional deep feature representations, the LSTM output is then passed through attention layers on both the time and feature dimensions. The outputs of the attention layers are concatenated, and the fused feature representation is fed into a fully connected layer. Finally, the output of the fully connected layer is passed to a softmax layer. Experiments conducted on the DAIC-WOZ database demonstrate that the proposed attentive LSTM model achieves an average accuracy of 90.2% and outperforms the traditional LSTM network and the LSTM with local attention by 0.7% and 2.3%, respectively, which indicates its feasibility.
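The abstract describes the pipeline only in prose (frame-level features, LSTM layers, attention over the time and feature dimensions, concatenation, a fully connected layer, and softmax). The following minimal PyTorch sketch illustrates one plausible reading of that pipeline; it is not the authors' implementation, and the feature dimension (80), frame count (300), hidden size (128), and the exact attention formulation are illustrative assumptions rather than values from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveLSTM(nn.Module):
    """Sketch of the attentive LSTM pipeline described in the abstract (assumed shapes)."""
    def __init__(self, feat_dim=80, num_frames=300, hidden_dim=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        self.time_score = nn.Linear(hidden_dim, 1)   # scores each time step
        self.feat_score = nn.Linear(num_frames, 1)   # scores each LSTM output channel
        self.fc = nn.Linear(hidden_dim + num_frames, num_classes)

    def forward(self, x):                                # x: (batch, num_frames, feat_dim)
        h, _ = self.lstm(x)                              # (batch, T, H)
        # Attention over the time dimension: weight each frame, pool to (batch, H).
        w_t = F.softmax(self.time_score(h), dim=1)       # (batch, T, 1)
        time_ctx = (w_t * h).sum(dim=1)                  # (batch, H)
        # Attention over the feature dimension: weight each channel by its
        # temporal profile, pool to (batch, T).
        h_t = h.transpose(1, 2)                          # (batch, H, T)
        w_f = F.softmax(self.feat_score(h_t), dim=1)     # (batch, H, 1)
        feat_ctx = (w_f * h_t).sum(dim=1)                # (batch, T)
        # Concatenate the two context vectors, then fully connected layer + softmax.
        fused = torch.cat([time_ctx, feat_ctx], dim=1)   # (batch, H + T)
        return F.softmax(self.fc(fused), dim=1)          # class probabilities

# Example with random tensors shaped like frame-level acoustic features.
model = AttentiveLSTM()
probs = model(torch.randn(4, 300, 80))                   # -> (4, 2)

For training one would normally return the pre-softmax logits and use nn.CrossEntropyLoss; the explicit softmax here simply mirrors the abstract's description of the final layer.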
Yan ZHAO
Southeast University
Yue XIE
Nanjing Institute of Technology
Ruiyu LIANG
Nanjing Institute of Technology
Li ZHANG
Northumbria University
Li ZHAO
Southeast University
Chengyu LIU
Southeast University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yan ZHAO, Yue XIE, Ruiyu LIANG, Li ZHANG, Li ZHAO, Chengyu LIU, "Detecting Depression from Speech through an Attentive LSTM Network" in IEICE TRANSACTIONS on Information,
vol. E104-D, no. 11, pp. 2019-2023, November 2021, doi: 10.1587/transinf.2020EDL8132.
Abstract: Depression, as a mental disorder, endangers people's health and disrupts the social order. Because automatic depression detection offers an efficient route to diagnosis, it has attracted considerable research interest. This study presents an attention-based Long Short-Term Memory (LSTM) model for depression detection that fully exploits the differences between depressed and non-depressed speech across time frames. The proposed model uses frame-level features, which capture the temporal information of depressive speech, in place of traditional statistical features as the input of the LSTM layers. To obtain richer multi-dimensional deep feature representations, the LSTM output is then passed through attention layers on both the time and feature dimensions. The outputs of the attention layers are concatenated, and the fused feature representation is fed into a fully connected layer. Finally, the output of the fully connected layer is passed to a softmax layer. Experiments conducted on the DAIC-WOZ database demonstrate that the proposed attentive LSTM model achieves an average accuracy of 90.2% and outperforms the traditional LSTM network and the LSTM with local attention by 0.7% and 2.3%, respectively, which indicates its feasibility.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDL8132/_p
@ARTICLE{e104-d_11_2019,
author={Yan ZHAO and Yue XIE and Ruiyu LIANG and Li ZHANG and Li ZHAO and Chengyu LIU},
journal={IEICE TRANSACTIONS on Information},
title={Detecting Depression from Speech through an Attentive LSTM Network},
year={2021},
volume={E104-D},
number={11},
pages={2019-2023},
abstract={Depression endangers people's health conditions and affects the social order as a mental disorder. As an efficient diagnosis of depression, automatic depression detection has attracted lots of researcher's interest. This study presents an attention-based Long Short-Term Memory (LSTM) model for depression detection to make full use of the difference between depression and non-depression between timeframes. The proposed model uses frame-level features, which capture the temporal information of depressive speech, to replace traditional statistical features as an input of the LSTM layers. To achieve more multi-dimensional deep feature representations, the LSTM output is then passed on attention layers on both time and feature dimensions. Then, we concat the output of the attention layers and put the fused feature representation into the fully connected layer. At last, the fully connected layer's output is passed on to softmax layer. Experiments conducted on the DAIC-WOZ database demonstrate that the proposed attentive LSTM model achieves an average accuracy rate of 90.2% and outperforms the traditional LSTM network and LSTM with local attention by 0.7% and 2.3%, respectively, which indicates its feasibility.},
keywords={},
doi={10.1587/transinf.2020EDL8132},
ISSN={1745-1361},
month={November},
}
TY - JOUR
TI - Detecting Depression from Speech through an Attentive LSTM Network
T2 - IEICE TRANSACTIONS on Information
SP - 2019
EP - 2023
AU - Yan ZHAO
AU - Yue XIE
AU - Ruiyu LIANG
AU - Li ZHANG
AU - Li ZHAO
AU - Chengyu LIU
PY - 2021
DO - 10.1587/transinf.2020EDL8132
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 11
JA - IEICE TRANSACTIONS on Information
Y1 - November 2021
AB - Depression endangers people's health conditions and affects the social order as a mental disorder. As an efficient diagnosis of depression, automatic depression detection has attracted lots of researcher's interest. This study presents an attention-based Long Short-Term Memory (LSTM) model for depression detection to make full use of the difference between depression and non-depression between timeframes. The proposed model uses frame-level features, which capture the temporal information of depressive speech, to replace traditional statistical features as an input of the LSTM layers. To achieve more multi-dimensional deep feature representations, the LSTM output is then passed on attention layers on both time and feature dimensions. Then, we concat the output of the attention layers and put the fused feature representation into the fully connected layer. At last, the fully connected layer's output is passed on to softmax layer. Experiments conducted on the DAIC-WOZ database demonstrate that the proposed attentive LSTM model achieves an average accuracy rate of 90.2% and outperforms the traditional LSTM network and LSTM with local attention by 0.7% and 2.3%, respectively, which indicates its feasibility.
ER -