The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal classification performance, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the real-life process of a "drunkard" sobering up and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the "drunkard methodology". The core idea has three parts: (1) designing a special feature transformation module, based on the different information perception mechanisms of drunkards and ordinary people, to simulate the gradual sobering-up process and the accompanying changes in feature perception ability; (2) studying a lightweight "drunken" model that matches the perception processing of the normal model. This model uses a multi-scale class residual block structure and obtains finer feature representations by fusing information extracted at different scales; (3) introducing a module through which the conventional model guides and is fused with the "drunken" model, to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official DCASE2022 Task1 dataset show that our baseline system achieves 40.4% accuracy and a loss of 2.284 with 442.67K parameters and 19.40M MACs (multiply-accumulate operations). After adopting the "drunkard" mechanism, accuracy improves to 45.2% and the loss is reduced by 0.634, with 551.89K parameters and 23.6M MACs.
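The reported figures can be put side by side to see what the accuracy gain costs in model size and compute. The sketch below is only arithmetic on the numbers quoted in the abstract. The dictionary keys and function names are illustrative choices, not taken from the paper, and the absolute loss of the "drunkard" system is derived from the stated reduction rather than quoted.

```python
# Figures as reported in the abstract (DCASE2022 Task1 evaluation).
# Key names are illustrative; they do not come from the paper.
BASELINE = {"accuracy_pct": 40.4, "loss": 2.284, "params_k": 442.67, "macs_m": 19.40}

# The abstract gives only the loss *reduction* for the "drunkard" system,
# so its absolute loss is derived here, not quoted.
LOSS_REDUCTION = 0.634
DRUNKARD = {
    "accuracy_pct": 45.2,
    "loss": round(BASELINE["loss"] - LOSS_REDUCTION, 3),
    "params_k": 551.89,
    "macs_m": 23.6,
}


def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def summarize(before: dict, after: dict) -> dict:
    """Derive gains and overheads between two reported systems."""
    return {
        "accuracy_gain_points": round(after["accuracy_pct"] - before["accuracy_pct"], 1),
        "loss_after": after["loss"],
        "extra_params_k": round(after["params_k"] - before["params_k"], 2),
        "params_increase_pct": round(
            relative_change_pct(before["params_k"], after["params_k"]), 1
        ),
        "extra_macs_m": round(after["macs_m"] - before["macs_m"], 2),
        "macs_increase_pct": round(
            relative_change_pct(before["macs_m"], after["macs_m"]), 1
        ),
    }


if __name__ == "__main__":
    for name, value in summarize(BASELINE, DRUNKARD).items():
        print(f"{name}: {value}")
```

Under these figures, the "drunkard" mechanism gains 4.8 percentage points of accuracy and brings the loss to about 1.650, at a cost of roughly 109.22K additional parameters (about 24.7%) and 4.2M additional MACs (about 21.6%).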
Wenkai LIU
North China University of Technology
Lin ZHANG
North China University of Technology
Menglong WU
North China University of Technology
Xichang CAI
North China University of Technology
Hongxia DONG
North China University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Wenkai LIU, Lin ZHANG, Menglong WU, Xichang CAI, Hongxia DONG, "Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology" in IEICE TRANSACTIONS on Information, vol. E107-D, no. 1, pp. 83-92, January 2024, doi: 10.1587/transinf.2023EDP7107.
Abstract: The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDP7107/_p
@ARTICLE{e107-d_1_83,
author={Wenkai LIU and Lin ZHANG and Menglong WU and Xichang CAI and Hongxia DONG},
journal={IEICE TRANSACTIONS on Information},
title={Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology},
year={2024},
volume={E107-D},
number={1},
pages={83-92},
abstract={The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.},
keywords={},
doi={10.1587/transinf.2023EDP7107},
ISSN={1745-1361},
month={January},}
TY - JOUR
TI - Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology
T2 - IEICE TRANSACTIONS on Information
SP - 83
EP - 92
AU - Wenkai LIU
AU - Lin ZHANG
AU - Menglong WU
AU - Xichang CAI
AU - Hongxia DONG
PY - 2024
DO - 10.1587/transinf.2023EDP7107
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2024
AB - The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.
ER -