The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations (e.g., some numerals are rendered as "XNUMX").
The commonly used Deep Q Network is known to overestimate action values under certain conditions. It has also been shown that such overestimation harms performance and may cause instability and divergence of learning. In this paper, we present the Deep Sarsa and Q Networks (DSQN) algorithm, which can be regarded as an enhancement of the Deep Q Networks algorithm. First, the DSQN algorithm takes advantage of the experience replay and target network techniques of Deep Q Networks to improve the stability of the neural networks. Second, a double estimator is used for Q-learning to reduce overestimation. In particular, we introduce Sarsa learning into Deep Q Networks to further remove overestimation. Finally, the DSQN algorithm is evaluated on the cart-pole balancing, mountain car, and LunarLander control tasks from the OpenAI Gym. The empirical evaluation results show that the proposed method leads to reduced overestimation, a more stable learning process, and improved performance.
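The abstract describes DSQN as combining experience replay, a target network, a double estimator, and Sarsa-style on-policy bootstrapping. The paper's exact update rule is not reproduced here, so the following is only a minimal illustrative sketch, in which the tabular value arrays, the function name `dsqn_style_target`, and the mixing coefficient `beta` are assumptions introduced for illustration rather than the authors' formulation.

```python
# Illustrative sketch (NOT the authors' exact DSQN update): blending a
# double-Q-learning target with an on-policy Sarsa target computed from a
# replayed transition. Tabular arrays stand in for the online/target networks.
import numpy as np

def dsqn_style_target(q_online, q_target, transition, gamma=0.99, beta=0.5):
    """Return a hybrid bootstrap target for one replayed transition.

    q_online, q_target : arrays of shape (n_states, n_actions) standing in for
                         the online network and the periodically copied target network.
    transition         : (s, a, r, s_next, a_next, done), where a_next is the
                         action actually taken in s_next (needed for the Sarsa term).
    beta               : hypothetical mixing coefficient between the two targets.
    """
    s, a, r, s_next, a_next, done = transition
    if done:
        return r
    # Double estimator: the online network selects the greedy next action,
    # the target network evaluates it (reduces overestimation).
    greedy_next = int(np.argmax(q_online[s_next]))
    double_q_term = q_target[s_next, greedy_next]
    # Sarsa term: evaluate the action that was actually executed next (on-policy).
    sarsa_term = q_target[s_next, a_next]
    # Hybrid bootstrap value.
    bootstrap = beta * double_q_term + (1.0 - beta) * sarsa_term
    return r + gamma * bootstrap

# Tiny usage example with random value tables.
rng = np.random.default_rng(0)
q_online = rng.normal(size=(4, 2))
q_target = rng.normal(size=(4, 2))
print(dsqn_style_target(q_online, q_target, (0, 1, 1.0, 2, 0, False)))
```

In this sketch the Sarsa term bootstraps from the action actually taken, which tends to dampen the maximization bias that pure Q-learning targets introduce; how the paper actually combines the two targets should be taken from the full text.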
Zhi-xiong XU
PLA University of Science and Technology
Lei CAO
PLA University of Science and Technology
Xi-liang CHEN
PLA University of Science and Technology
Chen-xi LI
PLA University of Science and Technology
Yong-liang ZHANG
PLA University of Science and Technology
Jun LAI
PLA University of Science and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Zhi-xiong XU, Lei CAO, Xi-liang CHEN, Chen-xi LI, Yong-liang ZHANG, Jun LAI, "Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach" in IEICE TRANSACTIONS on Information,
vol. E101-D, no. 9, pp. 2315-2322, September 2018, doi: 10.1587/transinf.2017EDP7278.
Abstract: The commonly used Deep Q Network is known to overestimate action values under certain conditions. It has also been shown that such overestimation harms performance and may cause instability and divergence of learning. In this paper, we present the Deep Sarsa and Q Networks (DSQN) algorithm, which can be regarded as an enhancement of the Deep Q Networks algorithm. First, the DSQN algorithm takes advantage of the experience replay and target network techniques of Deep Q Networks to improve the stability of the neural networks. Second, a double estimator is used for Q-learning to reduce overestimation. In particular, we introduce Sarsa learning into Deep Q Networks to further remove overestimation. Finally, the DSQN algorithm is evaluated on the cart-pole balancing, mountain car, and LunarLander control tasks from the OpenAI Gym. The empirical evaluation results show that the proposed method leads to reduced overestimation, a more stable learning process, and improved performance.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7278/_p
@ARTICLE{e101-d_9_2315,
author={Zhi-xiong XU and Lei CAO and Xi-liang CHEN and Chen-xi LI and Yong-liang ZHANG and Jun LAI},
journal={IEICE TRANSACTIONS on Information},
title={Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach},
year={2018},
volume={E101-D},
number={9},
pages={2315-2322},
abstract={The commonly used Deep Q Network is known to overestimate action values under certain conditions. It has also been shown that such overestimation harms performance and may cause instability and divergence of learning. In this paper, we present the Deep Sarsa and Q Networks (DSQN) algorithm, which can be regarded as an enhancement of the Deep Q Networks algorithm. First, the DSQN algorithm takes advantage of the experience replay and target network techniques of Deep Q Networks to improve the stability of the neural networks. Second, a double estimator is used for Q-learning to reduce overestimation. In particular, we introduce Sarsa learning into Deep Q Networks to further remove overestimation. Finally, the DSQN algorithm is evaluated on the cart-pole balancing, mountain car, and LunarLander control tasks from the OpenAI Gym. The empirical evaluation results show that the proposed method leads to reduced overestimation, a more stable learning process, and improved performance.},
keywords={},
doi={10.1587/transinf.2017EDP7278},
ISSN={1745-1361},
month={September},}
TY - JOUR
TI - Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach
T2 - IEICE TRANSACTIONS on Information
SP - 2315
EP - 2322
AU - Zhi-xiong XU
AU - Lei CAO
AU - Xi-liang CHEN
AU - Chen-xi LI
AU - Yong-liang ZHANG
AU - Jun LAI
PY - 2018
DO - 10.1587/transinf.2017EDP7278
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2018
AB - The commonly used Deep Q Network is known to overestimate action values under certain conditions. It has also been shown that such overestimation harms performance and may cause instability and divergence of learning. In this paper, we present the Deep Sarsa and Q Networks (DSQN) algorithm, which can be regarded as an enhancement of the Deep Q Networks algorithm. First, the DSQN algorithm takes advantage of the experience replay and target network techniques of Deep Q Networks to improve the stability of the neural networks. Second, a double estimator is used for Q-learning to reduce overestimation. In particular, we introduce Sarsa learning into Deep Q Networks to further remove overestimation. Finally, the DSQN algorithm is evaluated on the cart-pole balancing, mountain car, and LunarLander control tasks from the OpenAI Gym. The empirical evaluation results show that the proposed method leads to reduced overestimation, a more stable learning process, and improved performance.
ER -