Copyright notice
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Ngo Anh VIEN, SeungGwan LEE, TaeChoong CHUNG, "Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors" in IEICE TRANSACTIONS on Information,
vol. E93-D, no. 2, pp. 271-279, February 2010, doi: 10.1587/transinf.E93.D.271.
Abstract: In earlier work we presented a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov decision process (SMDP). We approximated the gradient of the average reward, and a simulation-based algorithm, called GSMDP, was then proposed to estimate this approximate gradient using only a single sample path of the underlying Markov chain; GSMDP was proved to converge with probability 1. In this paper, we give bounds on the approximation and estimation errors of the GSMDP algorithm. The approximation error is the size of the difference between the true gradient and the approximate gradient. The estimation error, the size of the difference between the output of the algorithm and its asymptotic output, arises because the algorithm sees only a finite data sequence.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.271/_p
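The two error notions described in the abstract can be written schematically as follows. The notation here is illustrative rather than taken from the paper: $\eta(\theta)$ denotes the average reward of the parameterized policy, $\nabla\eta(\theta)$ its true gradient, $\widehat{\nabla}\eta(\theta)$ the approximate gradient, and $\Delta_T$ the output of the GSMDP algorithm after observing a sample path of length $T$.

\[
\underbrace{\bigl\|\nabla\eta(\theta)-\widehat{\nabla}\eta(\theta)\bigr\|}_{\text{approximation error}}
\qquad\text{and}\qquad
\underbrace{\bigl\|\Delta_T-\widehat{\nabla}\eta(\theta)\bigr\|}_{\text{estimation error}}
\]

Under these schematic definitions, the total deviation of the algorithm's output from the true gradient is bounded by the sum of the two terms via the triangle inequality.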
@ARTICLE{e93-d_2_271,
author={Ngo Anh VIEN and SeungGwan LEE and TaeChoong CHUNG},
journal={IEICE TRANSACTIONS on Information},
title={Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors},
year={2010},
volume={E93-D},
number={2},
pages={271-279},
abstract={In earlier work we presented a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov decision process (SMDP). We approximated the gradient of the average reward, and a simulation-based algorithm, called GSMDP, was then proposed to estimate this approximate gradient using only a single sample path of the underlying Markov chain; GSMDP was proved to converge with probability 1. In this paper, we give bounds on the approximation and estimation errors of the GSMDP algorithm. The approximation error is the size of the difference between the true gradient and the approximate gradient. The estimation error, the size of the difference between the output of the algorithm and its asymptotic output, arises because the algorithm sees only a finite data sequence.},
keywords={},
doi={10.1587/transinf.E93.D.271},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors
T2 - IEICE TRANSACTIONS on Information
SP - 271
EP - 279
AU - Ngo Anh VIEN
AU - SeungGwan LEE
AU - TaeChoong CHUNG
PY - 2010
DO - 10.1587/transinf.E93.D.271
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E93-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2010
AB - In earlier work we presented a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov decision process (SMDP). We approximated the gradient of the average reward, and a simulation-based algorithm, called GSMDP, was then proposed to estimate this approximate gradient using only a single sample path of the underlying Markov chain; GSMDP was proved to converge with probability 1. In this paper, we give bounds on the approximation and estimation errors of the GSMDP algorithm. The approximation error is the size of the difference between the true gradient and the approximate gradient. The estimation error, the size of the difference between the output of the algorithm and its asymptotic output, arises because the algorithm sees only a finite data sequence.
ER -