The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations (e.g., some numerals are rendered as "XNUMX").
Like many processors, the GPGPU suffers from the memory wall. The traditional solution to this issue is to use efficient schedulers to hide long memory access latency, or to use a data prefetch mechanism to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of the GPU pipeline and analyze the relationship between the capacity of a GPU kernel and its instruction miss rate. We improve the next-line prefetch mechanism to fit the SIMT model of the GPU and determine the optimal parameters of the prefetch mechanism on the GPU through experiments. The experimental results show that the prefetch mechanism achieves a 12.17% performance improvement on average. Compared with the solution of enlarging the I-Cache, the prefetch mechanism has the advantages of more beneficiaries and lower cost.
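As an illustration of the next-line prefetch idea the abstract refers to, the toy simulation below shows why fetching line i+1 ahead of demand hides instruction misses for straight-line code. This is only a minimal sketch under assumed parameters (cache geometry, line size, and all function names are invented here); it does not reproduce the paper's SIMT-aware mechanism or its tuned parameters.

```python
# Toy model of next-line instruction prefetch in a tiny direct-mapped I-cache.
# All sizes and names are assumptions for this sketch, not the paper's design.

LINE_SIZE = 8    # instructions per cache line (assumed)
CACHE_LINES = 4  # number of direct-mapped cache lines (assumed)

def run(trace, prefetch_next_line):
    """Return the demand-miss count for a PC trace, with or without prefetch."""
    cache = {}   # set index -> stored tag
    misses = 0

    def touch(line_addr, is_demand):
        nonlocal misses
        idx, tag = line_addr % CACHE_LINES, line_addr // CACHE_LINES
        if cache.get(idx) != tag:
            if is_demand:          # only demand fetches count as misses
                misses += 1
            cache[idx] = tag       # fill the line

    for pc in trace:
        line = pc // LINE_SIZE
        touch(line, is_demand=True)
        if prefetch_next_line:
            touch(line + 1, is_demand=False)  # prefetch line i+1 early
    return misses

trace = list(range(256))  # straight-line instruction stream
print(run(trace, False), run(trace, True))  # prints: 32 1
```

For a purely sequential stream, every new line is a cold miss without prefetch (32 misses here), while next-line prefetch leaves only the very first fetch as a miss; the paper's contribution is adapting this classic idea to the SIMT execution model and choosing its parameters experimentally.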
Jianli CAO
Dalian University of Technology
Zhikui CHEN
Dalian University of Technology
Yuxin WANG
Dalian University of Technology
He GUO
Dalian University of Technology
Pengcheng WANG
Jianghuai College of Anhui University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jianli CAO, Zhikui CHEN, Yuxin WANG, He GUO, Pengcheng WANG, "Instruction Prefetch for Improving GPGPU Performance" in IEICE TRANSACTIONS on Fundamentals,
vol. E104-A, no. 5, pp. 773-785, May 2021, doi: 10.1587/transfun.2020EAP1105.
Abstract: Like many processors, GPGPU suffers from memory wall. The traditional solution for this issue is to use efficient schedulers to hide long memory access latency or use data prefetch mechanism to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of GPU's pipeline and analyze the relationship between the capacity of GPU kernel and instruction miss rate. We improve the next line prefetch mechanism to fit the SIMT model of GPU and determine the optimal parameters of prefetch mechanism on GPU through experiments. The experimental result shows that the prefetch mechanism can achieve 12.17% performance improvement on average. Compared with the solution of enlarging I-Cache, prefetch mechanism has the advantages of more beneficiaries and lower cost.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2020EAP1105/_p
@ARTICLE{e104-a_5_773,
author={Jianli CAO and Zhikui CHEN and Yuxin WANG and He GUO and Pengcheng WANG},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Instruction Prefetch for Improving GPGPU Performance},
year={2021},
volume={E104-A},
number={5},
pages={773-785},
abstract={Like many processors, GPGPU suffers from memory wall. The traditional solution for this issue is to use efficient schedulers to hide long memory access latency or use data prefetch mechanism to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of GPU's pipeline and analyze the relationship between the capacity of GPU kernel and instruction miss rate. We improve the next line prefetch mechanism to fit the SIMT model of GPU and determine the optimal parameters of prefetch mechanism on GPU through experiments. The experimental result shows that the prefetch mechanism can achieve 12.17% performance improvement on average. Compared with the solution of enlarging I-Cache, prefetch mechanism has the advantages of more beneficiaries and lower cost.},
keywords={},
doi={10.1587/transfun.2020EAP1105},
ISSN={1745-1337},
month={May},}
TY - JOUR
TI - Instruction Prefetch for Improving GPGPU Performance
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 773
EP - 785
AU - Jianli CAO
AU - Zhikui CHEN
AU - Yuxin WANG
AU - He GUO
AU - Pengcheng WANG
PY - 2021
DO - 10.1587/transfun.2020EAP1105
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E104-A
IS - 5
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - May 2021
AB - Like many processors, GPGPU suffers from memory wall. The traditional solution for this issue is to use efficient schedulers to hide long memory access latency or use data prefetch mechanism to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of GPU's pipeline and analyze the relationship between the capacity of GPU kernel and instruction miss rate. We improve the next line prefetch mechanism to fit the SIMT model of GPU and determine the optimal parameters of prefetch mechanism on GPU through experiments. The experimental result shows that the prefetch mechanism can achieve 12.17% performance improvement on average. Compared with the solution of enlarging I-Cache, prefetch mechanism has the advantages of more beneficiaries and lower cost.
ER -