The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
GPUs have become the dominant computing units for meeting high-performance needs across many computational fields. However, long operation latencies leave on-chip computing resources underutilized, which degrades performance when parallel tasks run on GPUs. A good warp scheduling strategy is an effective way to hide latency and improve resource utilization, yet most current GPU warp scheduling algorithms ignore the ability of long operations to hide latency. In this paper, we propose a long-operation-first warp scheduling algorithm, LFWS, for GPU platforms. LFWS filters warps in the ready state into a ready queue and updates the queue promptly as warp status changes. It divides the warps in the ready queue into long-operation and short-operation groups based on the types of operations in their instruction buffers, and gives higher priority to long-operation warps. This lets long operations hide part of one another's latency and strengthens the system's overall ability to hide latency. To verify its effectiveness, we implement LFWS on the GPGPU-Sim simulation platform. Experiments over various CUDA applications compare LFWS with five other warp scheduling algorithms. The results show that LFWS achieves average performance improvements of 8.01% and 5.09% over three traditional and two novel warp scheduling algorithms, respectively, effectively improving the utilization of computational resources on GPUs.
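The scheduling policy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GPGPU-Sim implementation. The `Warp` class, the `LONG_OPS` opcode set, and the tie-break by ascending warp ID are all assumptions made for illustration, since the abstract does not say which operations count as long or how warps within a group are ordered.

```python
from dataclasses import dataclass

# Hypothetical set of opcodes treated as "long" operations. The abstract does
# not list them, so this classification is an assumption for illustration.
LONG_OPS = {"ld.global", "st.global", "sin", "sqrt"}


@dataclass
class Warp:
    warp_id: int
    ready: bool    # True if the warp can issue an instruction this cycle
    next_op: str   # opcode at the head of the warp's instruction buffer


def is_long(op: str) -> bool:
    """Classify an opcode as a long operation (assumed classification)."""
    return op in LONG_OPS


def lfws_order(warps):
    """Return the ready warps with the long-operation group first.

    Within each group, warps are ordered by ascending warp_id. This
    tie-break is an assumption; the abstract does not specify one.
    """
    # Step 1: filter warps in the ready state into a ready queue.
    ready_queue = [w for w in warps if w.ready]
    # Step 2: split into long and short groups and put the long group first.
    # False sorts before True, so "not is_long" ranks long operations first.
    return sorted(ready_queue, key=lambda w: (not is_long(w.next_op), w.warp_id))


def pick_next(warps):
    """Pick the warp to issue next, or None if no warp is ready."""
    order = lfws_order(warps)
    return order[0] if order else None


warps = [
    Warp(0, True, "add"),
    Warp(1, False, "ld.global"),  # not ready, so it never enters the queue
    Warp(2, True, "ld.global"),
    Warp(3, True, "mul"),
    Warp(4, True, "sqrt"),
]
issue_order = [w.warp_id for w in lfws_order(warps)]
print(issue_order)  # [2, 4, 0, 3]
```

In this toy example, warps 2 and 4 hold long operations and are issued before the short-operation warps 0 and 3, so the two long latencies overlap with each other and with the short operations that follow. Rebuilding the queue on every call stands in for the queue update on warp status changes.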
Song LIU
Xi'an Jiaotong University
Jie MA
Xi'an Jiaotong University
Chenyu ZHAO
Xi'an Jiaotong University
Xinhe WAN
Xi'an Jiaotong University
Weiguo WU
Xi'an Jiaotong University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Song LIU, Jie MA, Chenyu ZHAO, Xinhe WAN, Weiguo WU, "LFWS: Long-Operation First Warp Scheduling Algorithm to Effectively Hide the Latency for GPUs" in IEICE TRANSACTIONS on Fundamentals,
vol. E106-A, no. 8, pp. 1043-1050, August 2023, doi: 10.1587/transfun.2022EAP1084.
Abstract: GPUs have become the dominant computing units to meet the need of high performance in various computational fields. But the long operation latency causes the underutilization of on-chip computing resources, resulting in performance degradation when running parallel tasks on GPUs. A good warp scheduling strategy is an effective solution to hide latency and improve resource utilization. However, most current warp scheduling algorithms on GPUs ignore the ability of long operations to hide latency. In this paper, we propose a long-operation-first warp scheduling algorithm, LFWS, for GPU platforms. The LFWS filters warps in the ready state to a ready queue and updates the queue in time according to changes in the status of the warp. The LFWS divides the warps in the ready queue into long and short operation groups based on the type of operations in their instruction buffers, and it gives higher priority to the long-operating warp in the ready queue. This can effectively use the long operations to hide some of the latency from each other and enhance the system's ability to hide the latency. To verify the effectiveness of the LFWS, we implement the LFWS algorithm on a simulation platform GPGPU-Sim. Experiments are conducted over various CUDA applications to evaluate the performance of LFWS algorithm, compared with other five warp scheduling algorithms. The results show that the LFWS algorithm achieves an average performance improvement of 8.01% and 5.09%, respectively, over three traditional and two novel warp scheduling algorithms, effectively improving computational resource utilization on GPU.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2022EAP1084/_p
@ARTICLE{e106-a_8_1043,
author={Song LIU and Jie MA and Chenyu ZHAO and Xinhe WAN and Weiguo WU},
journal={IEICE TRANSACTIONS on Fundamentals},
title={LFWS: Long-Operation First Warp Scheduling Algorithm to Effectively Hide the Latency for GPUs},
year={2023},
volume={E106-A},
number={8},
pages={1043-1050},
abstract={GPUs have become the dominant computing units to meet the need of high performance in various computational fields. But the long operation latency causes the underutilization of on-chip computing resources, resulting in performance degradation when running parallel tasks on GPUs. A good warp scheduling strategy is an effective solution to hide latency and improve resource utilization. However, most current warp scheduling algorithms on GPUs ignore the ability of long operations to hide latency. In this paper, we propose a long-operation-first warp scheduling algorithm, LFWS, for GPU platforms. The LFWS filters warps in the ready state to a ready queue and updates the queue in time according to changes in the status of the warp. The LFWS divides the warps in the ready queue into long and short operation groups based on the type of operations in their instruction buffers, and it gives higher priority to the long-operating warp in the ready queue. This can effectively use the long operations to hide some of the latency from each other and enhance the system's ability to hide the latency. To verify the effectiveness of the LFWS, we implement the LFWS algorithm on a simulation platform GPGPU-Sim. Experiments are conducted over various CUDA applications to evaluate the performance of LFWS algorithm, compared with other five warp scheduling algorithms. The results show that the LFWS algorithm achieves an average performance improvement of 8.01% and 5.09%, respectively, over three traditional and two novel warp scheduling algorithms, effectively improving computational resource utilization on GPU.},
keywords={},
doi={10.1587/transfun.2022EAP1084},
ISSN={1745-1337},
month={August},}
TY - JOUR
TI - LFWS: Long-Operation First Warp Scheduling Algorithm to Effectively Hide the Latency for GPUs
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1043
EP - 1050
AU - Song LIU
AU - Jie MA
AU - Chenyu ZHAO
AU - Xinhe WAN
AU - Weiguo WU
PY - 2023
DO - 10.1587/transfun.2022EAP1084
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E106-A
IS - 8
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - August 2023
AB - GPUs have become the dominant computing units to meet the need of high performance in various computational fields. But the long operation latency causes the underutilization of on-chip computing resources, resulting in performance degradation when running parallel tasks on GPUs. A good warp scheduling strategy is an effective solution to hide latency and improve resource utilization. However, most current warp scheduling algorithms on GPUs ignore the ability of long operations to hide latency. In this paper, we propose a long-operation-first warp scheduling algorithm, LFWS, for GPU platforms. The LFWS filters warps in the ready state to a ready queue and updates the queue in time according to changes in the status of the warp. The LFWS divides the warps in the ready queue into long and short operation groups based on the type of operations in their instruction buffers, and it gives higher priority to the long-operating warp in the ready queue. This can effectively use the long operations to hide some of the latency from each other and enhance the system's ability to hide the latency. To verify the effectiveness of the LFWS, we implement the LFWS algorithm on a simulation platform GPGPU-Sim. Experiments are conducted over various CUDA applications to evaluate the performance of LFWS algorithm, compared with other five warp scheduling algorithms. The results show that the LFWS algorithm achieves an average performance improvement of 8.01% and 5.09%, respectively, over three traditional and two novel warp scheduling algorithms, effectively improving computational resource utilization on GPU.
ER -