The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
The use of local memory, in systems ranging from real-time embedded systems to high-performance systems with multicore processors, has become an important factor in meeting hard deadline constraints. The challenge, however, lies in managing the memory hierarchy efficiently, for example decomposing large data into small blocks that fit in local memory and transferring blocks for reuse and replacement. To address this issue, this paper presents a compiler optimization method that automatically manages the local memory of multicore processors. The method selects multi-dimensional data and maps it onto software-specified memory blocks called Adjustable Blocks. These blocks can be divided hierarchically into various sizes determined by the characteristics of the input application. In addition, the method introduces mapping structures called Template Arrays to maintain the indices of the decomposed multi-dimensional data. The proposed work is implemented in the OSCAR automatic parallelizing compiler and evaluated on the Renesas RP2 8-core processor. Experimental results from the NAS Parallel Benchmarks, SPEC benchmarks, and multimedia applications show the effectiveness of the method, which achieves a maximum speed-up of 20.44 with 8 cores using local memory, relative to single-core sequential versions that use off-chip memory.
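The abstract does not give the details of Adjustable Blocks or Template Arrays, so the following is only a rough sketch of the general idea it describes: choosing a block size by repeatedly halving a local memory, splitting a multi-dimensional array into tiles, and translating a global index into a tile index plus an index inside that tile. All function names, the halving rule, and the sizes used are assumptions made for illustration; this is not the OSCAR compiler's implementation.

```python
from itertools import product


def pick_block_size(local_mem_bytes, working_set_bytes, min_block_bytes=64):
    """Halve the local memory while a half still holds the working set.

    Assumption: blocks are obtained by repeated halving, which is one
    plausible reading of "hierarchically divisible"; the paper may differ.
    """
    size = local_mem_bytes
    while size // 2 >= working_set_bytes and size // 2 >= min_block_bytes:
        size //= 2
    return size


def decompose(shape, tile_shape):
    """Yield (tile_index, slices) for tiles that cover an array of `shape`."""
    # Ceiling division gives the number of tiles along each dimension.
    counts = [-(-n // t) for n, t in zip(shape, tile_shape)]
    for idx in product(*(range(c) for c in counts)):
        slices = tuple(
            slice(i * t, min((i + 1) * t, n))
            for i, t, n in zip(idx, tile_shape, shape)
        )
        yield idx, slices


def to_local_index(global_index, tile_shape):
    """Split a global index into (tile index, index inside that tile)."""
    tile = tuple(g // t for g, t in zip(global_index, tile_shape))
    local = tuple(g % t for g, t in zip(global_index, tile_shape))
    return tile, local


if __name__ == "__main__":
    # Hypothetical numbers: a 32 KiB local memory and a 5000-byte working set.
    print(pick_block_size(32 * 1024, 5000))
    # A 100 x 60 array cut into 32 x 32 tiles; edge tiles are smaller.
    for tile_idx, tile_slices in decompose((100, 60), (32, 32)):
        print(tile_idx, tile_slices)
    print(to_local_index((70, 45), (32, 32)))
```

In this sketch the index translation plays the role that the abstract assigns to a mapping structure: code written against the original array indices can be redirected to the tile currently resident in local memory.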
Yoshitake OKI
Waseda University
Yuto ABE
Waseda University
Kazuki YAMAMOTO
Waseda University
Kohei YAMAMOTO
Waseda University
Tomoya SHIRAKAWA
Waseda University
Akimasa YOSHIDA
Meiji University
Keiji KIMURA
Waseda University
Hironori KASAHARA
Waseda University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yoshitake OKI, Yuto ABE, Kazuki YAMAMOTO, Kohei YAMAMOTO, Tomoya SHIRAKAWA, Akimasa YOSHIDA, Keiji KIMURA, Hironori KASAHARA, "Local Memory Mapping of Multicore Processors on an Automatic Parallelizing Compiler" in IEICE TRANSACTIONS on Electronics,
vol. E103-C, no. 3, pp. 98-109, March 2020, doi: 10.1587/transele.2019LHP0010.
Abstract: Utilization of local memory from real-time embedded systems to high performance systems with multi-core processors has become an important factor for satisfying hard deadline constraints. However, challenges lie in the area of efficiently managing the memory hierarchy, such as decomposing large data into small blocks to fit onto local memory and transferring blocks for reuse and replacement. To address this issue, this paper presents a compiler optimization method that automatically manages local memory of multi-core processors. The method selects and maps multi-dimensional data onto software-specified memory blocks called Adjustable Blocks. These blocks are hierarchically divisible with varying sizes defined by the features of the input application. Moreover, the method introduces mapping structures called Template Arrays to maintain the indices of the decomposed multi-dimensional data. The proposed work is implemented on the OSCAR automatic parallelizing compiler and evaluations were performed on the Renesas RP2 8-core processor. Experimental results from NAS Parallel Benchmark, SPEC benchmark, and multimedia applications show the effectiveness of the method, obtaining maximum speed-ups of 20.44 with 8 cores utilizing local memory from single-core sequential versions that use off-chip memory.
URL: https://global.ieice.org/en_transactions/electronics/10.1587/transele.2019LHP0010/_p
@ARTICLE{e103-c_3_98,
author={Yoshitake OKI and Yuto ABE and Kazuki YAMAMOTO and Kohei YAMAMOTO and Tomoya SHIRAKAWA and Akimasa YOSHIDA and Keiji KIMURA and Hironori KASAHARA},
journal={IEICE TRANSACTIONS on Electronics},
title={Local Memory Mapping of Multicore Processors on an Automatic Parallelizing Compiler},
year={2020},
volume={E103-C},
number={3},
pages={98-109},
abstract={Utilization of local memory from real-time embedded systems to high performance systems with multi-core processors has become an important factor for satisfying hard deadline constraints. However, challenges lie in the area of efficiently managing the memory hierarchy, such as decomposing large data into small blocks to fit onto local memory and transferring blocks for reuse and replacement. To address this issue, this paper presents a compiler optimization method that automatically manages local memory of multi-core processors. The method selects and maps multi-dimensional data onto software-specified memory blocks called Adjustable Blocks. These blocks are hierarchically divisible with varying sizes defined by the features of the input application. Moreover, the method introduces mapping structures called Template Arrays to maintain the indices of the decomposed multi-dimensional data. The proposed work is implemented on the OSCAR automatic parallelizing compiler and evaluations were performed on the Renesas RP2 8-core processor. Experimental results from NAS Parallel Benchmark, SPEC benchmark, and multimedia applications show the effectiveness of the method, obtaining maximum speed-ups of 20.44 with 8 cores utilizing local memory from single-core sequential versions that use off-chip memory.},
keywords={},
doi={10.1587/transele.2019LHP0010},
ISSN={1745-1353},
month={March},}
TY - JOUR
TI - Local Memory Mapping of Multicore Processors on an Automatic Parallelizing Compiler
T2 - IEICE TRANSACTIONS on Electronics
SP - 98
EP - 109
AU - Yoshitake OKI
AU - Yuto ABE
AU - Kazuki YAMAMOTO
AU - Kohei YAMAMOTO
AU - Tomoya SHIRAKAWA
AU - Akimasa YOSHIDA
AU - Keiji KIMURA
AU - Hironori KASAHARA
PY - 2020
DO - 10.1587/transele.2019LHP0010
JO - IEICE TRANSACTIONS on Electronics
SN - 1745-1353
VL - E103-C
IS - 3
JA - IEICE TRANSACTIONS on Electronics
Y1 - March 2020
AB - Utilization of local memory from real-time embedded systems to high performance systems with multi-core processors has become an important factor for satisfying hard deadline constraints. However, challenges lie in the area of efficiently managing the memory hierarchy, such as decomposing large data into small blocks to fit onto local memory and transferring blocks for reuse and replacement. To address this issue, this paper presents a compiler optimization method that automatically manages local memory of multi-core processors. The method selects and maps multi-dimensional data onto software-specified memory blocks called Adjustable Blocks. These blocks are hierarchically divisible with varying sizes defined by the features of the input application. Moreover, the method introduces mapping structures called Template Arrays to maintain the indices of the decomposed multi-dimensional data. The proposed work is implemented on the OSCAR automatic parallelizing compiler and evaluations were performed on the Renesas RP2 8-core processor. Experimental results from NAS Parallel Benchmark, SPEC benchmark, and multimedia applications show the effectiveness of the method, obtaining maximum speed-ups of 20.44 with 8 cores utilizing local memory from single-core sequential versions that use off-chip memory.
ER -