Brief performance portability analysis of a matrix multiplication kernel on multiple vendor GPUs

The heterogeneous computing paradigm has led to the need for portable and efficient programming solutions that can leverage the capabilities of various hardware devices, such as NVIDIA, Intel, and AMD GPUs. This study evaluates the performance and portability of the SYCL and CUDA languages for a mat...

Bibliographic Details
Main authors: Costanzo, Manuel; Rucci, Enzo; García-Sánchez, Carlos; Naiouf, Marcelo
Format: Conference object
Language: English
Published: 2023
Subjects:
GPU
Online access: http://sedici.unlp.edu.ar/handle/10915/155420
Contributed by:
id I19-R120-10915-155420
record_format dspace
spelling I19-R120-10915-155420 2023-07-11T20:01:41Z http://sedici.unlp.edu.ar/handle/10915/155420 isbn:978-950-34-2271-7 Brief performance portability analysis of a matrix multiplication kernel on multiple vendor GPUs Costanzo, Manuel; Rucci, Enzo; García-Sánchez, Carlos; Naiouf, Marcelo 2023-06 2023 2023-07-11T17:09:21Z en Computer Science oneAPI SYCL GPU CUDA Performance portability The heterogeneous computing paradigm has led to the need for portable and efficient programming solutions that can leverage the capabilities of various hardware devices, such as NVIDIA, Intel, and AMD GPUs. This study evaluates the performance and portability of the SYCL and CUDA languages for a matrix multiplication (MM) application across different GPU architectures. The experimental work showed that, while the CUDA implementation outperforms the SYCL implementation on NVIDIA devices due to optimizations provided by the nvcc compiler, the latter implementation demonstrated remarkable code portability to other GPU architectures, such as AMD and Intel. Furthermore, the architectural efficiency percentages obtained on AMD and Intel GPUs showed consistency with the results observed on NVIDIA devices. Facultad de Informática Conference object http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) application/pdf 13-18
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
oneAPI
SYCL
GPU
CUDA
Performance portability
description The heterogeneous computing paradigm has led to the need for portable and efficient programming solutions that can leverage the capabilities of various hardware devices, such as NVIDIA, Intel, and AMD GPUs. This study evaluates the performance and portability of the SYCL and CUDA languages for a matrix multiplication (MM) application across different GPU architectures. The experimental work showed that, while the CUDA implementation outperforms the SYCL implementation on NVIDIA devices due to optimizations provided by the nvcc compiler, the latter implementation demonstrated remarkable code portability to other GPU architectures, such as AMD and Intel. Furthermore, the architectural efficiency percentages obtained on AMD and Intel GPUs showed consistency with the results observed on NVIDIA devices.
format Conference object
author Costanzo, Manuel
Rucci, Enzo
García-Sánchez, Carlos
Naiouf, Marcelo
title Brief performance portability analysis of a matrix multiplication kernel on multiple vendor GPUs
publishDate 2023
url http://sedici.unlp.edu.ar/handle/10915/155420