Brief performance portability analysis of a matrix multiplication kernel on multiple vendor GPUs

The heterogeneous computing paradigm has led to the need for portable and efficient programming solutions that can leverage the capabilities of various hardware devices, such as NVIDIA, Intel, and AMD GPUs. This study evaluates the performance and portability of the SYCL and CUDA languages for a mat...

Bibliographic Details
Main authors:	Costanzo, Manuel; Rucci, Enzo; García-Sánchez, Carlos; Naiouf, Marcelo
Format:	Conference object
Language:	English
Published:	2023
Subjects:	Computer Science; oneAPI; SYCL; GPU; CUDA; Performance portability
Online access:	http://sedici.unlp.edu.ar/handle/10915/155420
Contributed by:	SEDICI (UNLP), Universidad Nacional de La Plata

id	I19-R120-10915-155420
record_format	dspace
spelling	I19-R120-10915-155420; 2023-07-11T20:01:41Z; http://sedici.unlp.edu.ar/handle/10915/155420; isbn:978-950-34-2271-7; 2023-06; 2023; 2023-07-11T17:09:21Z; en; Facultad de Informática; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); application/pdf; 13-18
institution	Universidad Nacional de La Plata
institution_str	I-19
repository_str	R-120
collection	SEDICI (UNLP)
language	English
topic	Computer Science; oneAPI; SYCL; GPU; CUDA; Performance portability
topic_facet	Computer Science; oneAPI; SYCL; GPU; CUDA; Performance portability
description	The heterogeneous computing paradigm has led to the need for portable and efficient programming solutions that can leverage the capabilities of various hardware devices, such as NVIDIA, Intel, and AMD GPUs. This study evaluates the performance and portability of the SYCL and CUDA languages for a matrix multiplication (MM) application across different GPU architectures. The experimental work showed that, while the CUDA implementation outperforms the SYCL implementation on NVIDIA devices due to optimizations provided by the nvcc compiler, the latter implementation demonstrated remarkable code portability to other GPU architectures, such as AMD and Intel. Furthermore, the architectural efficiency percentages obtained on AMD and Intel GPUs showed consistency with the results observed on NVIDIA devices.
format	Conference object
author	Costanzo, Manuel; Rucci, Enzo; García-Sánchez, Carlos; Naiouf, Marcelo
author_facet	Costanzo, Manuel; Rucci, Enzo; García-Sánchez, Carlos; Naiouf, Marcelo
author_sort	Costanzo, Manuel
title	Brief performance portability analysis of a matrix multiplication kernel on multiple vendor GPUs
publishDate	2023
url	http://sedici.unlp.edu.ar/handle/10915/155420
