Exploring modulated detection transformer as a tool for action recognition in videos

Bibliographic details
Main authors:	Crisol, Tomás; Ermantraut, Joel; Rostagno, Adrián; Aggio, Santiago L.; Iparraguirre, Javier
Format:	Conference object
Language:	English
Published:	2022
Subjects:	Computer Science; Multi-modal transformers; Action detection; Model generalization
Online access:	http://sedici.unlp.edu.ar/handle/10915/151735 https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/388/326
Contributed by:	SEDICI (UNLP), Universidad Nacional de La Plata
Publisher:	Sociedad Argentina de Informática e Investigación Operativa
Issued:	2022-10
ISSN:	2451-7496
Pages:	6-10
License:	Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/

Abstract

In recent years, transformer architectures have grown in popularity. Modulated Detection Transformer (MDETR) is an end-to-end multi-modal understanding model that performs tasks such as phrase grounding, referring expression comprehension, referring expression segmentation, and visual question answering. One remarkable aspect of the model is its capacity to make inferences about classes it was not trained on. In this work, we explore the use of MDETR for a new task, action detection, without any prior training on that task. We obtain quantitative results using the Atomic Visual Actions (AVA) dataset. Although the model does not achieve the best performance on this task, we believe the result is an interesting finding. We show that it is possible to use a multi-modal model to tackle a task it was not designed for. Finally, we believe that this line of research may lead to the generalization of MDETR to additional downstream tasks.