Exploring modulated detection transformer as a tool for action recognition in videos
In recent years, transformer architectures have grown in popularity. Modulated Detection Transformer (MDETR) is an end-to-end multi-modal understanding model that performs tasks such as phrase grounding, referring expression comprehension, referring expression segmentation, and visual questi...
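Main authors: | Crisol, Tomás; Ermantraut, Joel; Rostagno, Adrián; Aggio, Santiago L.; Iparraguirre, Javier |
---|---|
Format: | Conference object |
Language: | English |
Published: | 2022 |
Subjects: | Computer Science; Multi-modal transformers; Action detection; Model generalization |
Online access: | http://sedici.unlp.edu.ar/handle/10915/151735 https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/388/326 |
Contributed by: | |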
id |
I19-R120-10915-151735 |
---|---|
record_format |
dspace |
spelling |
I19-R120-10915-151735 2023-04-19T20:05:33Z http://sedici.unlp.edu.ar/handle/10915/151735 https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/388/326 issn:2451-7496 Exploring modulated detection transformer as a tool for action recognition in videos Crisol, Tomás; Ermantraut, Joel; Rostagno, Adrián; Aggio, Santiago L.; Iparraguirre, Javier 2022-10 2022 2023-04-19T14:57:38Z en Computer Science; Multi-modal transformers; Action detection; Model generalization In recent years, transformer architectures have grown in popularity. Modulated Detection Transformer (MDETR) is an end-to-end multi-modal understanding model that performs tasks such as phrase grounding, referring expression comprehension, referring expression segmentation, and visual question answering. One remarkable aspect of the model is its capacity to make inferences about classes it was not trained on. In this work, we explore the use of MDETR for a new task, action detection, without any task-specific training. We report quantitative results on the Atomic Visual Actions (AVA) dataset. Although the model does not achieve the best performance on the task, we believe this is an interesting finding. We show that it is possible to use a multi-modal model to tackle a task it was not designed for. Finally, we believe that this line of research may lead to the generalization of MDETR to additional downstream tasks. Sociedad Argentina de Informática e Investigación Operativa Conference object http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) application/pdf 6-10 |
institution |
Universidad Nacional de La Plata |
institution_str |
I-19 |
repository_str |
R-120 |
collection |
SEDICI (UNLP) |
language |
English |
topic |
Computer Science; Multi-modal transformers; Action detection; Model generalization |
description |
In recent years, transformer architectures have grown in popularity. Modulated Detection Transformer (MDETR) is an end-to-end multi-modal understanding model that performs tasks such as phrase grounding, referring expression comprehension, referring expression segmentation, and visual question answering. One remarkable aspect of the model is its capacity to make inferences about classes it was not trained on. In this work, we explore the use of MDETR for a new task, action detection, without any task-specific training. We report quantitative results on the Atomic Visual Actions (AVA) dataset. Although the model does not achieve the best performance on the task, we believe this is an interesting finding. We show that it is possible to use a multi-modal model to tackle a task it was not designed for. Finally, we believe that this line of research may lead to the generalization of MDETR to additional downstream tasks. |
format |
Conference object |
author |
Crisol, Tomás; Ermantraut, Joel; Rostagno, Adrián; Aggio, Santiago L.; Iparraguirre, Javier |
author_sort |
Crisol, Tomás |
title |
Exploring modulated detection transformer as a tool for action recognition in videos |
publishDate |
2022 |
url |
http://sedici.unlp.edu.ar/handle/10915/151735 https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/388/326 |