Assessing the Impact of Contextual Information in Hate Speech Detection

Social networks and other digital media deal with huge amounts of user-generated content, in which hate speech has become an increasingly relevant problem. A great effort has been made to develop automatic tools for its analysis and moderation, at least in its most threatening forms, such as in...

Bibliographic Details
Main authors: Gravano, Agustín; Pérez, Juan Manuel; Luque, Franco M; Zeyat, Demián; Kondratzky, Martín; Moro, Agustín; Serrati, Pablo Santiago; Zajac, Joaquín; Miguel, Paula; Debandi, Natalia; Cotik, Viviana
Format: Article (publishedVersion)
Language: English
Published: 2023
Subjects:
NLP
Online access: https://repositorio.utdt.edu/handle/20.500.13098/11849
https://doi.org/10.1109/ACCESS.2023.3258973
id I57-R163-20.500.13098-11849
record_format dspace
spelling I57-R163-20.500.13098-11849 2023-06-01T07:00:32Z Assessing the Impact of Contextual Information in Hate Speech Detection Gravano, Agustín; Pérez, Juan Manuel; Luque, Franco M; Zeyat, Demián; Kondratzky, Martín; Moro, Agustín; Serrati, Pablo Santiago; Zajac, Joaquín; Miguel, Paula; Debandi, Natalia; Cotik, Viviana NLP; Text classification; Hate speech detection; Contextual information; Spanish corpus; Covid-19 hate speeches Social networks and other digital media deal with huge amounts of user-generated content, in which hate speech has become an increasingly relevant problem. A great effort has been made to develop automatic tools for its analysis and moderation, at least in its most threatening forms, such as in violent acts against people and groups protected by law. One limitation of current approaches to automatic hate speech detection is the lack of context. The focus on isolated messages, without considering any type of conversational context or even the topic being discussed, severely restricts the information available to determine whether a post on a social network should be tagged as hateful or not. In this work, we assess the impact of adding contextual information to the hate speech detection task. We specifically study a subdomain of Twitter data consisting of replies to digital newspaper posts, which provides a natural environment for contextualized hate speech detection. We built a new corpus in Spanish (Rioplatense variant) focused on hate speech associated with the COVID-19 pandemic, annotated using guidelines carefully designed by our interdisciplinary team. Our classification experiments using state-of-the-art transformer-based machine learning techniques show evidence that adding contextual information improves the performance of hate speech detection for two proposed tasks: binary and multi-label prediction, increasing their Macro F1 by 4.2 and 5.5 points, respectively. These results highlight the importance of using contextual information in hate speech detection. Our code, models, and corpus have been made available for further research. 2023-05-31T18:56:49Z 2023-05-31T18:56:49Z 2023 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion https://repositorio.utdt.edu/handle/20.500.13098/11849 https://doi.org/10.1109/ACCESS.2023.3258973 eng IEEE Access, vol. 11, pp. 30575-30590, 2023, doi: 10.1109/ACCESS.2023.3258973. info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/4.0/ pp. 30575-30590 application/pdf application/pdf
institution Universidad Torcuato Di Tella
institution_str I-57
repository_str R-163
collection Repositorio Digital Universidad Torcuato Di Tella
language English
orig_language_str_mv eng
topic NLP
Text classification
Hate speech detection
Contextual information
Spanish corpus
Covid-19 hate speeches
description Social networks and other digital media deal with huge amounts of user-generated content, in which hate speech has become an increasingly relevant problem. A great effort has been made to develop automatic tools for its analysis and moderation, at least in its most threatening forms, such as in violent acts against people and groups protected by law. One limitation of current approaches to automatic hate speech detection is the lack of context. The focus on isolated messages, without considering any type of conversational context or even the topic being discussed, severely restricts the information available to determine whether a post on a social network should be tagged as hateful or not. In this work, we assess the impact of adding contextual information to the hate speech detection task. We specifically study a subdomain of Twitter data consisting of replies to digital newspaper posts, which provides a natural environment for contextualized hate speech detection. We built a new corpus in Spanish (Rioplatense variant) focused on hate speech associated with the COVID-19 pandemic, annotated using guidelines carefully designed by our interdisciplinary team. Our classification experiments using state-of-the-art transformer-based machine learning techniques show evidence that adding contextual information improves the performance of hate speech detection for two proposed tasks: binary and multi-label prediction, increasing their Macro F1 by 4.2 and 5.5 points, respectively. These results highlight the importance of using contextual information in hate speech detection. Our code, models, and corpus have been made available for further research.
format Article
publishedVersion
author Gravano, Agustín
Pérez, Juan Manuel
Luque, Franco M
Zeyat, Demián
Kondratzky, Martín
Moro, Agustín
Serrati, Pablo Santiago
Zajac, Joaquín
Miguel, Paula
Debandi, Natalia
Cotik, Viviana
author_sort Gravano, Agustín
title Assessing the Impact of Contextual Information in Hate Speech Detection
publishDate 2023
url https://repositorio.utdt.edu/handle/20.500.13098/11849
https://doi.org/10.1109/ACCESS.2023.3258973