Assessing the Impact of Contextual Information in Hate Speech Detection
Social networks and other digital media deal with huge amounts of user-generated content, in which hate speech has become an increasingly relevant problem. A great effort has been made to develop automatic tools for its analysis and moderation, at least in its most threatening forms, such as in...
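Main authors: | Gravano, Agustín; Pérez, Juan Manuel; Luque, Franco M; Zeyat, Demián; Kondratzky, Martín; Moro, Agustín; Serrati, Pablo Santiago; Zajac, Joaquín; Miguel, Paula; Debandi, Natalia; Cotik, Viviana |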
---|---|
Format: | Article publishedVersion |
Language: | English |
Published: | 2023 |
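Subjects: | NLP; Text classification; Hate speech detection; Contextual information; Spanish corpus; Covid-19 hate speeches |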
Online access: | https://repositorio.utdt.edu/handle/20.500.13098/11849 https://doi.org/10.1109/ACCESS.2023.3258973 |
Contributed by: |
id |
I57-R163-20.500.13098-11849 |
---|---|
record_format |
dspace |
spelling |
I57-R163-20.500.13098-11849 2023-06-01T07:00:32Z
Assessing the Impact of Contextual Information in Hate Speech Detection
Gravano, Agustín; Pérez, Juan Manuel; Luque, Franco M; Zeyat, Demián; Kondratzky, Martín; Moro, Agustín; Serrati, Pablo Santiago; Zajac, Joaquín; Miguel, Paula; Debandi, Natalia; Cotik, Viviana
NLP; Text classification; Hate speech detection; Contextual information; Spanish corpus; Covid-19 hate speeches
Social networks and other digital media deal with huge amounts of user-generated content, in which hate speech has become an increasingly relevant problem. A great effort has been made to develop automatic tools for its analysis and moderation, at least in its most threatening forms, such as in violent acts against people and groups protected by law. One limitation of current approaches to automatic hate speech detection is the lack of context. The focus on isolated messages, without considering any type of conversational context or even the topic being discussed, severely restricts the information available to determine whether a post on a social network should be tagged as hateful or not. In this work, we assess the impact of adding contextual information to the hate speech detection task. We specifically study a subdomain of Twitter data consisting of replies to posts by digital newspapers, which provides a natural environment for contextualized hate speech detection. We built a new corpus in Spanish (Rioplatense variant) focused on hate speech associated with the COVID-19 pandemic, annotated using guidelines carefully designed by our interdisciplinary team. Our classification experiments using state-of-the-art transformer-based machine learning techniques show evidence that adding contextual information improves the performance of hate speech detection for two proposed tasks: binary and multi-label prediction, increasing their Macro F1 by 4.2 and 5.5 points, respectively. These results highlight the importance of using contextual information in hate speech detection. Our code, models, and corpus have been made available for further research.
2023-05-31T18:56:49Z 2023-05-31T18:56:49Z 2023 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion https://repositorio.utdt.edu/handle/20.500.13098/11849 https://doi.org/10.1109/ACCESS.2023.3258973 eng
IEEE Access, vol. 11, pp. 30575-30590, 2023, doi: 10.1109/ACCESS.2023.3258973.
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/4.0/ pp. 30575-30590 application/pdf application/pdf |
institution |
Universidad Torcuato Di Tella |
institution_str |
I-57 |
repository_str |
R-163 |
collection |
Repositorio Digital Universidad Torcuato Di Tella |
language |
English |
orig_language_str_mv |
eng |
topic |
NLP; Text classification; Hate speech detection; Contextual information; Spanish corpus; Covid-19 hate speeches |
spellingShingle |
NLP; Text classification; Hate speech detection; Contextual information; Spanish corpus; Covid-19 hate speeches
Gravano, Agustín; Pérez, Juan Manuel; Luque, Franco M; Zeyat, Demián; Kondratzky, Martín; Moro, Agustín; Serrati, Pablo Santiago; Zajac, Joaquín; Miguel, Paula; Debandi, Natalia; Cotik, Viviana
Assessing the Impact of Contextual Information in Hate Speech Detection |
topic_facet |
NLP; Text classification; Hate speech detection; Contextual information; Spanish corpus; Covid-19 hate speeches |
description |
Social networks and other digital media deal with huge amounts of user-generated content,
in which hate speech has become an increasingly relevant problem. A great effort has been made to
develop automatic tools for its analysis and moderation, at least in its most threatening forms, such as in
violent acts against people and groups protected by law. One limitation of current approaches to automatic
hate speech detection is the lack of context. The focus on isolated messages, without considering any type
of conversational context or even the topic being discussed, severely restricts the information available to
determine whether a post on a social network should be tagged as hateful or not. In this work, we assess the
impact of adding contextual information to the hate speech detection task. We specifically study a subdomain
of Twitter data consisting of replies to posts by digital newspapers, which provides a natural environment
for contextualized hate speech detection. We built a new corpus in Spanish (Rioplatense variant) focused
on hate speech associated with the COVID-19 pandemic, annotated using guidelines carefully designed by
our interdisciplinary team. Our classification experiments using state-of-the-art transformer-based machine
learning techniques show evidence that adding contextual information improves the performance of hate
speech detection for two proposed tasks: binary and multi-label prediction, increasing their Macro F1 by
4.2 and 5.5 points, respectively. These results highlight the importance of using contextual information in
hate speech detection. Our code, models, and corpus have been made available for further research. |
format |
Article publishedVersion |
author |
Gravano, Agustín; Pérez, Juan Manuel; Luque, Franco M; Zeyat, Demián; Kondratzky, Martín; Moro, Agustín; Serrati, Pablo Santiago; Zajac, Joaquín; Miguel, Paula; Debandi, Natalia; Cotik, Viviana |
author_facet |
Gravano, Agustín; Pérez, Juan Manuel; Luque, Franco M; Zeyat, Demián; Kondratzky, Martín; Moro, Agustín; Serrati, Pablo Santiago; Zajac, Joaquín; Miguel, Paula; Debandi, Natalia; Cotik, Viviana |
author_sort |
Gravano, Agustín |
title |
Assessing the Impact of Contextual Information in Hate Speech Detection |
title_short |
Assessing the Impact of Contextual Information in Hate Speech Detection |
title_full |
Assessing the Impact of Contextual Information in Hate Speech Detection |
title_fullStr |
Assessing the Impact of Contextual Information in Hate Speech Detection |
title_full_unstemmed |
Assessing the Impact of Contextual Information in Hate Speech Detection |
title_sort |
assessing the impact of contextual information in hate speech detection |
publishDate |
2023 |
url |
https://repositorio.utdt.edu/handle/20.500.13098/11849 https://doi.org/10.1109/ACCESS.2023.3258973 |
work_keys_str_mv |
AT gravanoagustin assessingtheimpactofcontextualinformationinhatespeechdetection AT perezjuanmanuel assessingtheimpactofcontextualinformationinhatespeechdetection AT luquefrancom assessingtheimpactofcontextualinformationinhatespeechdetection AT zeyatdemian assessingtheimpactofcontextualinformationinhatespeechdetection AT kondratzkymartin assessingtheimpactofcontextualinformationinhatespeechdetection AT moroagustin assessingtheimpactofcontextualinformationinhatespeechdetection AT serratipablosantiago assessingtheimpactofcontextualinformationinhatespeechdetection AT zajacjoaquin assessingtheimpactofcontextualinformationinhatespeechdetection AT miguelpaula assessingtheimpactofcontextualinformationinhatespeechdetection AT debandinatalia assessingtheimpactofcontextualinformationinhatespeechdetection AT cotikviviana assessingtheimpactofcontextualinformationinhatespeechdetection |
_version_ |
1768086691085549568 |