A Spanish text corpus for the author profiling task

Author Profiling is the task of predicting characteristics of the author of a text, such as age, gender, personality, native language, etc. This is a task of growing importance due to its potential applications in security, crime and marketing, among others. One of the main diffic...


Bibliographic Details
Main authors: Villegas, María Paula; Garciarena Ucelay, María José; Errecalde, Marcelo Luis; Cagnina, Leticia
Format: Conference object
Language: English
Published: 2014
Online access: http://sedici.unlp.edu.ar/handle/10915/42290
id I19-R120-10915-42290
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
author profiling
natural language processing
Spanish text corpus
description Author profiling is the task of predicting characteristics of the author of a text, such as age, gender, personality, or native language. The task is of growing importance because of its potential applications in security, crime, and marketing, among other areas. One of the main difficulties in this field is the lack of reliable text collections (corpora) for training and testing automatically derived classifiers, particularly in specific languages such as Spanish. Although some recent data sets were generated for the PAN competitions, those documents contain a lot of “noise”, which prevents researchers from drawing more general conclusions about how the task behaves when more formal documents are used. In this context, this work proposes and describes SpanText, a collection of formal texts in Spanish which is, as far as we know, the first collection with these characteristics for the author profiling task. In addition, an experimental study clearly establishes the difference in performance obtained with formal and informal texts, and opens interesting research lines toward a deeper understanding of the particular challenges that each type of document poses to the author profiling task.
format Conference object
author Villegas, María Paula
Garciarena Ucelay, María José
Errecalde, Marcelo Luis
Cagnina, Leticia
title A Spanish text corpus for the author profiling task
publishDate 2014
url http://sedici.unlp.edu.ar/handle/10915/42290
bdutipo_str Repositories
_version_ 1764820473525305345