A Spanish text corpus for the author profiling task

<i>Author Profiling</i> is the task of predicting characteristics of the author of a text, such as age, gender, personality, native language, etc. This is a task of growing importance due to its potential applications in security, crime and marketing, among others. One of the main diffic...

Bibliographic Details
Main authors:	Villegas, María Paula; Garciarena Ucelay, María José; Errecalde, Marcelo Luis; Cagnina, Leticia
Format:	Conference object
Language:	English
Published:	2014
Subjects:	Computer Science; author profiling; natural processing language; Spanish text corpus
Online access:	http://sedici.unlp.edu.ar/handle/10915/42290
Contributed by:	SEDICI (UNLP), Universidad Nacional de La Plata

id	I19-R120-10915-42290
record_format	dspace
institution	Universidad Nacional de La Plata
institution_str	I-19
repository_str	R-120
collection	SEDICI (UNLP)
language	English
topic	Computer Science author profiling natural processing language Spanish text corpus
spellingShingle	Computer Science author profiling natural processing language Spanish text corpus Villegas, María Paula Garciarena Ucelay, María José Errecalde, Marcelo Luis Cagnina, Leticia A Spanish text corpus for the author profiling task
topic_facet	Computer Science author profiling natural processing language Spanish text corpus
description	<i>Author Profiling</i> is the task of predicting characteristics of the author of a text, such as age, gender, personality, native language, etc. This task is of growing importance due to its potential applications in security, crime and marketing, among others. One of the main difficulties in this field is the lack of reliable text collections (corpora) to train and test automatically derived classifiers, particularly for specific languages such as Spanish. Although some recent data sets were generated for the PAN competitions, their documents contain a lot of “noise” that prevents researchers from drawing more general conclusions about the task when more formal documents are used. In this context, this work proposes and describes <i>SpanText</i>, a collection of formal texts in Spanish which is, as far as we know, the first collection with these characteristics for the author profiling task. In addition, an experimental study clearly establishes the difference in performance obtained with formal and informal texts, and opens interesting research lines toward a deeper understanding of the particularities that each type of document poses for the author profiling task.
format	Conference object
author	Villegas, María Paula Garciarena Ucelay, María José Errecalde, Marcelo Luis Cagnina, Leticia
author_facet	Villegas, María Paula Garciarena Ucelay, María José Errecalde, Marcelo Luis Cagnina, Leticia
author_sort	Villegas, María Paula
title	A Spanish text corpus for the author profiling task
title_short	A Spanish text corpus for the author profiling task
title_full	A Spanish text corpus for the author profiling task
title_fullStr	A Spanish text corpus for the author profiling task
title_full_unstemmed	A Spanish text corpus for the author profiling task
title_sort	spanish text corpus for the author profiling task
publishDate	2014
url	http://sedici.unlp.edu.ar/handle/10915/42290
work_keys_str_mv	AT villegasmariapaula aspanishtextcorpusfortheauthorprofilingtask AT garciarenaucelaymariajose aspanishtextcorpusfortheauthorprofilingtask AT errecaldemarceloluis aspanishtextcorpusfortheauthorprofilingtask AT cagninaleticia aspanishtextcorpusfortheauthorprofilingtask AT villegasmariapaula spanishtextcorpusfortheauthorprofilingtask AT garciarenaucelaymariajose spanishtextcorpusfortheauthorprofilingtask AT errecaldemarceloluis spanishtextcorpusfortheauthorprofilingtask AT cagninaleticia spanishtextcorpusfortheauthorprofilingtask
bdutipo_str	Repositories
_version_	1764820473525305345
