Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings

This work explores the use of word embeddings, also known as word vectors, trained on Spanish corpora, to use as features for Spanish verb sense disambiguation (VSD). This type of learning technique is named disjoint semisupervised learning [1]: an unsupervised algorithm is trained on unlabeled dat...

Descripción completa

Guardado en:

Detalles Bibliográficos
Autores principales:	Cardellino, Cristian, Alonso i Alemany, Laura
Formato:	Objeto de conferencia
Lenguaje:	Inglés
Publicado:	2017
Materias:	Ciencias Informáticas word embeddings disjoint semisupervised learning verb sense disambiguation
Acceso en línea:	http://sedici.unlp.edu.ar/handle/10915/65941 http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/ASAI/asai-05.pdf
Aporte de:	SEDICI (UNLP) de Universidad Nacional de La Plata

id	I19-R120-10915-65941
record_format	dspace
institution	Universidad Nacional de La Plata
institution_str	I-19
repository_str	R-120
collection	SEDICI (UNLP)
language	Inglés
topic	Ciencias Informáticas word embeddings disjoint semisupervised learning verb sense disambiguation
spellingShingle	Ciencias Informáticas word embeddings disjoint semisupervised learning verb sense disambiguation Cardellino, Cristian Alonso i Alemany, Laura Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
topic_facet	Ciencias Informáticas word embeddings disjoint semisupervised learning verb sense disambiguation
description	This work explores the use of word embeddings, also known as word vectors, trained on Spanish corpora, to use as features for Spanish verb sense disambiguation (VSD). This type of learning technique is named disjoint semisupervised learning [1]: an unsupervised algorithm is trained on unlabeled data separately as a first step, and then its results (i.e. the word embeddings) are fed to a supervised classifier. Throughout this paper we try to assert two hypothesis: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, in contrast to more standard feature engineering techniques based on information taken from the training data; (ii) using word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, has a positive impact on the model’s performance, when compared to general domain’s word embeddings. The performance of a model over the data is not only measured using standard metric techniques (e.g. accuracy or precision/recall) but also measuring the model tendency to overfit the available data by analyzing the learning curve. Measuring this overfitting tendency is important as there is a small amount of available data, thus we need to find models to generalize better the VSD problem. For the task we use SenSem [2], a corpus and lexicon of Spanish and Catalan disambiguated verbs, as our base resource for experimentation.
format	Objeto de conferencia Objeto de conferencia
author	Cardellino, Cristian Alonso i Alemany, Laura
author_facet	Cardellino, Cristian Alonso i Alemany, Laura
author_sort	Cardellino, Cristian
title	Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
title_short	Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
title_full	Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
title_fullStr	Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
title_full_unstemmed	Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
title_sort	disjoint semi-supervised spanish verb sense disambiguation using word embeddings
publishDate	2017
url	http://sedici.unlp.edu.ar/handle/10915/65941 http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/ASAI/asai-05.pdf
work_keys_str_mv	AT cardellinocristian disjointsemisupervisedspanishverbsensedisambiguationusingwordembeddings AT alonsoialemanylaura disjointsemisupervisedspanishverbsensedisambiguationusingwordembeddings
bdutipo_str	Repositorios
_version_	1764820480745799680

Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings

Ejemplares similares