Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings

This work explores the use of word embeddings (also known as word vectors) trained on Spanish corpora as features for Spanish verb sense disambiguation (VSD). This learning technique is called disjoint semisupervised learning [1]: an unsupervised algorithm is trained on unlabeled dat...
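The two-step setup described in the abstract can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: the two-dimensional vectors, the sense labels `contar_count` and `contar_tell`, the helper names, and the nearest-centroid classifier are all assumptions made for the example. In a real setting the embeddings would be trained beforehand on unlabeled Spanish text, and the classifier would be whatever supervised model is under study.

```python
# Step 1 (assumed already done): embeddings learned without labels.
# These toy 2-d vectors are hypothetical stand-ins for trained embeddings.
EMBEDDINGS = {
    "dinero": [0.9, 0.1],
    "banco": [0.8, 0.2],
    "precio": [0.85, 0.05],
    "historia": [0.1, 0.9],
    "cuento": [0.2, 0.8],
    "abuela": [0.15, 0.85],
}


def featurize(context_words, embeddings=EMBEDDINGS):
    """Represent one verb occurrence as the mean vector of its known context words."""
    vectors = [embeddings[w] for w in context_words if w in embeddings]
    if not vectors:
        # No known context word: fall back to a zero vector of the right size.
        dim = len(next(iter(embeddings.values())))
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train_centroids(instances):
    """Step 2: supervised training. Returns a dict mapping sense label to centroid."""
    grouped = {}
    for context_words, label in instances:
        grouped.setdefault(label, []).append(featurize(context_words))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, context_words):
    """Assign the sense whose centroid is closest (squared Euclidean distance)."""
    x = featurize(context_words)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical labeled occurrences of the ambiguous verb "contar"
# (to count vs. to tell), each given as its context words and a sense label.
TRAIN = [
    (["dinero", "banco"], "contar_count"),
    (["precio", "dinero"], "contar_count"),
    (["historia", "abuela"], "contar_tell"),
    (["cuento", "historia"], "contar_tell"),
]

centroids = train_centroids(TRAIN)
prediction = predict(centroids, ["abuela", "cuento"])
```

The point of the sketch is the separation of stages: the labeled data never influences the embeddings, and the classifier only ever sees instances through the embedding-based features.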

Bibliographic Details
Main authors: Cardellino, Cristian, Alonso i Alemany, Laura
Format: Conference object
Language: English
Published: 2017
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/65941
http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/ASAI/asai-05.pdf
Contributed by:
id I19-R120-10915-65941
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
word embeddings
disjoint semisupervised learning
verb sense disambiguation
spellingShingle Computer Science
word embeddings
disjoint semisupervised learning
verb sense disambiguation
Cardellino, Cristian
Alonso i Alemany, Laura
Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
topic_facet Computer Science
word embeddings
disjoint semisupervised learning
verb sense disambiguation
description This work explores the use of word embeddings (also known as word vectors) trained on Spanish corpora as features for Spanish verb sense disambiguation (VSD). This learning technique is called disjoint semisupervised learning [1]: an unsupervised algorithm is first trained separately on unlabeled data, and its results (i.e. the word embeddings) are then fed to a supervised classifier. Throughout this paper we test two hypotheses: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, in contrast to more standard feature engineering techniques based on information taken from the training data; (ii) word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, have a positive impact on the model's performance compared to general-domain word embeddings. The performance of a model is measured not only with standard metrics (e.g. accuracy or precision/recall) but also by its tendency to overfit the available data, assessed by analyzing the learning curve. Measuring this overfitting tendency is important because only a small amount of data is available, so we need models that generalize better on the VSD problem. For the task we use SenSem [2], a corpus and lexicon of disambiguated Spanish and Catalan verbs, as our base resource for experimentation.
format Conference object
Conference object
author Cardellino, Cristian
Alonso i Alemany, Laura
author_facet Cardellino, Cristian
Alonso i Alemany, Laura
author_sort Cardellino, Cristian
title Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
title_short Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
title_full Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
title_fullStr Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
title_full_unstemmed Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
title_sort disjoint semi-supervised spanish verb sense disambiguation using word embeddings
publishDate 2017
url http://sedici.unlp.edu.ar/handle/10915/65941
http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/ASAI/asai-05.pdf
work_keys_str_mv AT cardellinocristian disjointsemisupervisedspanishverbsensedisambiguationusingwordembeddings
AT alonsoialemanylaura disjointsemisupervisedspanishverbsensedisambiguationusingwordembeddings
bdutipo_str Repositories
_version_ 1764820480745799680