Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
This work explores the use of word embeddings, also known as word vectors, trained on Spanish corpora, to use as features for Spanish verb sense disambiguation (VSD). This type of learning technique is named disjoint semisupervised learning [1]: an unsupervised algorithm is trained on unlabeled dat...
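The two-step setup the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: co-occurrence count vectors stand in for trained word embeddings, a nearest-centroid rule stands in for the supervised classifier, and the toy sentences, the sense labels `move` and `time`, and all function names are hypothetical and not taken from SenSem.

```python
from collections import defaultdict
import math


def train_embeddings(sentences, window=2):
    """Unsupervised step: co-occurrence count vectors built from unlabeled text.

    Assumption: this is only a simple stand-in for real word embeddings.
    """
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    vectors[w][index[s[j]]] += 1.0
    return vectors


def featurize(context, vectors):
    """Represent an instance as the average vector of its known context words."""
    known = [vectors[w] for w in context if w in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_classifier(labeled, vectors):
    """Supervised step: one centroid per sense, computed from labeled instances."""
    grouped = defaultdict(list)
    for context, sense in labeled:
        grouped[sense].append(featurize(context, vectors))
    return {
        sense: [sum(col) / len(feats) for col in zip(*feats)]
        for sense, feats in grouped.items()
    }


def predict(context, centroids, vectors):
    """Pick the sense whose centroid is most similar to the instance vector."""
    feat = featurize(context, vectors)
    return max(sorted(centroids), key=lambda s: cosine(feat, centroids[s]))


# Hypothetical unlabeled corpus (tokenized toy sentences).
unlabeled = [
    "el tren pasa por la estación".split(),
    "el autobús pasa por la calle".split(),
    "el tiempo pasa muy rápido".split(),
    "los años pasan muy rápido".split(),
    "el coche pasa por la calle".split(),
    "los días pasan muy lento".split(),
]

# Hypothetical labeled instances: context words of the verb plus a sense label.
labeled = [
    (["tren", "estación"], "move"),
    (["tiempo", "rápido"], "time"),
]

vectors = train_embeddings(unlabeled)            # step 1: no labels used
centroids = train_classifier(labeled, vectors)   # step 2: labels used

print(predict(["autobús", "calle"], centroids, vectors))  # move
print(predict(["años", "lento"], centroids, vectors))     # time
```

The two steps share nothing except the vectors, which is what makes the setup disjoint: the embeddings could be swapped for ones trained on a different corpus (for example general-domain versus in-domain text) without touching the classifier code.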
Main authors: | Cardellino, Cristian; Alonso i Alemany, Laura |
---|---|
Format: | Conference object |
Language: | English |
Published: | 2017 |
Subjects: | Computer Science; word embeddings; disjoint semisupervised learning; verb sense disambiguation |
Online access: | http://sedici.unlp.edu.ar/handle/10915/65941 http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/ASAI/asai-05.pdf |
Contributed by: |
id | I19-R120-10915-65941 |
---|---|
record_format | dspace |
institution | Universidad Nacional de La Plata |
institution_str | I-19 |
repository_str | R-120 |
collection | SEDICI (UNLP) |
language | English |
topic | Computer Science; word embeddings; disjoint semisupervised learning; verb sense disambiguation |
description | This work explores the use of word embeddings (also known as word vectors) trained on Spanish corpora as features for Spanish verb sense disambiguation (VSD). This learning technique is called disjoint semisupervised learning [1]: an unsupervised algorithm is first trained separately on unlabeled data, and its results (i.e., the word embeddings) are then fed to a supervised classifier. Throughout this paper we test two hypotheses: (i) representing training instances with word embeddings improves the performance of supervised VSD models compared with more standard feature engineering techniques based on information taken from the training data; (ii) word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, improve the model's performance compared with general-domain word embeddings. Model performance is measured not only with standard metrics (e.g., accuracy or precision/recall) but also by the model's tendency to overfit the available data, assessed by analyzing the learning curve. Measuring this tendency matters because only a small amount of data is available, so we need models that generalize better on the VSD problem. For this task we use SenSem [2], a corpus and lexicon of disambiguated Spanish and Catalan verbs, as our base resource for experimentation. |
format | Conference object |
author | Cardellino, Cristian; Alonso i Alemany, Laura |
author_facet | Cardellino, Cristian; Alonso i Alemany, Laura |
author_sort | Cardellino, Cristian |
title | Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings |
publishDate | 2017 |
url | http://sedici.unlp.edu.ar/handle/10915/65941 http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/ASAI/asai-05.pdf |
bdutipo_str | Repositories |
_version_ | 1764820480745799680 |