Self-Organizing Maps for Imputation of Missing Data in Incomplete Data Matrices


Bibliographic Details
Main authors: Folguera, Laura; Zupan, Jure; Cicerone, Daniel; Magallanes, Jorge
Format: Article (published version)
Language: English
Published: Elsevier Science B.V., 2015
Online access: https://ri.unsam.edu.ar/handle/123456789/1009
Affiliations:
Laura Folguera: Universidad Nacional de San Martín, Instituto de Investigación e Ingeniería Ambiental, Buenos Aires, Argentina.
Jure Zupan: National Institute of Chemistry, Ljubljana, Slovenia.
Daniel Cicerone: Universidad Nacional de San Martín, Instituto de Investigación e Ingeniería Ambiental, Buenos Aires, Argentina.
Jorge Magallanes: Universidad Nacional de San Martín, Instituto de Investigación e Ingeniería Ambiental, Buenos Aires, Argentina.
Publication date: 2015-03
Citation: Folguera, L. et al. (2015). Self-Organizing Maps for Imputation of Missing Data in Incomplete Data Matrices. In: Chemometrics and Intelligent Laboratory Systems 143: 146-151. Elsevier B.V. ISSN 0169-7439.
DOI: http://dx.doi.org/10.1016/j.chemolab.2015.03.002
File format: PDF (pp. 146-151)
Access: restricted
License: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5), http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Institution: Universidad Nacional de General San Martín
Collection: Repositorio Institucional de la UNSAM
Subjects:
CHEMOMETRICS
ARTIFICIAL NEURAL NETWORK
SELF-ORGANIZING MAPS
MISSING DATA IMPUTATION
ENVIRONMENTAL DATA SET
CHEMICAL SCIENCES
EXACT AND NATURAL SCIENCES
Abstract: Incomplete data matrices are frequently encountered in large databases and pose a significant obstacle to effective data treatment. This paper examines a self-organizing map (SOM) based method of data imputation, under the concept of distance object per one weight, to predict physicochemical parameters of water samples in a data set where concentrations of different analytes were missing. The method was evaluated under two different scenarios: (a) including vectors of samples both with and without missing data in the training set, and (b) pre-training a SOM on a data set with no missing values and then making imputations for a second data set (prediction set) of samples with missing values. Evaluations were made using a surface water data set of 270 samples from the Reconquista River, in Buenos Aires Province, Argentina, by artificially setting between 17% and 39% of the data to missing. Results were compared with imputations made on the basis of professional criteria. SOMs gave reasonable estimates, with no statistically significant differences from the estimates made through professional criteria, thus proving to be a suitable, time-saving imputation method.
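The two evaluation scenarios in the abstract can be illustrated with a small, generic SOM imputer. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a rectangular map with a Gaussian neighbourhood, linearly decaying learning rate and radius, and a Euclidean distance computed only over the observed variables of each sample (a common way to let incomplete vectors take part in training); it does not reproduce the paper's "distance object per one weight" scheme. The data are synthetic stand-ins, not the Reconquista River measurements, and the helper names `train_som`, `impute`, and `rmse_on_masked` are hypothetical.

```python
import numpy as np


def train_som(data, rows=6, cols=6, epochs=30, lr0=0.5, seed=0):
    """Train a rectangular SOM; NaN entries in `data` mark missing values.

    Hypothetical helper (assumed generic scheme): distances and weight
    updates use only the observed components of each sample.
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    lo, hi = np.nanmin(data, axis=0), np.nanmax(data, axis=0)
    weights = rng.uniform(lo, hi, size=(rows * cols, d))  # random start in data range
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0, total, step = max(rows, cols) / 2.0, epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = data[i]
            obs = ~np.isnan(x)
            frac = step / total
            step += 1
            if not obs.any():
                continue  # a fully missing sample carries no information
            # Best-matching unit (BMU) from the observed components only.
            bmu = np.argmin(np.sum((weights[:, obs] - x[obs]) ** 2, axis=1))
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
            g = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
            # Pull neurons toward x, along observed components only.
            weights[:, obs] += lr * g[:, None] * (x[obs] - weights[:, obs])
    return weights


def impute(weights, data):
    """Replace each NaN with the matching component of the sample's BMU."""
    filled = data.copy()
    for i, x in enumerate(data):
        miss = np.isnan(x)
        if miss.any() and not miss.all():  # fully missing rows are left as NaN
            obs = ~miss
            bmu = np.argmin(np.sum((weights[:, obs] - x[obs]) ** 2, axis=1))
            filled[i, miss] = weights[bmu, miss]
    return filled


def rmse_on_masked(true, est, mask):
    """Root-mean-square error over the artificially removed entries only."""
    return float(np.sqrt(np.mean((true[mask] - est[mask]) ** 2)))


# Synthetic stand-in for a water-quality table: 270 samples, 4 correlated variables.
rng = np.random.default_rng(1)
latent = rng.normal(size=(270, 1))
X = latent @ np.array([[1.0, 0.8, -0.5, 0.6]]) + 0.2 * rng.normal(size=(270, 4))

mask = rng.random(X.shape) < 0.25   # about 25% of entries set to missing
mask[mask.all(axis=1), 0] = False   # keep at least one observed value per row
X_missing = np.where(mask, np.nan, X)

# Scenario (a): train on all samples, complete and incomplete, then impute.
X_a = impute(train_som(X_missing), X_missing)
rmse_a = rmse_on_masked(X, X_a, mask)

# Scenario (b): pre-train on complete samples, then impute a separate prediction set.
train_idx, pred_idx = slice(0, 135), slice(135, 270)
X_b = impute(train_som(X[train_idx]), X_missing[pred_idx])
rmse_b = rmse_on_masked(X[pred_idx], X_b, mask[pred_idx])

# Simple reference point: column-mean imputation.
X_mean = np.where(mask, np.nanmean(X_missing, axis=0), X_missing)
rmse_mean = rmse_on_masked(X, X_mean, mask)

print(f"RMSE (a): {rmse_a:.3f}  (b): {rmse_b:.3f}  column mean: {rmse_mean:.3f}")
```

In scenario (a) the incomplete vectors shape the map directly; in scenario (b) the map sees only complete samples and is then used purely as a lookup for the prediction set. The column-mean baseline is included only as a simple yardstick; the paper itself compares against imputations made through professional criteria.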