Multivariate location and scatter matrix estimation under cellwise and casewise contamination

Real data may contain both cellwise outliers and casewise outliers. There is a vast literature on robust estimation for casewise outliers, but only a scant literature for cellwise outliers and almost none for both types of outliers. Estimation of multivariate location and scatter matrix is a corner...

Bibliographic Details
Main authors: Leung, A., Yohai, V., Zamar, R.
Format: JOUR
Subjects:
Online access: http://hdl.handle.net/20.500.12110/paper_01679473_v111_n_p59_Leung
Contributed by:
id todo:paper_01679473_v111_n_p59_Leung
record_format dspace
spelling todo:paper_01679473_v111_n_p59_Leung 2023-10-03T15:05:34Z © 2017 Elsevier B.V. JOUR info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_01679473_v111_n_p59_Leung
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic Cellwise outliers
Componentwise contamination
Multivariate location and scatter
Robust estimation
Location
Matrix algebra
Multivariant analysis
Cellwise outliers
Componentwise
Multivariate data analysis
Robust estimation
Robust procedures
Simulation studies
Two-step approach
Two-step procedure
Statistics
description Real data may contain both cellwise outliers and casewise outliers. There is a vast literature on robust estimation for casewise outliers, but only a scant literature for cellwise outliers and almost none for both types of outliers. Estimation of multivariate location and scatter matrix is a cornerstone in multivariate data analysis. A two-step approach was recently proposed to perform robust estimation of multivariate location and scatter matrix in the presence of cellwise and casewise outliers. In the first step a univariate filter was applied to remove cellwise outliers. In the second step a generalized S-estimator was used to downweight casewise outliers. This proposal can be further improved in three main directions. First, through the introduction of a consistent bivariate filter to be used in combination with the univariate filter in the first step. Second, through the proposal of a new fast subsampling procedure to generate starting points for the generalized S-estimator in the second step. Third, through the use of a non-monotonic weight function for the generalized S-estimator to better handle casewise outliers in high dimension. A simulation study and a real data example show that, unlike the original two-step procedure, the modified two-step approach performs and scales well in high dimension. Moreover, they show that the modified procedure outperforms the original one and other state-of-the-art robust procedures under cellwise and casewise data contamination. © 2017 Elsevier B.V.
format JOUR
author Leung, A.
Yohai, V.
Zamar, R.
title Multivariate location and scatter matrix estimation under cellwise and casewise contamination
url http://hdl.handle.net/20.500.12110/paper_01679473_v111_n_p59_Leung