Speeding up the execution of a large number of statistical tests of independence
Learning the structure of probabilistic graphical models with the independence-based approach requires performing a massive number of conditional independence tests on data. An intermediate step in the computation of independence tests is the construction of contingency tables...
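Main authors: | Schlüter, Federico; Bromberg, Facundo; Pérez, Diego Sebastián |
---|---|
Format: | Conference object |
Language: | English |
Published: | 2010 |
Subjects: | Computer Science; statistical tests of independence; contingency tables; probabilistic graphical models; structure learning |
Online access: | http://sedici.unlp.edu.ar/handle/10915/152584 http://39jaiio.sadio.org.ar/sites/default/files/39jaiio-asai-05.pdf |
Contributed by: |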
id |
I19-R120-10915-152584 |
---|---|
record_format |
dspace |
issn |
1850-2784 |
publisher |
Sociedad Argentina de Informática e Investigación Operativa |
pages |
48-59 |
license |
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/ |
repository timestamps |
2023-05-08T17:27:53Z, 2023-05-08T20:04:12Z |
file format |
application/pdf |
institution |
Universidad Nacional de La Plata |
collection |
SEDICI (UNLP) |
topic |
Computer Science; statistical tests of independence; contingency tables; probabilistic graphical models; structure learning |
description |
Learning the structure of probabilistic graphical models with the independence-based approach requires performing a massive number of conditional independence tests on data. An intermediate step in the computation of independence tests is the construction of contingency tables from the data. In this work we present an intelligent cache of contingency tables that allows stored tables to be reused not only for the same test, in the not uncommon case that the test must be performed again, but also for an exponential number of other tests: all those involving a subset of the variables of the stored test. In practice, however, relatively few tests actually reuse the stored tables. We show results of testing the cache with IBMAP-HC, a recently proposed algorithm for learning the structure of Markov networks (a.k.a. undirected graphical models). The experiments show that in all cases the cache saves more than 95% of the running time that IBMAP-HC spends reading data. The savings in IBMAP-HC's running time were up to 80% for datasets with more than 40,000 datapoints. |
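The abstract describes the cache only at a high level. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: data are tuples of discrete integer values, tables are stored as counters keyed by the set of variable indices they cover, and a request for any subset of a stored variable set is answered by summing out the remaining variables instead of rereading the data. The names `ContingencyCache`, `table`, `data_scans`, and `pearson_chi_square` are hypothetical. Pearson's chi-square statistic is used only as an example of a test that consumes a table; the record does not say which test IBMAP-HC uses.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple

Row = Sequence[int]  # one datapoint: a discrete value per variable index


class ContingencyCache:
    """Hypothetical cache of contingency tables, keyed by the set of variables covered.

    A table maps each joint configuration (values ordered by ascending variable
    index) to its count in the data. A table over any subset W of a stored
    variable set V is derived by summing out the variables in V but not in W,
    so the data is not read again.
    """

    def __init__(self, data: List[Row]) -> None:
        self.data = data
        self.tables: Dict[FrozenSet[int], Counter] = {}
        self.data_scans = 0  # full passes over the dataset actually performed

    def _scan(self, order: Tuple[int, ...]) -> Counter:
        # Build a table from scratch with one pass over every datapoint.
        self.data_scans += 1
        return Counter(tuple(row[v] for v in order) for row in self.data)

    def table(self, variables: Iterable[int]) -> Counter:
        key = frozenset(variables)
        order = tuple(sorted(key))
        if key in self.tables:  # exact hit: the same variable set was seen before
            return self.tables[key]
        for stored_key, stored in self.tables.items():
            if key <= stored_key:  # stored table covers a superset: marginalize it
                stored_order = tuple(sorted(stored_key))
                positions = [stored_order.index(v) for v in order]
                marginal: Counter = Counter()
                for config, count in stored.items():
                    marginal[tuple(config[p] for p in positions)] += count
                return marginal
        table = self._scan(order)  # miss: read the data and remember the table
        self.tables[key] = table
        return table


def pearson_chi_square(cache: ContingencyCache, x: int, y: int, z: Sequence[int]) -> float:
    """Pearson chi-square statistic for X independent of Y given Z (x, y, z distinct)."""
    order = tuple(sorted({x, y, *z}))
    counts = cache.table(order)
    ix, iy = order.index(x), order.index(y)
    iz = [order.index(v) for v in z]
    n_xz: Counter = Counter()
    n_yz: Counter = Counter()
    n_z: Counter = Counter()
    n_xyz: Counter = Counter()
    for config, count in counts.items():
        zc = tuple(config[i] for i in iz)
        n_xz[(config[ix], zc)] += count
        n_yz[(config[iy], zc)] += count
        n_z[zc] += count
        n_xyz[(config[ix], config[iy], zc)] += count
    stat = 0.0
    for zc, nz in n_z.items():
        xs = [xv for (xv, zz) in n_xz if zz == zc]
        ys = [yv for (yv, zz) in n_yz if zz == zc]
        for xv in xs:
            for yv in ys:
                # Expected count under conditional independence: N_xz * N_yz / N_z.
                expected = n_xz[(xv, zc)] * n_yz[(yv, zc)] / nz
                observed = n_xyz[(xv, yv, zc)]  # Counter returns 0 for unseen cells
                stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    data = [(0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1), (0, 0, 0)]
    cache = ContingencyCache(data)
    cache.table([0, 1, 2])                      # reads the data once
    stat = pearson_chi_square(cache, 0, 1, [])  # table over {0, 1} derived from {0, 1, 2}
    print(cache.data_scans)                     # prints 1
```

The superset lookup above is a linear scan over the stored keys, and derived marginal tables are not themselves cached; both are simplifications for illustration, since the record does not describe how the authors index, store, or evict tables.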
author |
Schlüter, Federico; Bromberg, Facundo; Pérez, Diego Sebastián |