Speeding up the execution of a large number of statistical tests of independence

Learning the structure of probabilistic graphical models with the independence-based approach requires a massive number of conditional independence tests on data. An intermediate step in computing each test is the construction of contingency tables...


Bibliographic Details
Main authors: Schlüter, Federico; Bromberg, Facundo; Pérez, Diego Sebastián
Format: Conference object
Language: English
Published: 2010
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/152584
http://39jaiio.sadio.org.ar/sites/default/files/39jaiio-asai-05.pdf
Contributed by:
id I19-R120-10915-152584
record_format dspace
spelling I19-R120-10915-152584; 2023-05-08T20:04:12Z; http://sedici.unlp.edu.ar/handle/10915/152584; http://39jaiio.sadio.org.ar/sites/default/files/39jaiio-asai-05.pdf; issn:1850-2784; 2010; 2023-05-08T17:27:53Z; en; Sociedad Argentina de Informática e Investigación Operativa; Conference object; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); application/pdf; 48-59
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
statistical tests of independence
contingency tables
probabilistic graphical models
structure learning
description Learning the structure of probabilistic graphical models with the independence-based approach requires a massive number of conditional independence tests on data. An intermediate step in computing each test is the construction of contingency tables from the data. In this work we present an intelligent cache of contingency tables that allows stored tables to be reused not only for the same test, in the not uncommon case that the test must be performed again, but also for an exponential number of other tests: all those involving a subset of the variables of the stored test. In practice, however, not so many tests actually reuse the stored tables. We show results from testing the cache with IBMAP-HC, a recently proposed algorithm for learning the structure of Markov networks, also known as undirected graphical models. The experiments show that in all cases the cache saves above 95% of the running time that IBMAP-HC spends reading data. The savings in running time for IBMAP-HC were up to 80% for datasets above 40,000 datapoints.
format Conference object
author Schlüter, Federico
Bromberg, Facundo
Pérez, Diego Sebastián
title Speeding up the execution of a large number of statistical tests of independence
publishDate 2010
url http://sedici.unlp.edu.ar/handle/10915/152584
http://39jaiio.sadio.org.ar/sites/default/files/39jaiio-asai-05.pdf
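
The reuse idea in the abstract (a stored contingency table can serve any later test over a subset of its variables) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `ContingencyTableCache`, its methods, and the `data_reads` counter are hypothetical, and the sketch assumes discrete data held in memory as a list of tuples. A request for a variable set is answered from an exact cache hit if one exists, otherwise by summing out the extra variables of a stored superset table, and only as a last resort by scanning the data.

```python
from collections import Counter


class ContingencyTableCache:
    """Hypothetical cache of contingency tables keyed by variable set."""

    def __init__(self, data, columns):
        self.data = data                                  # list of tuples, one per datapoint
        self.columns = {name: i for i, name in enumerate(columns)}
        self.tables = {}                                  # sorted variable tuple -> Counter of cells
        self.data_reads = 0                               # number of full passes over the data

    def _count_from_data(self, variables):
        # Build the table the expensive way: one full scan of the dataset.
        self.data_reads += 1
        idx = [self.columns[v] for v in variables]
        return Counter(tuple(row[i] for i in idx) for row in self.data)

    @staticmethod
    def _marginalize(table, stored_vars, variables):
        # Sum out every variable of the stored table that is not requested.
        keep = [stored_vars.index(v) for v in variables]
        out = Counter()
        for cell, count in table.items():
            out[tuple(cell[i] for i in keep)] += count
        return out

    def get(self, variables):
        key = tuple(sorted(variables))
        if key in self.tables:                            # exact hit: same variable set
            return self.tables[key]
        wanted = set(key)
        for stored_vars, table in self.tables.items():
            if wanted < set(stored_vars):                 # stored table covers a superset
                result = self._marginalize(table, stored_vars, key)
                break
        else:
            result = self._count_from_data(key)           # miss: read the data
        self.tables[key] = result
        return result


# Usage: the second request is served from the cached three-variable table.
data = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 1)]
cache = ContingencyTableCache(data, ["A", "B", "C"])
table_abc = cache.get(["A", "B", "C"])
table_ac = cache.get(["C", "A"])
```

In this sketch the cost of a superset hit depends on the number of non-empty cells in the stored table rather than on the number of datapoints, which is one plausible reading of why savings would grow with dataset size.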