Extract, transform and load architecture for metadata collection

Digital repositories acting as resource aggregators typically face different challenges, roughly classified into three main categories: extraction, improvement and storage. The first category comprises issues related to dealing with different resource collection protocols: OAI-PMH, web-crawling, webse...


Bibliographic Details
Main authors: De Giusti, Marisa Raquel, Lira, Ariel Jorge, Oviedo, Néstor Fabián
Format: Conference object
Language: English
Published: 2011
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/5529
Contributed by:
id I19-R120-10915-5529
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
Library Science
repositories
aggregation
harvesting
datawarehousing
data integration
Information search and retrieval
Information Systems Applications
description Digital repositories acting as resource aggregators typically face different challenges, roughly classified into three main categories: extraction, improvement and storage. The first category comprises issues related to dealing with different resource collection protocols (OAI-PMH, web crawling, web services, etc.) and their representations (XML, HTML, database tuples, unstructured documents, etc.). The second category comprises information improvements based on controlled vocabularies, specific date formats, correction of malformed data, etc. Finally, the third category deals with the destination of downloaded resources: unification into a common database, sorting by certain criteria, etc. This paper proposes an ETL architecture for designing a software application that provides a comprehensive solution to the challenges faced by a digital repository acting as a resource aggregator. Design and implementation aspects considered during the development of this tool are described, with a particular focus on architecture highlights.
format Conference object
author De Giusti, Marisa Raquel
Lira, Ariel Jorge
Oviedo, Néstor Fabián
title Extract, transform and load architecture for metadata collection
publishDate 2011
url http://sedici.unlp.edu.ar/handle/10915/5529
bdutipo_str Repositorios
_version_ 1764820476795813893
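
The three stages named in the description (extraction, improvement and storage) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the record does not show. Every name here (`extract`, `transform`, `load`, `run_pipeline`, `LANGUAGE_VOCAB`, `DATE_FORMATS`) and the record fields `id`, `language` and `date` are hypothetical, and the sources are plain callables standing in for real OAI-PMH harvesters, crawlers or web service clients.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, str]
Source = Callable[[], Iterable[Record]]

# Hypothetical controlled vocabulary: free-text language names -> ISO 639-1 codes.
LANGUAGE_VOCAB = {"inglés": "en", "english": "en", "español": "es", "spanish": "es"}

# Hypothetical set of input date formats that are rewritten as ISO 8601 dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def extract(sources: Iterable[Source]) -> Iterator[Record]:
    """Extraction: pull records from every source, whatever protocol it wraps."""
    for fetch in sources:
        yield from fetch()


def normalize_date(value: str) -> str:
    """Return an ISO 8601 date when the value matches a known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return value  # unknown format (e.g. a bare year): leave it untouched


def transform(record: Record) -> Record:
    """Improvement: trim whitespace, map vocabulary terms, normalize dates."""
    cleaned = {key: value.strip() for key, value in record.items()}
    if "language" in cleaned:
        term = cleaned["language"].lower()
        cleaned["language"] = LANGUAGE_VOCAB.get(term, cleaned["language"])
    if "date" in cleaned:
        cleaned["date"] = normalize_date(cleaned["date"])
    return cleaned


def load(records: Iterable[Record], store: Dict[str, Record]) -> Dict[str, Record]:
    """Storage: unify records into one store keyed by id; later records win."""
    for record in records:
        store[record["id"]] = record
    return store


def run_pipeline(sources: Iterable[Source]) -> Dict[str, Record]:
    """Chain the three stages lazily over all sources."""
    return load((transform(r) for r in extract(sources)), {})


if __name__ == "__main__":
    # Two stand-in sources with made-up sample records.
    harvested = lambda: [{"id": "10915/5529", "language": " Inglés ", "date": "2011"}]
    crawled = lambda: [{"id": "x/1", "language": "Spanish", "date": "15/03/2011"}]
    for key, rec in run_pipeline([harvested, crawled]).items():
        print(key, rec)
```

Keeping each stage as a separate function over plain dictionaries means a new protocol only requires a new source callable, and a new cleanup rule only touches `transform`; a real aggregator would replace the in-memory dictionary with a database.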