Extract, transform and load architecture for metadata collection
Digital repositories acting as resource aggregators typically face different challenges, roughly classified in three main categories: extraction, improvement and storage. The first category comprises issues related to dealing with different resource collection protocols: OAI-PMH, web-crawling, webse...
Main authors: | De Giusti, Marisa Raquel; Lira, Ariel Jorge; Oviedo, Néstor Fabián |
---|---|
Format: | Conference object |
Language: | English |
Published: | 2011 |
Subjects: | |
Online access: | http://sedici.unlp.edu.ar/handle/10915/5529 |
Contributed by: |
id |
I19-R120-10915-5529 |
---|---|
record_format |
dspace |
institution |
Universidad Nacional de La Plata |
institution_str |
I-19 |
repository_str |
R-120 |
collection |
SEDICI (UNLP) |
language |
English |
topic |
Computer Science; Library Science; repositories; aggregation; harvesting; datawarehousing; data integration; Information search and retrieval; Information Systems Applications |
topic_facet |
Computer Science; Library Science; repositories; aggregation; harvesting; datawarehousing; data integration; Information search and retrieval; Information Systems Applications |
description |
Digital repositories acting as resource aggregators typically face challenges that fall roughly into three main categories: extraction, improvement and storage. The first category comprises issues related to handling different resource collection protocols (OAI-PMH, web crawling, web services, etc.) and their representations (XML, HTML, database tuples, unstructured documents, etc.). The second category comprises information improvements based on controlled vocabularies, specific date formats, correction of malformed data, etc. Finally, the third category deals with the destination of downloaded resources: unification into a common database, sorting by certain criteria, etc.
This paper proposes an ETL architecture for designing a software application that provides a comprehensive solution to the challenges faced by a digital repository acting as a resource aggregator.
Design and implementation aspects considered during the development of this tool are described, focusing especially on architecture highlights. |
format |
Conference object |
author |
De Giusti, Marisa Raquel; Lira, Ariel Jorge; Oviedo, Néstor Fabián |
author_facet |
De Giusti, Marisa Raquel; Lira, Ariel Jorge; Oviedo, Néstor Fabián |
author_sort |
De Giusti, Marisa Raquel |
title |
Extract, transform and load architecture for metadata collection |
title_short |
Extract, transform and load architecture for metadata collection |
title_full |
Extract, transform and load architecture for metadata collection |
title_fullStr |
Extract, transform and load architecture for metadata collection |
title_full_unstemmed |
Extract, transform and load architecture for metadata collection |
title_sort |
extract, transform and load architecture for metadata collection |
publishDate |
2011 |
url |
http://sedici.unlp.edu.ar/handle/10915/5529 |
work_keys_str_mv |
AT degiustimarisaraquel extracttransformandloadarchitectureformetadatacollection AT liraarieljorge extracttransformandloadarchitectureformetadatacollection AT oviedonestorfabian extracttransformandloadarchitectureformetadatacollection AT degiustimarisaraquel arquitecturaetlparalarecolecciondemetadatos AT liraarieljorge arquitecturaetlparalarecolecciondemetadatos AT oviedonestorfabian arquitecturaetlparalarecolecciondemetadatos |
bdutipo_str |
Repositories |
_version_ |
1764820476795813893 |
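
As an illustration only, the three categories named in the description (extraction, improvement, storage) map onto a small ETL pipeline. The sketch below is not the authors' implementation: the `Record` fields, the simplified XML feed, the language vocabulary, the accepted date formats, and the SQLite schema are all assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Iterable, Iterator

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace

# Hypothetical controlled vocabulary and accepted date layouts (assumptions).
LANGUAGES = {"inglés": "en", "english": "en", "español": "es", "spanish": "es"}
DATE_FORMATS = (("%Y-%m-%d", "%Y-%m-%d"), ("%d/%m/%Y", "%Y-%m-%d"), ("%Y", "%Y"))


@dataclass(frozen=True)
class Record:
    identifier: str
    title: str
    date: str
    language: str


def extract(xml_text: str) -> Iterator[Record]:
    """Extraction: read records from one source representation (simplified XML)."""
    for node in ET.fromstring(xml_text).iter("record"):
        yield Record(
            identifier=node.findtext(DC + "identifier", default=""),
            title=node.findtext(DC + "title", default=""),
            date=node.findtext(DC + "date", default=""),
            language=node.findtext(DC + "language", default=""),
        )


def normalize_date(raw: str) -> str:
    """Return the date in ISO form, or an empty string if it is malformed."""
    for in_fmt, out_fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), in_fmt).strftime(out_fmt)
        except ValueError:
            continue
    return ""


def transform(record: Record) -> Record:
    """Improvement: clean whitespace, normalize dates, apply the vocabulary."""
    language = record.language.strip()
    return replace(
        record,
        identifier=record.identifier.strip(),
        title=" ".join(record.title.split()),
        date=normalize_date(record.date),
        language=LANGUAGES.get(language.lower(), language),
    )


def load(records: Iterable[Record], conn: sqlite3.Connection) -> None:
    """Storage: unify records into one table, keyed by identifier."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "identifier TEXT PRIMARY KEY, title TEXT, date TEXT, language TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO records VALUES (?, ?, ?, ?)",
        [(r.identifier, r.title, r.date, r.language) for r in records],
    )
    conn.commit()


def run_pipeline(xml_text: str, conn: sqlite3.Connection) -> None:
    """Chain the three stages: extract, then transform, then load."""
    load((transform(r) for r in extract(xml_text)), conn)


# Made-up input record used only to exercise the sketch.
SAMPLE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:identifier>example:1</dc:identifier>
    <dc:title>  First   example title </dc:title>
    <dc:date>15/03/2011</dc:date>
    <dc:language>Inglés</dc:language>
  </record>
</records>"""

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(SAMPLE, connection)
    print(connection.execute("SELECT * FROM records").fetchall())
```

Keeping each stage as a separate function means a new source (for example an OAI-PMH harvester or a web crawler) would only need its own `extract` variant that yields `Record` objects; the transform and load stages stay unchanged.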