Data Matching and Deduplication Over Big Data Using Hadoop Framework


Bibliographic details
Main authors: Albanese, Pablo Adrián; Ale, Juan M.
Format: Conference object
Language: English
Published: 2016
Online access: http://sedici.unlp.edu.ar/handle/10915/56751
Institution: Universidad Nacional de La Plata
Collection: SEDICI (UNLP)
Topics: Computer Science; entity resolution; MapReduce; standard blocking; indexing
Description: Entity resolution is the process of matching records from more than one database that refer to the same entity; when only a single database is involved, the process is called deduplication. This article proposes a method to solve the entity resolution and deduplication problems using MapReduce over the Hadoop framework. The proposed method comprises data preprocessing, comparison, and classification tasks, with indexing performed by the standard blocking method. Our method can operate on one, two, or more datasets and works with structured or semi-structured data.
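The pipeline summarized in the description (preprocessing, standard blocking as the index, pairwise comparison, classification) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Hadoop API. The map phase emits one blocking key per record, an in-memory dictionary stands in for Hadoop's shuffle, and the reduce phase compares record pairs only within each block. The sample records, the blocking key (first three letters of the surname plus the postcode), the bigram Jaccard comparator, the 0.6 threshold, and the `link_only` switch for two-dataset linkage are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample records: (record_id, source_dataset, surname, given_name, postcode)
RECORDS = [
    ("a1", "A", "Smith", "John", "1900"),
    ("a2", "A", "Smyth", "Jon", "1900"),
    ("b1", "B", "Smith", "John", "1900"),
    ("b2", "B", "Garcia", "Ana", "1425"),
]

def preprocess(record):
    """Assumed cleaning step: strip whitespace and lowercase the name fields."""
    rid, src, surname, given, postcode = record
    return (rid, src, surname.strip().lower(), given.strip().lower(), postcode.strip())

def blocking_key(record):
    """Hypothetical standard-blocking key: first 3 letters of surname + postcode."""
    _, _, surname, _, postcode = record
    return surname[:3] + "|" + postcode

def map_phase(records):
    """Map: emit (blocking_key, cleaned_record) for every input record."""
    for rec in records:
        clean = preprocess(rec)
        yield blocking_key(clean), clean

def shuffle(pairs):
    """In-memory stand-in for Hadoop's shuffle/sort: group mapper output by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def bigrams(text):
    """Set of overlapping two-character substrings."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def similarity(r1, r2):
    """Assumed comparator: Jaccard similarity of bigrams of 'given surname'."""
    a = bigrams(r1[3] + " " + r1[2])
    b = bigrams(r2[3] + " " + r2[2])
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def reduce_phase(key, block, threshold=0.6, link_only=False):
    """Reduce: compare every pair inside one block and classify it.

    With link_only=True, pairs from the same source dataset are skipped
    (record linkage); otherwise all pairs are compared (deduplication).
    """
    for r1, r2 in combinations(block, 2):
        if link_only and r1[1] == r2[1]:
            continue
        score = similarity(r1, r2)
        label = "match" if score >= threshold else "non-match"
        yield r1[0], r2[0], round(score, 2), label

def run(records, link_only=False):
    """Chain map -> shuffle -> reduce and collect the classified pairs."""
    results = []
    for key, block in sorted(shuffle(map_phase(records)).items()):
        results.extend(reduce_phase(key, block, link_only=link_only))
    return results

if __name__ == "__main__":
    for row in run(RECORDS):
        print(row)
```

With these sample records only `a1` and `b1` share a block, so a single pair is compared and classified as a match. "Smyth" falls into a different block and is never compared with "Smith", which illustrates the usual trade-off of standard blocking: far fewer comparisons at the cost of missing matches whose blocking keys differ.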