Data Matching and Deduplication Over Big Data Using Hadoop Framework
Entity resolution is the process of matching records from more than one database that refer to the same entity. In the case of a single database, the process is called deduplication. This article proposes a method to solve the entity resolution and deduplication problem using MapReduce over the Hadoop framework....
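
To make the idea in the abstract concrete, the following is a minimal, single-process sketch of deduplication with standard blocking, written in a map/shuffle/reduce style. It is not the authors' implementation and does not use Hadoop itself. The record fields (`id`, `name`, `surname`), the blocking key (the first three letters of the surname), the similarity measure, and the `0.85` threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def preprocess(record):
    """Normalize a record: trim and lowercase every string field."""
    return {k: v.strip().lower() if isinstance(v, str) else v
            for k, v in record.items()}


def blocking_key(record):
    """Hypothetical blocking key: first three letters of the surname."""
    return record["surname"][:3]


def map_phase(records):
    """Map step: emit (blocking key, cleaned record) pairs."""
    for record in records:
        clean = preprocess(record)
        yield blocking_key(clean), clean


def shuffle(pairs):
    """Group mapped values by key, standing in for the framework's shuffle."""
    blocks = defaultdict(list)
    for key, value in pairs:
        blocks[key].append(value)
    return blocks


def similarity(a, b, fields=("name", "surname")):
    """Average string similarity over the compared fields (assumed measure)."""
    return sum(SequenceMatcher(None, a[f], b[f]).ratio() for f in fields) / len(fields)


def reduce_phase(blocks, threshold=0.85):
    """Reduce step: compare records only within a block and classify pairs."""
    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a, b) >= threshold:
                matches.append((a["id"], b["id"]))
    return matches


def find_duplicates(records, threshold=0.85):
    """Run preprocessing, blocking, comparison, and classification in order."""
    return reduce_phase(shuffle(map_phase(records)), threshold)


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "Maria", "surname": "Gonzalez"},
        {"id": 2, "name": "maria ", "surname": "Gonzales"},
        {"id": 3, "name": "Jorge", "surname": "Perez"},
        {"id": 4, "name": "Maria", "surname": "Gomez"},
    ]
    print(find_duplicates(sample))  # prints [(1, 2)]
```

Because comparisons happen only inside a block, record 4 is never compared with records 1 and 2 even though the first names agree. That is the trade-off of standard blocking: far fewer comparisons, at the risk of missing matches whose blocking keys differ.
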
Main authors: | Albanese, Pablo Adrián; Ale, Juan M. |
---|---|
Format: | Conference object |
Language: | English |
Published: | 2016 |
Subjects: | |
Online access: | http://sedici.unlp.edu.ar/handle/10915/56751 |
Contributed by: |
id | I19-R120-10915-56751 |
---|---|
record_format | dspace |
institution | Universidad Nacional de La Plata |
institution_str | I-19 |
repository_str | R-120 |
collection | SEDICI (UNLP) |
language | English |
topic | Computer Science entity resolution mapreduce standard blocking indexing |
spellingShingle | Computer Science entity resolution mapreduce standard blocking indexing Albanese, Pablo Adrián Ale, Juan M. Data Matching and Deduplication Over Big Data Using Hadoop Framework |
topic_facet | Computer Science entity resolution mapreduce standard blocking indexing |
description | Entity resolution is the process of matching records from more than one database that refer to the same entity. In the case of a single database, the process is called deduplication. This article proposes a method to solve the entity resolution and deduplication problem using MapReduce over the Hadoop framework. The proposed method includes data preprocessing, comparison, and classification tasks, with indexing by the standard blocking method. Our method can operate on one, two, or more datasets and works with semi-structured or structured data. |
format | Conference object |
author | Albanese, Pablo Adrián; Ale, Juan M. |
author_facet | Albanese, Pablo Adrián; Ale, Juan M. |
author_sort | Albanese, Pablo Adrián |
title | Data Matching and Deduplication Over Big Data Using Hadoop Framework |
title_short | Data Matching and Deduplication Over Big Data Using Hadoop Framework |
title_full | Data Matching and Deduplication Over Big Data Using Hadoop Framework |
title_fullStr | Data Matching and Deduplication Over Big Data Using Hadoop Framework |
title_full_unstemmed | Data Matching and Deduplication Over Big Data Using Hadoop Framework |
title_sort | data matching and deduplication over big data using hadoop framework |
publishDate | 2016 |
url | http://sedici.unlp.edu.ar/handle/10915/56751 |
work_keys_str_mv | AT albanesepabloadrian datamatchinganddeduplicationoverbigdatausinghadoopframework AT alejuanm datamatchinganddeduplicationoverbigdatausinghadoopframework |
bdutipo_str | Repositories |
_version_ | 1764820477559177217 |