Inverted Index Entry Invalidation Strategy for Real Time Search
The impressive rise of user-generated content on the web, driven by sites like Twitter, poses new challenges for search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to...
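Main authors: | Ríssola, Esteban A.; Tolosa, Gabriel Hernán |
---|---|
Format: | Conference object |
Language: | English |
Published: | 2015 |
Subjects: | Computer Science; Real time; Sorting and searching; Index generation |
Online access: | http://sedici.unlp.edu.ar/handle/10915/50429 |
Contributed by: |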
id | I19-R120-10915-50429 |
---|---|
record_format | dspace |
institution | Universidad Nacional de La Plata |
institution_str | I-19 |
repository_str | R-120 |
collection | SEDICI (UNLP) |
language | English |
topic | Computer Science Real time Sorting and searching Index generation |
spellingShingle | Computer Science Real time Sorting and searching Index generation Ríssola, Esteban A. Tolosa, Gabriel Hernán Inverted Index Entry Invalidation Strategy for Real Time Search |
topic_facet | Computer Science Real time Sorting and searching Index generation |
description | The impressive rise of user-generated content on the web, driven by sites like Twitter, poses new challenges for search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed at the very moment they are generated, and users expect content to be “searchable” within seconds. This has led to the development of data structures and algorithms that can meet this challenge efficiently. In this work, we introduce the concept of an index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and selectively invalidating and evicting those inverted index entries whose removal does not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller, which may increase overall efficiency. We study the dynamics of the vocabulary using a real dataset and also provide an evaluation of the proposed strategy using a search engine specifically designed for real-time indexing and search. |
format | Conference object |
author | Ríssola, Esteban A. Tolosa, Gabriel Hernán |
author_facet | Ríssola, Esteban A. Tolosa, Gabriel Hernán |
author_sort | Ríssola, Esteban A. |
title | Inverted Index Entry Invalidation Strategy for Real Time Search |
title_short | Inverted Index Entry Invalidation Strategy for Real Time Search |
title_full | Inverted Index Entry Invalidation Strategy for Real Time Search |
title_fullStr | Inverted Index Entry Invalidation Strategy for Real Time Search |
title_full_unstemmed | Inverted Index Entry Invalidation Strategy for Real Time Search |
title_sort | inverted index entry invalidation strategy for real time search |
publishDate | 2015 |
url | http://sedici.unlp.edu.ar/handle/10915/50429 |
work_keys_str_mv | AT rissolaestebana invertedindexentryinvalidationstrategyforrealtimesearch AT tolosagabrielhernan invertedindexentryinvalidationstrategyforrealtimesearch |
bdutipo_str | Repositories |
_version_ | 1764820475019526145 |
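
The abstract describes an index entry invalidator that follows how the vocabulary evolves and evicts inverted index entries that are unlikely to matter for retrieval. This record does not state which eviction criterion the authors use, so the following is only a minimal sketch under an assumed policy: a term is considered stale when it has not appeared in any new document for more than `ttl` time units. The class name `InvalidatingIndex`, the `ttl` parameter, and the integer timestamps are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


class InvalidatingIndex:
    """Toy in-memory inverted index with a hypothetical time-based invalidator."""

    def __init__(self, ttl):
        self.ttl = ttl                     # assumed policy parameter, not from the paper
        self.postings = defaultdict(list)  # term -> list of document ids
        self.last_seen = {}                # term -> time the term last occurred

    def add(self, doc_id, text, now):
        # Index each distinct term and record when it was last observed.
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
            self.last_seen[term] = now

    def invalidate(self, now):
        # Evict every term that has not occurred within the last `ttl` time units.
        stale = [t for t, seen in self.last_seen.items() if now - seen > self.ttl]
        for term in stale:
            del self.postings[term]
            del self.last_seen[term]
        return sorted(stale)

    def search(self, term):
        # Return a copy of the posting list, or an empty list for unknown terms.
        return list(self.postings.get(term.lower(), []))


index = InvalidatingIndex(ttl=10)
index.add(1, "breaking news earthquake", now=0)
index.add(2, "news update", now=8)
evicted = index.invalidate(now=15)
print(evicted)                     # terms dropped by the assumed policy
print(index.search("news"))        # still indexed because it recurred at time 8
print(index.search("earthquake"))  # evicted, so no results
```

In this sketch the index shrinks as stale terms are dropped, at the cost of losing any documents reachable only through those terms. That is the efficiency versus effectiveness trade-off the abstract refers to, although the authors' actual criterion may differ.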