Inverted Index Entry Invalidation Strategy for Real Time Search

The impressive rise of user-generated content on the web, driven by sites like Twitter, poses new challenges for search systems. The concept of real-time search has emerged, increasing the importance of efficient indexing and retrieval algorithms in this scenario. Thousands of new updates need to...

Bibliographic Details
Main authors: Ríssola, Esteban A., Tolosa, Gabriel Hernán
Format: Conference object
Language: English
Published: 2015
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/50429
Contributed by:
id I19-R120-10915-50429
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
Real time
Sorting and searching
Index generation
spellingShingle Computer Science
Real time
Sorting and searching
Index generation
Ríssola, Esteban A.
Tolosa, Gabriel Hernán
Inverted Index Entry Invalidation Strategy for Real Time Search
topic_facet Computer Science
Real time
Sorting and searching
Index generation
description The impressive rise of user-generated content on the web, driven by sites like Twitter, poses new challenges for search systems. The concept of real-time search has emerged, increasing the importance of efficient indexing and retrieval algorithms in this scenario. Thousands of new updates need to be processed the moment they are generated, and users expect content to be “searchable” within seconds. This has led to the development of efficient data structures and algorithms that can meet this challenge. In this work, we introduce the concept of an index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and selectively invalidating and evicting those inverted index entries whose removal does not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller, which may increase overall efficiency. We study the dynamics of the vocabulary using a real dataset and also provide an evaluation of the proposed strategy using a search engine specifically designed for real-time indexing and search.
format Conference object
Conference object
author Ríssola, Esteban A.
Tolosa, Gabriel Hernán
author_facet Ríssola, Esteban A.
Tolosa, Gabriel Hernán
author_sort Ríssola, Esteban A.
title Inverted Index Entry Invalidation Strategy for Real Time Search
title_short Inverted Index Entry Invalidation Strategy for Real Time Search
title_full Inverted Index Entry Invalidation Strategy for Real Time Search
title_fullStr Inverted Index Entry Invalidation Strategy for Real Time Search
title_full_unstemmed Inverted Index Entry Invalidation Strategy for Real Time Search
title_sort inverted index entry invalidation strategy for real time search
publishDate 2015
url http://sedici.unlp.edu.ar/handle/10915/50429
work_keys_str_mv AT rissolaestebana invertedindexentryinvalidationstrategyforrealtimesearch
AT tolosagabrielhernan invertedindexentryinvalidationstrategyforrealtimesearch
bdutipo_str Repositories
_version_ 1764820475019526145