
To Compare or Not to Compare: Making Entity Resolution more Efficient

  • George Papadakis*
  • Ekaterini Ioannou
  • Claudia Niederée
  • Themis Palpanas
  • Wolfgang Nejdl
  • *Corresponding author for this work

Publication: Contribution to book/report/anthology/conference proceedings · Conference contribution · Research · Peer-reviewed

Abstract

Blocking methods are crucial for making the inherently quadratic task of Entity Resolution more efficient. The blocking methods proposed in the literature rely on the homogeneity of data and the availability of binding schema information; thus, they are inapplicable to the voluminous, noisy, and highly heterogeneous data of Web 2.0 user-generated content. To deal with such data, attribute-agnostic blocking has recently been introduced, following a two-fold strategy: the first layer places entities into overlapping blocks in order to achieve high effectiveness, while the second layer reduces the number of unnecessary comparisons in order to enhance efficiency. In this paper, we present a set of techniques that can be plugged into the second strategy layer of attribute-agnostic blocking to further improve its efficiency. We introduce a technique that eliminates redundant comparisons, and, based on this, we incorporate an approximate method for pruning comparisons that are highly likely to involve non-matching entities. We also introduce a novel measure for quantifying the redundancy a blocking method entails and explain how it can be used to tune the comparison pruning process a priori. We apply our blocking techniques to two large, real-world data sets and report results that demonstrate a substantial increase in efficiency at a negligible (if any) cost in effectiveness.
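
The abstract notes that overlapping blocks introduce redundant comparisons, because a pair of entities that shares several blocks would be compared once per shared block. The following is a minimal Python sketch of that general idea only; it is not the technique from the paper. The block keys, entity identifiers, and function names are hypothetical, and the deduplication simply collects pairs in a set.

```python
from itertools import combinations

def block_comparisons(blocks):
    """Yield every pairwise comparison implied by the blocks, repeats included."""
    for entities in blocks.values():
        # Sorting gives each pair a canonical order, so (a, b) and (b, a) coincide.
        for a, b in combinations(sorted(entities), 2):
            yield (a, b)

def distinct_comparisons(blocks):
    """Return each entity pair once, however many blocks the pair shares."""
    return set(block_comparisons(blocks))

# Hypothetical toy input: overlapping blocks keyed by a shared token.
blocks = {
    "smith": {"e1", "e2", "e3"},
    "john": {"e1", "e2"},
    "athens": {"e2", "e3", "e4"},
}

all_pairs = list(block_comparisons(blocks))
unique_pairs = distinct_comparisons(blocks)
redundant = len(all_pairs) - len(unique_pairs)
print(len(all_pairs), len(unique_pairs), redundant)
```

In this toy input the pairs (e1, e2) and (e2, e3) each appear in two blocks, so two of the seven block-level comparisons are repeats. Holding all pairs in a set is only practical for small inputs; it is used here purely to make the redundancy visible.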

Original language: English
Title of host publication: Proceedings of the International Workshop on Semantic Web Information Management, SWIM 2011
Publication status: Published - 12 June 2011
Event: 3rd International Workshop on Semantic Web Information Management, SWIM 2011 - Athens, Greece
Duration: 12 June 2011 to 16 June 2011

Publication series

Name: Proceedings of the International Workshop on Semantic Web Information Management, SWIM 2011

Conference

Conference: 3rd International Workshop on Semantic Web Information Management, SWIM 2011
Country/Territory: Greece
City: Athens
Period: 12 June 2011 to 16 June 2011

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems and Management
