Published February 17, 2017 | Version 4.0.5
Software | Open

gdup: a big graph entity deduplication system

  • 1. ISTI-CNR

Contributors

Researcher:

  • 1. ISTI-CNR
  • 2. University of Pisa

Description

GDup is an integrated, scalable, general-purpose system for entity deduplication over big information graphs. It gives practitioners the functionality needed to build a complete entity deduplication workflow over a generic input graph, including Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates, and it produces a disambiguated output graph.
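As an illustration of the final step of such a workflow, the sketch below merges duplicate nodes and rewires their edges to produce a disambiguated graph. It is a minimal, hypothetical example and not GDup's actual merge strategy: the function name `merge_duplicates`, the graph representation (a node list plus directed edge pairs), and the choice of the smallest identifier as representative are all assumptions made for the example.

```python
def merge_duplicates(nodes, edges, duplicate_pairs):
    """Collapse duplicate nodes and rewire edges (illustrative only).

    nodes: list of node identifiers
    edges: list of (source, target) pairs
    duplicate_pairs: list of (id_a, id_b) pairs judged to be duplicates
    """
    # Union-find structure: every node starts as its own representative.
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the representative, shortening the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: keep the smaller identifier so results are deterministic.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    merged_nodes = sorted({find(n) for n in nodes})
    # Rewire each edge to the representatives; drop self-loops and repeated edges.
    merged_edges = sorted(
        {(find(s), find(t)) for s, t in edges if find(s) != find(t)}
    )
    return merged_nodes, merged_edges


nodes = ["p1", "p2", "p3", "org1"]
edges = [("p1", "org1"), ("p2", "org1"), ("p3", "org1")]
merged_nodes, merged_edges = merge_duplicates(nodes, edges, [("p1", "p2")])
```

Because duplicate relations are treated as transitive here, the pairs (p1, p2) and (p2, p3) would collapse all three nodes into one. A real system may need a more careful policy, for instance when merging conflicting attribute values.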

This software module is the core library. It implements the clustering functions, the comparators, and the configuration language that users employ to define the matching criteria.
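The following sketch shows how these three pieces can fit together in general: a clustering function groups candidate entities into blocks, comparators score pairs within each block, and a configuration declares the matching criteria. It is a hypothetical illustration under stated assumptions, not GDup's API or configuration language: the `CONFIG` keys, the prefix-based clustering key, and the use of `difflib.SequenceMatcher` as a string comparator are all invented for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical configuration; these keys are not GDup's configuration language.
CONFIG = {
    "clustering_field": "title",
    "comparators": [
        {"field": "title", "weight": 0.7},
        {"field": "author", "weight": 0.3},
    ],
    "threshold": 0.85,
}


def clustering_key(entity, field):
    """Blocking key: first three alphanumeric characters, lowercased."""
    text = "".join(ch for ch in entity[field].lower() if ch.isalnum())
    return text[:3]


def similarity(a, b):
    """String comparator returning a value in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(e1, e2, config):
    """Weighted sum of the per-field comparator results."""
    return sum(
        c["weight"] * similarity(e1[c["field"]], e2[c["field"]])
        for c in config["comparators"]
    )


def find_duplicates(entities, config):
    """Compare only entities that share a clustering key."""
    blocks = {}
    for entity in entities:
        key = clustering_key(entity, config["clustering_field"])
        blocks.setdefault(key, []).append(entity)

    pairs = []
    for block in blocks.values():
        for e1, e2 in combinations(block, 2):
            if score(e1, e2, config) >= config["threshold"]:
                pairs.append((e1["id"], e2["id"]))
    return pairs


# Made-up sample records for illustration.
entities = [
    {"id": "a", "title": "Entity deduplication in big graphs", "author": "A. Rossi"},
    {"id": "b", "title": "Entity deduplication in big graph", "author": "A. Rossi"},
    {"id": "c", "title": "Open science monitoring", "author": "B. Bianchi"},
    {"id": "d", "title": "Enterprise resource planning", "author": "C. Verdi"},
]
duplicates = find_duplicates(entities, CONFIG)
```

Clustering keeps the number of pairwise comparisons manageable on large graphs, since entities in different blocks are never compared. The trade-off is that a poorly chosen key can place true duplicates in different blocks.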

Today GDup is one of the core components of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission. The software is the outcome of Claudio Atzori’s PhD research on “deduplication of knowledge graphs”.

Files

gdup-4.0.5.zip (3.2 MB)
md5:35661bef137374bb4b3b7fbb76f52093

Additional details

Funding

European Commission
OpenAIRE2020 - Open Access Infrastructure for Research in Europe 2020 (grant agreement 643410)