GDup: a big graph entity deduplication system
- 1. ISTI-CNR
- 2. University of Pisa
Description
GDup is an integrated, scalable, general-purpose system for entity deduplication over big information graphs. It provides the functionality practitioners need to build a complete entity deduplication workflow over a generic input graph, including Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to produce a disambiguated output graph.
This software module is the core library implementing the clustering functions, the comparators, and the configuration language that users can exploit to define the matching criteria.
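As a rough illustration of how clustering functions, comparators, and a configuration can fit together in a deduplication step, here is a minimal Python sketch. It does not use GDup's API or its configuration language: the `CONFIG` keys, the prefix-based clustering key, the token-level Jaccard comparator, the threshold value, and the sample records are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical configuration; NOT GDup's actual configuration language.
CONFIG = {
    "cluster_field": "title",   # field used to build the clustering key
    "prefix_length": 4,         # records sharing this prefix are compared
    "compare_field": "title",   # field passed to the comparator
    "threshold": 0.6,           # minimum similarity to treat as duplicates
}


def normalize(text):
    """Lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def clustering_key(record, config):
    """Assumed clustering function: prefix of the normalized field."""
    return normalize(record[config["cluster_field"]])[: config["prefix_length"]]


def jaccard(a, b):
    """Assumed comparator: Jaccard similarity over word tokens."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def deduplicate(records, config):
    """Group record ids whose compared fields match within a cluster."""
    by_id = {rec["id"]: rec for rec in records}
    blocks = defaultdict(list)
    for rec in records:
        blocks[clustering_key(rec, config)].append(rec["id"])

    parent = {rid: rid for rid in by_id}
    field = config["compare_field"]
    for ids in blocks.values():
        # Only records in the same cluster are compared pairwise.
        for a, b in combinations(ids, 2):
            if jaccard(by_id[a][field], by_id[b][field]) >= config["threshold"]:
                parent[find(parent, a)] = find(parent, b)

    groups = defaultdict(list)
    for rid in parent:
        groups[find(parent, rid)].append(rid)
    return sorted(sorted(group) for group in groups.values())


# Made-up sample records for illustration only.
records = [
    {"id": "p1", "title": "Entity deduplication over big graphs"},
    {"id": "p2", "title": "Entity  Deduplication over Big Graphs"},
    {"id": "p3", "title": "Entity resolution in relational databases"},
    {"id": "p4", "title": "Open Science monitoring"},
]
groups = deduplicate(records, CONFIG)
print(groups)
```

Restricting comparisons to records that share a clustering key avoids comparing every pair in the graph, which is the usual reason for a clustering step before matching. Merging each resulting group into a single representative entity is left out of the sketch.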
GDup is today one of the core components of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission. The software is the outcome of Claudio Atzori’s PhD research on “deduplication of knowledge graphs”.
Files
| Name | Size | Checksum |
|---|---|---|
| gdup-4.0.5.zip | 3.2 MB | md5:35661bef137374bb4b3b7fbb76f52093 |