Published May 12, 2021 | Version 3.1.8
Software Open

OpenAIRE OpenOrgs Database

Description

In the OpenAIRE context, research organizations are aggregated from several data sources. This often leads to duplicates, because the same organization can be provided by more than one data source.
Deduplication addresses this problem. In OpenAIRE it follows three main stages:

  1. clustering of entities
  2. pairwise comparisons of entities in the same cluster to draw similarity relations
  3. identification of connected components to create representative entities that group all the duplicates of each organization
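
The three stages above can be illustrated with a minimal Python sketch. It is not the OpenAIRE implementation: the clustering key (`blocking_key`), the similarity test (`similar`) and its threshold are hypothetical stand-ins chosen only to make the flow concrete, and organizations are reduced to an id and a name.

```python
from __future__ import annotations

from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(name: str) -> str:
    """Hypothetical clustering key: first three alphanumeric characters, lowercased."""
    return "".join(ch for ch in name.lower() if ch.isalnum())[:3]


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical pairwise comparison: plain string similarity on the names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def deduplicate(orgs: dict[str, str]) -> list[set[str]]:
    """Group organization ids (mapped to names) into sets of presumed duplicates."""
    # Stage 1: clustering, so that only entities sharing a key are compared.
    clusters = defaultdict(list)
    for org_id, name in orgs.items():
        clusters[blocking_key(name)].append(org_id)

    # Stage 2: pairwise comparisons inside each cluster yield similarity relations.
    relations = [
        (a, b)
        for members in clusters.values()
        for a, b in combinations(members, 2)
        if similar(orgs[a], orgs[b])
    ]

    # Stage 3: connected components over the relations, computed with union-find.
    parent = {org_id: org_id for org_id in orgs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in relations:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for org_id in orgs:
        groups[find(org_id)].add(org_id)
    return list(groups.values())
```

Each returned set corresponds to one connected component, from which a representative entity could then be built. Restricting comparisons to entities in the same cluster avoids comparing every pair of organizations.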

Because the pairwise comparison stage is fully automatic, it can produce false positives and false negatives.
The software in this release provides the OpenOrgs web application: a web interface for collecting user feedback on organization deduplication.
A user can edit an organization’s metadata and approve or reject the similarity relations suggested by the deduplication algorithm.
The deduplication algorithm uses this feedback to increase the precision and recall of its results.
The organizations resulting from this feedback-enhanced deduplication are indexed and subsequently exposed by the OpenAIRE portal.
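
How the feedback is combined with the algorithm's suggestions is not detailed in this record. As a rough sketch under stated assumptions, one simple policy is to drop rejected relations and keep approved ones regardless of what the algorithm suggests. The function name `apply_feedback` and the representation of a relation as an unordered pair of ids are hypothetical.

```python
from __future__ import annotations

Pair = tuple[str, str]


def apply_feedback(
    suggested: list[Pair],
    approved: list[Pair],
    rejected: list[Pair],
) -> set[frozenset[str]]:
    """Merge suggested similarity relations with user decisions.

    Assumed policy (not taken from OpenOrgs): rejected relations are removed,
    approved relations are always kept. Pairs are unordered, so ("a", "b")
    and ("b", "a") denote the same relation.
    """
    suggested_set = {frozenset(pair) for pair in suggested}
    approved_set = {frozenset(pair) for pair in approved}
    rejected_set = {frozenset(pair) for pair in rejected}
    return (suggested_set - rejected_set) | approved_set
```

Under this policy, removing rejected relations targets false positives, while retaining approved relations protects confirmed duplicates from being lost in later runs.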

Notes

This application is distributed as part of the dnet-applications module, which contains web applications developed within the OpenAIRE-Connect and OpenAIRE-Advance projects.

Files

Files (34.1 MB)

Name: dnet-applications-3.1.8.zip
Size: 34.1 MB
md5:ea09ccb569a5881a2ad5522d48e9bbc2

Additional details

Funding

European Commission
OpenAIRE-Connect - OpenAIRE - CONNECTing scientific results in support of Open Science (grant 731011)
European Commission
OpenAIRE-Advance - OpenAIRE Advancing Open Scholarship (grant 777541)