OpenAIRE OpenOrgs Database
Creators
- ISTI-CNR
Description
In the OpenAIRE context, research organizations are aggregated from several datasources. This often leads to a duplication problem because an organization can be provided by multiple datasources.
Deduplication is a fundamental task to solve this problem. The deduplication in OpenAIRE follows three main stages:
- clustering of entities
- pairwise comparisons of entities in the same cluster to draw similarity relations
- identification of connected components to create representative entities that group all the duplicates of each organization
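The three stages above can be illustrated with a minimal sketch. It is not the OpenAIRE implementation: the clustering key (country plus first letter of the name), the name-similarity measure (`difflib.SequenceMatcher`), the 0.85 threshold, and the record fields `id`, `name`, and `country` are all assumptions chosen only to keep the example small.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(org):
    # Hypothetical clustering key: country code plus first letter of the name.
    return (org["country"], org["name"].lower().strip()[:1])


def similar(a, b, threshold=0.85):
    # Hypothetical comparison: plain string similarity on lowercased names.
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def deduplicate(orgs):
    # Stage 1: clustering, so that only records sharing a key are compared.
    clusters = defaultdict(list)
    for org in orgs:
        clusters[blocking_key(org)].append(org)

    # Stage 2: pairwise comparisons inside each cluster yield similarity relations.
    relations = [
        (a["id"], b["id"])
        for members in clusters.values()
        for a, b in combinations(members, 2)
        if similar(a, b)
    ]

    # Stage 3: connected components over the relations, using union-find.
    parent = {org["id"]: org["id"] for org in orgs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in relations:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for org in orgs:
        groups[find(org["id"])].add(org["id"])
    # Each group stands for one representative organization.
    return sorted(sorted(group) for group in groups.values())
```

Clustering first keeps the number of pairwise comparisons manageable, at the cost of missing duplicates that end up under different keys.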
Because the pairwise comparison stage is fully automatic, it can produce many false positives (organizations wrongly marked as duplicates) and false negatives (duplicates that are missed).
The software available in this release provides the OpenOrgs web application: a web interface for collecting user feedback on the deduplication of organizations.
A user can edit an organization's metadata and approve or reject the similarity relations suggested by the deduplication algorithm.
The deduplication algorithm uses this feedback to increase the precision and recall of its results.
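One way to picture how approvals and rejections could feed back into the similarity relations is the sketch below. It is an illustration under stated assumptions, not the OpenOrgs logic: the function name `apply_feedback`, the representation of relations as pairs of identifiers, and the rule that an approved pair is kept even if the algorithm did not suggest it are all hypothetical.

```python
def apply_feedback(suggested, approved, rejected):
    """Return the similarity relations kept after curator review.

    Assumptions (not taken from OpenOrgs): relations are unordered pairs
    of organization ids, rejected pairs are always dropped, and approved
    pairs are always kept, even when the algorithm did not suggest them.
    """
    def normalize(pair):
        # Treat (a, b) and (b, a) as the same relation.
        return tuple(sorted(pair))

    suggested = {normalize(p) for p in suggested}
    approved = {normalize(p) for p in approved}
    rejected = {normalize(p) for p in rejected}
    return sorted((suggested | approved) - rejected)
```

Dropping rejected pairs removes false positives, while keeping approved pairs confirms true duplicates before the connected components are computed.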
The organizations resulting from the deduplication enhanced by the user feedback are indexed and subsequently exposed by the OpenAIRE portal.
Files
Name | Size | Checksum |
---|---|---|
dnet-applications-3.1.8.zip | 34.1 MB | md5:ea09ccb569a5881a2ad5522d48e9bbc2 |
Additional details
Related works
- Is supplement to
- Software: https://code-repo.d4science.org/D-Net/dnet-applications/src/tag/dnet-applications-3.1.8/apps/dnet-orgs-database-application