Published January 14, 2022 | Version 2.0.0
Dataset
Open
EvaNIL: silver standard dataset for large-scale NIL entity linking evaluation
Authors/Creators
- 1. LaSiGE, Faculdade de Ciências, Universidade de Lisboa
Description
The EvaNIL dataset can be used to train or evaluate approaches developed for NIL entity linking. It was built from several Biomedical and Life Sciences corpora:
- PubMed DS
- CRAFT corpus
- MedMentions
These corpora contain entity mentions annotated with knowledge base concepts. To build the EvaNIL dataset, we treated each of those original concepts as if it did not exist in its knowledge base, so each entity is instead associated with the direct ancestors of its original concept.
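The relabeling idea described above can be illustrated with a minimal sketch. It is not the authors' actual pipeline: the concept IDs, the `DIRECT_ANCESTORS` mapping, and the `relabel_as_nil` function are hypothetical, and skipping concepts that have no ancestor is an assumption made only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy: concept ID -> IDs of its direct ancestors.
# These IDs are invented for illustration and come from no real knowledge base.
DIRECT_ANCESTORS: Dict[str, List[str]] = {
    "C3": ["C1", "C2"],
    "C2": ["C0"],
    "C1": ["C0"],
}


def relabel_as_nil(
    annotations: List[Tuple[str, str]],
    direct_ancestors: Dict[str, List[str]],
) -> List[Tuple[str, List[str]]]:
    """Replace each mention's concept with that concept's direct ancestors."""
    relabeled = []
    for mention, concept_id in annotations:
        parents = direct_ancestors.get(concept_id, [])
        # Assumption: a concept without ancestors (a root) is skipped,
        # because there is nothing to fall back on.
        if parents:
            relabeled.append((mention, sorted(parents)))
    return relabeled


example = relabel_as_nil(
    [("mention a", "C3"), ("mention b", "C0")],
    DIRECT_ANCESTORS,
)
```

Only direct ancestors are used here, so the mention originally linked to `C3` is associated with `C1` and `C2` but not with the more distant `C0`.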
The EvaNIL dataset is divided into 6 partitions, each containing annotations from a different knowledge base:
- "medic" (CTD-MEDIC)
- "ctd_anatomy" (CTD-Anatomy)
- "ctd_chemicals" (CTD-Chemicals)
- "chebi" (ChEBI)
- "go_bp" (GO-Biological Process)
- "hp" (HPO)
Files
(1.4 GB)
| Name | Size |
|---|---|
| md5:22155b3ff2b99d50b535b83470113b88 | 1.4 GB |
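The md5 value listed for the file can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_published_checksum` are hypothetical, and the local path is whatever name the archive was saved under, since the file name is not shown here.

```python
import hashlib

# Checksum as published in the file listing.
EXPECTED_MD5 = "22155b3ff2b99d50b535b83470113b88"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so a 1.4 GB file is never fully held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """Return True if the file at `path` has the published md5 digest."""
    return md5_of_file(path) == EXPECTED_MD5
```

Calling `matches_published_checksum` with the path of the downloaded archive returns `False` if the file was truncated or corrupted in transit.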