Published January 14, 2022 | Version 2.0.0
Dataset Open

EvaNIL: silver standard dataset for large-scale NIL entity linking evaluation

  • 1. LaSiGE, Faculdade de Ciências, Universidade de Lisboa

Description

The EvaNIL dataset can be used to train or evaluate approaches to NIL entity linking. It was built from several biomedical and life sciences corpora:

  • PubMed DS
  • CRAFT corpus
  • MedMentions

In these corpora, entity mentions are annotated with knowledge base concepts. To build the EvaNIL dataset, we treated each annotated concept as if it did not exist in its knowledge base, and associated the mention with the direct ancestors of the original concept instead.
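The relabeling step described above can be illustrated with a minimal sketch. It is not the code used to build EvaNIL: the function name, the concept identifiers, and the `parents` mapping are hypothetical, and dropping mentions whose concept has no ancestor is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def relabel_with_ancestors(
    annotations: List[Tuple[str, str]],
    parents: Dict[str, List[str]],
) -> List[Tuple[str, List[str]]]:
    """Replace each annotated concept with its direct ancestors.

    annotations: (mention text, concept identifier) pairs.
    parents: hypothetical mapping from a concept to its direct ancestors.
    """
    relabeled = []
    for mention, concept_id in annotations:
        ancestors = parents.get(concept_id, [])
        # Assumption of this sketch: a concept without ancestors
        # (for example a root) cannot be relabeled, so it is skipped.
        if ancestors:
            relabeled.append((mention, sorted(ancestors)))
    return relabeled


# Toy hierarchy with made-up identifiers (not taken from any real knowledge base).
toy_parents = {"C:3": ["C:2", "C:1"], "C:2": ["C:0"]}
toy_annotations = [("mention a", "C:3"), ("mention b", "C:2"), ("mention c", "C:0")]

result = relabel_with_ancestors(toy_annotations, toy_parents)
```

In this toy case, "mention a" ends up associated with both "C:1" and "C:2", "mention b" with "C:0", and "mention c" is skipped because "C:0" has no ancestor in the toy hierarchy.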

The EvaNIL dataset is divided into 6 partitions, with annotations drawn from the following knowledge bases:

  • "medic" (CTD-MEDIC)
  • "ctd_anatomy" (CTD-Anatomy)
  • "ctd_chemicals" (CTD-Chemicals)
  • "chebi" (ChEBI)
  • "go_bp" (GO-Biological Process)
  • "hp" (HPO)

Notes

Funded by Fundação para a Ciência e a Tecnologia (FCT) through the following grants: 2020.05393.BD, PTDC/CCI-BIO/28685/2017, UIDB/00408/2020, UIDP/00408/2020.

Files

Files (1.4 GB)

md5:22155b3ff2b99d50b535b83470113b88 (1.4 GB)
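After downloading, the MD5 checksum listed above can be used to confirm that the 1.4 GB file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `md5_of_file` and the file path in the usage note are placeholders, since the file name is not given in this record.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "22155b3ff2b99d50b535b83470113b88"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks
    so that a large file does not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("path/to/downloaded_file") == EXPECTED_MD5` should evaluate to `True` for an uncorrupted download, where the path is whatever location the file was saved to.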