Published July 27, 2023 | Version 1.0
Dataset Open

WikiMed-DE

  • 1. University of Stuttgart

Description

WikiMed-DE is a silver-standard, automatically annotated biomedical entity linking dataset for the German language. It comprises 53,981 articles from the German Wikipedia, annotated with 1,951,081 mentions that correspond to 317,010 unique mention URLs. The hyperlinks in the Wikipedia articles connect concept mentions to Wikidata and, transitively, to three biomedical concept IDs: the Concept Unique Identifier (CUI) from the Unified Medical Language System (UMLS), the MeSH ID from the Medical Subject Headings hierarchy, and the DOID from the Disease Ontology. A curated subset, WikiMed-DE-BEL, is released as a ready-to-use benchmark for biomedical entity linking in German. It contains the same number of articles as WikiMed-DE but retains only the highest-quality annotations: 413,913 mentions corresponding to 35,012 unique concepts.
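The transitive linking described above (hyperlink target to Wikidata item, then Wikidata item to biomedical concept IDs) can be illustrated with a minimal sketch. This is not the pipeline used to build the dataset. The lookup tables, the `ConceptIds` class, the `resolve_mention` function, and every URL and identifier below are hypothetical placeholders chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConceptIds:
    """Biomedical identifiers attached to one Wikidata item (any may be missing)."""
    umls_cui: Optional[str] = None
    mesh_id: Optional[str] = None
    doid: Optional[str] = None


# Hypothetical lookup tables with made-up placeholder values.
# Step 1: Wikipedia hyperlink target -> Wikidata item ID.
LINK_TO_WIKIDATA = {
    "https://de.wikipedia.org/wiki/Beispielkrankheit": "Q0000001",
    "https://de.wikipedia.org/wiki/Beispielstadt": "Q0000002",
}

# Step 2: Wikidata item ID -> biomedical concept IDs.
# Non-biomedical items (such as the example city) have no entry here.
WIKIDATA_TO_CONCEPT = {
    "Q0000001": ConceptIds(
        umls_cui="C0000001", mesh_id="D000001", doid="DOID:0000001"
    ),
}


def resolve_mention(link_url: str) -> Optional[ConceptIds]:
    """Follow hyperlink -> Wikidata -> concept IDs; return None if either hop fails."""
    wikidata_id = LINK_TO_WIKIDATA.get(link_url)
    if wikidata_id is None:
        return None
    return WIKIDATA_TO_CONCEPT.get(wikidata_id)


disease = resolve_mention("https://de.wikipedia.org/wiki/Beispielkrankheit")
city = resolve_mention("https://de.wikipedia.org/wiki/Beispielstadt")
```

In this sketch, a mention whose link resolves to a Wikidata item without biomedical identifiers yields `None`, which is one plausible way a biomedical subset could end up with fewer mentions than the full collection.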

Files

Name: BEL-silver-standard.zip
Size: 669.7 MB
md5:0050407b7ce94d7ce7eac4c17aa999d8
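
The md5 checksum listed for BEL-silver-standard.zip can be used to check that a download is complete and uncorrupted. The following is a minimal sketch that assumes the archive has been saved under its original name in the current working directory. The function names `md5_of_file` and `verify_download` are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksum as published on the record page for BEL-silver-standard.zip.
EXPECTED_MD5 = "0050407b7ce94d7ce7eac4c17aa999d8"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was saved.
    archive = Path("BEL-silver-standard.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant, which matters for an archive of several hundred megabytes.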