Published September 10, 2025 | Version 1.0
Dataset
Open
OAMED-XMLC: a Two-Taxonomy Dataset for Benchmarking Extreme Multi-Label Classification on Medical Documents
Authors/Creators
Description
The OAMED-XMLC dataset comprises 869,402 scientific publications related to surgery. It includes annotations such as surgery categories, document domains, authors, year of publication, and references to other documents. These annotations support tasks such as:
- Document tagging or classification over a large number of categories (extreme multi-label classification, or XMLC)
- Author prediction
- Year of publication prediction
- Reference/link prediction
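To make the XMLC task concrete: each document carries a small subset of a very large label set, so targets are usually stored as sparse lists of label ids rather than dense vectors. The sketch below is illustrative only. The label names and the helpers `build_label_index` and `encode_labels` are hypothetical and are not taken from the dataset files.

```python
from typing import Dict, Iterable, List


def build_label_index(label_sets: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Assign a stable integer id to every label seen in the corpus."""
    all_labels = sorted({label for doc_labels in label_sets for label in doc_labels})
    return {label: idx for idx, label in enumerate(all_labels)}


def encode_labels(doc_labels: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Return the sorted ids of a document's labels (a sparse multi-hot row)."""
    return sorted(index[label] for label in doc_labels if label in index)


# Hypothetical documents and labels, for illustration only.
docs = [
    {"hernia repair", "laparoscopy"},
    {"laparoscopy"},
    {"cardiac surgery", "hernia repair"},
]
index = build_label_index(docs)
rows = [encode_labels(d, index) for d in docs]
```

Storing only the active label ids keeps memory proportional to the number of labels per document, which matters when the label set is very large.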
Note that this dataset is an extension of the OAXMLC dataset: https://zenodo.org/records/15309916
Importantly, this dataset comes with two independent taxonomies and label sets, which opens up several possibilities, including:
- Principled investigation of the influence of taxonomies on XML algorithms
- Transfer learning in XMLC (from one taxonomy to the other)
Each taxonomy is provided in Turtle/SKOS format and in a JSON/TXT format for easier use in XMLC.
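One common way to use a taxonomy in XMLC is to expand each document's labels with all of their ancestors. The sketch below assumes a child-to-parents mapping, such as one that could be derived from `skos:broader` relations. The actual layout of the JSON/TXT files is described in README.md and is not reproduced here, so the `BROADER` mapping and its concept names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical child -> parents mapping, e.g. as might be derived from
# skos:broader relations. These concept names are NOT from the dataset.
BROADER: Dict[str, List[str]] = {
    "laparoscopic hernia repair": ["hernia repair", "laparoscopy"],
    "hernia repair": ["general surgery"],
    "laparoscopy": ["general surgery"],
    "general surgery": [],
}


def with_ancestors(labels: Set[str], broader: Dict[str, List[str]]) -> Set[str]:
    """Return the labels plus every concept reachable via broader links."""
    closed: Set[str] = set()
    stack = list(labels)
    while stack:
        concept = stack.pop()
        if concept in closed:
            continue  # already visited; also guards against cycles
        closed.add(concept)
        stack.extend(broader.get(concept, []))
    return closed


expanded = with_ancestors({"laparoscopic hernia repair"}, BROADER)
```

Because a concept may have several parents, the traversal tracks visited concepts instead of assuming a strict tree.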
The dataset was built with data from the [OpenAlex](https://openalex.org/) open catalog.
More details can be found in the README.md file and in the original dataset record: https://zenodo.org/records/15309916
Files
(2.0 GB)
| Checksum | Size |
|---|---|
| md5:039a9c06156f2d961e8187f63c7ba3ad | 209.8 kB |
| md5:b9ff13279daf5b2d715d72b137f85b04 | 2.0 GB |
| md5:2f3c016b3cacc54c90cf70eff1ce462d | 11.0 kB |
| md5:7191c1c35893686dc741948fe4bdb2c6 | 1.5 MB |
| md5:15ca5788ce37c692e6f298a42aa0294c | 151.2 kB |
| md5:4d06a0d31f18f8688fca64339508cc78 | 15.9 kB |
| md5:c89594f11b245d3f0f3da47052886f47 | 28.4 kB |
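The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of` and `matches_checksum` are illustrative, and the file path in the usage comment is a placeholder to be replaced with the file actually downloaded.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so multi-gigabyte files fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 against a value such as 'md5:<hex digest>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected_hex


# Usage (placeholder path; copy the expected value from the file listing):
# ok = matches_checksum("path/to/downloaded_file", "md5:<hex digest>")
```

Chunked reading matters here because the largest file in the record is about 2.0 GB.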