Published September 2024 | Version NeurIPS 2024
Dataset Open

HiT: Language Models as Hierarchy Encoders

  • 1. University of Oxford
  • 2. University of Cambridge
  • 3. University of Manchester

Description

About

Datasets for training and evaluating the Hierarchy Transformer encoders (HiTs) proposed in the paper "Language Models as Hierarchy Encoders".

  • Files with the multi suffix correspond to the Multi-hop Inference evaluation.
  • Files with the mixed suffix correspond to the Mixed-hop Prediction evaluation (and its transfer setting).
  • schemaorg, foodon, and doid are only involved in the transfer evaluation, but the datasets here for foodon and doid also include their training sets (see the paper for why we opted not to generate a training set for schemaorg).
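As an illustration of the naming convention above, the sketch below maps an archive name to the evaluation its suffix indicates. It assumes archives are named `<hierarchy>-<suffix>.zip`, as `doid-mixed.zip` in this release suggests; the function name `task_for_archive` and the returned labels are hypothetical.

```python
from pathlib import Path

# Suffix-to-evaluation mapping taken from the description above.
TASK_BY_SUFFIX = {
    "multi": "Multi-hop Inference",
    "mixed": "Mixed-hop Prediction",
}


def task_for_archive(filename: str) -> tuple[str, str]:
    """Split '<hierarchy>-<suffix>.zip' into (hierarchy, evaluation name).

    The naming pattern is an assumption based on 'doid-mixed.zip'.
    """
    stem = Path(filename).stem  # e.g. 'doid-mixed'
    hierarchy, sep, suffix = stem.rpartition("-")
    if not sep or suffix not in TASK_BY_SUFFIX:
        raise ValueError(f"unrecognised archive name: {filename!r}")
    return hierarchy, TASK_BY_SUFFIX[suffix]


print(task_for_archive("doid-mixed.zip"))
```

Names that do not follow the assumed pattern raise a `ValueError` rather than being guessed at.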

The previous version of this dataset collection has been marked as deprecated because it appears to contain broken files for snomed.

Huggingface Datasets

We also provide a Huggingface Datasets entry, so that users can load the data directly with the load_dataset method. The datasets are available either as entity triplets or as labelled entity pairs. Note that data loaded this way does not retain the original entity IDs; to map entities back to their original hierarchies, refer to this Zenodo release.
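To make the two formats concrete, the following sketch expands a triplet into labelled pairs. The field names (`child`, `parent`, `negative`), the 1/0 label encoding, and the toy entities are assumptions made for illustration, not the confirmed schema of the Huggingface entry; the dataset card is the place to check the actual column names.

```python
from typing import Iterable, Iterator, NamedTuple


class Triplet(NamedTuple):
    # Assumed layout: an entity, one of its parents, and a non-parent.
    child: str
    parent: str
    negative: str


class LabelledPair(NamedTuple):
    # Assumed layout: label 1 for a true parent, 0 otherwise.
    child: str
    parent: str
    label: int


def triplets_to_pairs(triplets: Iterable[Triplet]) -> Iterator[LabelledPair]:
    """Yield one positive and one negative pair per triplet."""
    for t in triplets:
        yield LabelledPair(t.child, t.parent, 1)
        yield LabelledPair(t.child, t.negative, 0)


# Toy example, not taken from the datasets.
example = [Triplet("beagle", "dog", "table")]
pairs = list(triplets_to_pairs(example))
print(pairs)
```

Under these assumptions each triplet carries the same information as two labelled pairs, which is one way to read the relationship between the two formats.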

Citation

@article{he2024language,
  title={Language models as hierarchy encoders},
  author={He, Yuan and Yuan, Moy and Chen, Jiaoyan and Horrocks, Ian},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={14690--14711},
  year={2024}
}

Contact

Yuan He (yuan.he(at)cs.ox.ac.uk)

Files

doid-mixed.zip

Files (236.0 MB)

Checksum                                 Size
md5:f03bd0763063a1f0bbe2a51d1efcdaba     1.5 MB
md5:1e2a5d5a36c7c03e56f00f79999c8dd0     8.8 MB
md5:55fa7f1b506ba6d71355d0eb45e50d0c     29.8 MB
md5:1ca33af8d7e847b562f14af72e5d37cd     251.4 kB
md5:6ac08c7266428122631a6fceb24e5906     80.8 MB
md5:3ca80758a71566ed9c04ec0e392515fc     73.5 MB
md5:4b2c790a6974631ef35ee940ba881af5     20.7 MB
md5:39c18ae770a4d84c650badc5f648f6d8     20.7 MB
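
The MD5 checksums listed above can be used to check that a downloaded archive is intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the path passed in is whatever location the archive was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.removeprefix("md5:").lower()
    return md5_of_file(path) == expected
```

Call `matches_listed_checksum` with the local path of an archive and the checksum listed for that same archive; a `False` result suggests an incomplete or corrupted download.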