Published April 17, 2024 | Version 1.0.0
Dataset | Open access

A Benchmark dataset on Semantic Change in Scholarly Publications on Disability

Description

This is a benchmark dataset for semantic shift detection in disability-related corpora. It includes title and abstract text collected from PubMed and arXiv, annotation sets produced by domain experts and by large language models (LLMs), and knowledge graphs (KGs) extracted as Wikidata entity claims. The PubMed corpus covers the period from the 1900s to 2023, and the arXiv corpus covers the period from the 1990s to 2023. The corpora were filtered using 16 disability-related target words.

In the annotation sets, '1' indicates that a semantic shift occurred for a target word and '0' indicates that it did not. The LLM-based annotation sets were produced with the Llama2 and GPT-4 models and also include the text each model generated. '7b' refers to the parameter size of the Llama2 model (7 billion parameters). Graph_data.zip contains the Wikidata entity claims.
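As an illustration of how the binary labels might be used, the following minimal sketch compares an expert annotation set with an LLM-based one using raw agreement and Cohen's kappa. This is not the authors' evaluation protocol. The CSV column names `target_word` and `label`, the one-row-per-target-word layout, and the example words `w1` to `w4` are all hypothetical assumptions; check them against the actual files before use.

```python
import csv
from typing import Dict

# Hypothetical column names: the real headers of the annotation files
# are not documented on this page and may differ.
WORD_COL = "target_word"
LABEL_COL = "label"


def load_labels(path: str) -> Dict[str, int]:
    """Read an annotation CSV into {target_word: 0 or 1} (assumed layout)."""
    labels: Dict[str, int] = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            value = row[LABEL_COL].strip()
            # '1' = semantic shift occurred, '0' = no shift.
            if value not in ("0", "1"):
                raise ValueError(f"unexpected label {value!r} for {row[WORD_COL]!r}")
            labels[row[WORD_COL].strip()] = int(value)
    return labels


def agreement(expert: Dict[str, int], llm: Dict[str, int]) -> dict:
    """Raw agreement and Cohen's kappa over target words present in both sets."""
    words = sorted(expert.keys() & llm.keys())
    if not words:
        raise ValueError("no shared target words")
    n = len(words)
    observed = sum(expert[w] == llm[w] for w in words) / n
    # Chance agreement from each annotator's rate of '1' labels.
    p_expert = sum(expert[w] for w in words) / n
    p_llm = sum(llm[w] for w in words) / n
    expected = p_expert * p_llm + (1 - p_expert) * (1 - p_llm)
    # Kappa is undefined when both annotators use a single label;
    # this sketch treats that case as perfect agreement.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return {"n": n, "agreement": observed, "kappa": kappa}


if __name__ == "__main__":
    # Illustrative labels only, not taken from the dataset.
    expert = {"w1": 1, "w2": 1, "w3": 0, "w4": 0}
    llm = {"w1": 1, "w2": 0, "w3": 0, "w4": 0}
    print(agreement(expert, llm))  # {'n': 4, 'agreement': 0.75, 'kappa': 0.5}
```

Kappa is used here because it corrects raw agreement for the agreement expected by chance, which matters when most target words share the same label.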

Files

11 files, 892.6 MB in total (including PubMed.csv)

MD5 checksum                        Size
04f5587413e13bb71a87288d5d49d64c    9.8 MB
cecf258eede9c8afa7696f097355bb2a    9.2 MB
15471e19d9b62b134b64d9f6d5f31094    16.1 MB
c464834c305896e5cda9fca54a21a631    10.6 MB
374a1b0dd6138e84e82bf1211a75f7bb    333.4 kB
a8b4f3dd10a5d869316ea83447505870    333.3 kB
47080c115893f44d54fda8b28fd14b05    333.4 kB
c19ace7e5c2d5a89717f20dca8c5c1a0    11.6 MB
3be9394580f8863459339d6525361d03    4.9 MB
a24ff17fa8a217fe19ca481bcf74fff1    32.0 MB
71f6a9771a28df611be6a8f6eb72d1fd    797.4 MB
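After downloading, the MD5 checksums listed above can be used to confirm that each file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5sum` and `is_listed` are illustrative, and since the file-to-checksum pairing is not shown in the table, it simply checks whether a file's digest matches any listed checksum.

```python
import hashlib
import sys
from pathlib import Path

# Checksums as listed in the Files table of this record.
LISTED_MD5 = {
    "04f5587413e13bb71a87288d5d49d64c",
    "cecf258eede9c8afa7696f097355bb2a",
    "15471e19d9b62b134b64d9f6d5f31094",
    "c464834c305896e5cda9fca54a21a631",
    "374a1b0dd6138e84e82bf1211a75f7bb",
    "a8b4f3dd10a5d869316ea83447505870",
    "47080c115893f44d54fda8b28fd14b05",
    "c19ace7e5c2d5a89717f20dca8c5c1a0",
    "3be9394580f8863459339d6525361d03",
    "a24ff17fa8a217fe19ca481bcf74fff1",
    "71f6a9771a28df611be6a8f6eb72d1fd",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat even for the ~800 MB archive.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path: Path) -> bool:
    """True if the file's MD5 matches any checksum listed for this record."""
    return md5sum(path) in LISTED_MD5


if __name__ == "__main__":
    # Usage: python check_md5.py FILE [FILE ...]
    for arg in sys.argv[1:]:
        p = Path(arg)
        status = "OK" if is_listed(p) else "NOT LISTED"
        print(f"{p}  {md5sum(p)}  {status}")
```

A file reported as NOT LISTED was most likely truncated or corrupted during download and should be fetched again.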

Additional details

Funding

European Commission
MuseIT: Multi-sensory, User-centred, Shared cultural Experiences through Interactive Technologies (award 101061441)

Dates

Created: 2024-04-15