Published September 12, 2021 | Version v1
Dataset Open

LDkp Dataset

Description

The LDkp (Long Document keyphrase) dataset is the first benchmark corpus of 1.3M documents for identifying keyphrases in long documents. It is released in two versions:

  • LDkp3k consists of 0.1M keyphrase-tagged long documents. It was created using keyphrases from KP20k (Meng et al., 2017) and the corresponding long document text from S2ORC (Lo et al., 2020).
  • LDkp10k consists of 1.3M long documents with target keyphrases. It was created using keyphrases from OAGKX (Çano, 2019) and the corresponding long document text from S2ORC (Lo et al., 2020).

Files

LDkp10k_base_large_training.zip

Files (45.0 GB)

MD5 checksum                          Size
md5:232dedecc0f76d285cff4e551865b577  11.5 GB
md5:09f4037e648f7dc82104936ed92b412d  1.5 GB
md5:f75aae078dacacd1cee6ae42414acad1  598.3 MB
md5:4704bf7e9a022a058753949ab3074861  301.6 MB
md5:eebed7ef118bc1389a0a8f779dcc5a42  298.6 MB
md5:dde1fd1e28a89924bc3d1375ac170c9c  13.2 GB
md5:4debc4f3428382e2270961cc23967f3b  3.1 GB
md5:66b86d69875bb1b25ec90cfa5acedda3  1.2 GB
md5:fc1b58e7a0242890c852b586ba9b9c58  621.3 MB
md5:62a135778fcb43d69c15b23f219e66e0  613.2 MB
md5:c74f3409ddd3e71974137669a8016638  1.1 GB
md5:b1e3e416e4bd6db785fcd65b52f82a9b  2.1 GB
md5:867fc2615d279aac5824588ad071a5da  833.8 MB
md5:8aa347bd4c1f0a40d40f3e9ec5e0d71f  140.6 MB
md5:cb248f8fc14d312384f2eb0be38060ab  137.9 MB
md5:cffc632c8ac87c7db00c43b1e46e92e0  1.2 GB
md5:48c40b38673c5cb5bd5dade072dabd02  4.3 GB
md5:2b4a9a44f8f148c3f0271084fda90547  1.7 GB
md5:97addf736b251fe71a5ce027084e1f0b  291.8 MB
md5:673cd7ad55cba5479dc5b8fd7e8e1e4b  287.5 MB
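
Since the archives are large, it may be worth checking a download against its listed MD5 value before unpacking it. The following is a minimal Python sketch of such a check using only the standard library. The function names `md5_of_file` and `matches_listed_checksum` are illustrative and not part of the dataset, and because the listing above does not show which file name belongs to which checksum, the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:232d...'.

    The optional 'md5:' prefix used in the file listing is stripped
    before the comparison.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call would look like `matches_listed_checksum("downloaded_archive.zip", "md5:232dedecc0f76d285cff4e551865b577")`, where `downloaded_archive.zip` is a placeholder path. A `False` result usually indicates an incomplete or corrupted download, or a checksum paired with the wrong file.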