Published July 8, 2024 | Version v1
Dataset Open

Processed Wikipedia Dataset

Description

This dataset is a subset of about 1,000,000 documents from Wikipedia 2020, together with the keywords extracted from each document. The file wiki_kws_dict.pkl is a map from each keyword to its total count across the files and its query trend. The file wiki_doc_0.pkl contains the list of keywords for each document. Both files can be loaded in Python with the pickle package.
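The loading step can be sketched as follows. This is a minimal sketch, assuming both files have been downloaded into the working directory under the names given above. The helper names `load_pickle` and `preview` are hypothetical. The exact layout of the unpickled objects is not specified here, so the sketch only inspects their type, size, and first few entries rather than assuming a structure. Unpickling can execute arbitrary code, so load only files obtained from a source you trust.

```python
import pickle
from itertools import islice
from pathlib import Path


def load_pickle(path):
    """Load one pickled object from `path` (use only on trusted files)."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def preview(obj, n=3):
    """Return the type name, the length (if any), and the first `n` items."""
    size = len(obj) if hasattr(obj, "__len__") else None
    if isinstance(obj, dict):
        # For a mapping, show (key, value) pairs rather than keys only.
        head = list(islice(obj.items(), n))
    else:
        try:
            head = list(islice(obj, n))
        except TypeError:
            # Not iterable: nothing to preview.
            head = []
    return type(obj).__name__, size, head


# Example usage (file names from the description; local paths are assumed):
# kws = load_pickle("wiki_kws_dict.pkl")
# docs = load_pickle("wiki_doc_0.pkl")
# print(preview(kws))
# print(preview(docs))
```

The larger file may take a while to load and needs enough memory to hold the whole object, since pickle reads it in one piece.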

Files

Files (3.2 GB)

Name Size
md5:fcc262ebde7a7ce0f1cef5c339b6d6d6 2.9 GB
md5:39c5852ea6cd15a533b034f3f3b8423f 333.3 MB
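
The MD5 checksums listed above can be used to check that a download completed intact. The following is a minimal sketch with hypothetical helper names (`md5_of_file`, `matches_checksum`). The listing does not show which file name belongs to which checksum, so the local path in the usage comment is an assumption.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:<hex digest>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example usage (the local path is assumed; try each listed checksum in turn):
# listed = [
#     "md5:fcc262ebde7a7ce0f1cef5c339b6d6d6",
#     "md5:39c5852ea6cd15a533b034f3f3b8423f",
# ]
# print(any(matches_checksum("wiki_doc_0.pkl", c) for c in listed))
```

Reading in chunks keeps memory use constant, which matters for the 2.9 GB file.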

Additional details

Related works

Is part of
Publication: arXiv:2403.01155 (arXiv)

Dates

Submitted
2024-07

References

  • 10.48550/ARXIV.2403.01155