Published July 8, 2024 | Version v1
Dataset
Open
Processed Wikipedia Dataset
Description
We extract the keywords from a subset of about 1,000,000 Wikipedia documents from 2020. The file wiki_kws_dict.pkl is a dictionary that maps each keyword to its total count across the files and its query trend. The file wiki_doc_0.pkl contains the list of keywords for each document. Both files can be loaded in Python with the pickle module.
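As a minimal sketch of the loading step described above, the snippet below opens a pickle file and previews a few entries. The helper names `load_pickle` and `preview` are hypothetical, and the exact layout of the stored values (for example, how the count and query trend are packed together) is not specified here, so the preview makes no assumption beyond "a dict or a sequence".

```python
import pickle
from itertools import islice


def load_pickle(path):
    """Load a single pickled object from the file at `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


def preview(obj, n=3):
    """Return up to `n` entries of a dict (as key/value pairs) or of a sequence.

    Assumption: the loaded object is either a dict or an iterable such as a list.
    """
    if isinstance(obj, dict):
        # islice avoids copying the whole dictionary just to look at a few items
        return list(islice(obj.items(), n))
    return list(islice(obj, n))
```

With the files downloaded to the working directory, something like `preview(load_pickle("wiki_kws_dict.pkl"))` would show a few keyword entries, and `preview(load_pickle("wiki_doc_0.pkl"))` a few per-document keyword lists. Since unpickling can execute arbitrary code, it is worth loading only files obtained from a trusted source.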
Files
(3.2 GB)
| Checksum | Size |
|---|---|
| md5:fcc262ebde7a7ce0f1cef5c339b6d6d6 | 2.9 GB |
| md5:39c5852ea6cd15a533b034f3f3b8423f | 333.3 MB |
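The MD5 checksums listed for the files above can be used to check a download before loading it. The following is a sketch under stated assumptions: the helper names `md5_of_file` and `matches_checksum` are hypothetical, and which checksum belongs to which file name has to be taken from the record page, since that mapping is not reproduced here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte files do not have to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum(local_path, "md5:fcc262ebde7a7ce0f1cef5c339b6d6d6")` returns `True` only if the local file is byte-identical to the published one with that checksum.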
Additional details
Identifiers
- arXiv
- arXiv:2403.01155
Related works
- Is part of
- Publication: arXiv:2403.01155 (arXiv)
Dates
- Submitted
- 2024-07
References
- 10.48550/ARXIV.2403.01155