Cross-language Wikipedia link graph
Description
Wikipedia articles use Wikidata to list the links to the same article in other language versions. Therefore, each Wikipedia language edition stores the Wikidata Q-id for each article.
This dataset is a Wikipedia link graph in which all article identifiers are normalized to Wikidata Q-ids. It contains the normalized links from all Wikipedia language versions. Detailed link count statistics are attached. Note that articles with neither incoming nor outgoing links are not part of this graph.
The format is as follows:
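Q-id of linking page (outgoing) <tab> Q-id of linked page (incoming) <tab> language version and dump date, joined by a hyphen (e.g. zhwiki-20221101)
In the example entries, the Q-ids appear as bare numbers without the leading "Q".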
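The line format above can be read with a few lines of Python. This is only a minimal sketch: the names `Link`, `parse_line`, and `read_links` are illustrative and not part of the dataset or of danker. It assumes each line has exactly three whitespace-separated fields, and it accepts Q-ids with or without a leading "Q".

```python
import bz2
from typing import Iterator, NamedTuple


class Link(NamedTuple):
    source: int  # Q-id of the linking page, stored without the "Q" prefix
    target: int  # Q-id of the linked page, stored without the "Q" prefix
    wiki: str    # language version, e.g. "zhwiki"
    dump: str    # dump date, e.g. "20221101"


def parse_line(line: str) -> Link:
    """Parse one 'source <tab> target <tab> wiki-dumpdate' line."""
    # split() without arguments accepts tabs as well as spaces.
    source, target, version = line.split()
    # Split on the last hyphen, so wiki names such as "nds_nlwiki" stay intact.
    wiki, dump = version.rsplit("-", 1)
    return Link(int(source.lstrip("Q")), int(target.lstrip("Q")), wiki, dump)


def read_links(path: str) -> Iterator[Link]:
    """Stream links from the bz2-compressed file without unpacking it to disk."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # ignore blank lines
                yield parse_line(line)
```

Because `read_links` is a generator, the 11.3 GB archive can be processed one line at a time instead of being loaded into memory.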
This dataset was used to compute Wikidata PageRank. More information can be found in the danker repository, which hosts the source code for both the link extraction and the PageRank computation.
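For readers unfamiliar with PageRank, the sketch below shows a plain power-iteration version over (source, target) Q-id pairs. It is not danker's implementation. The damping factor of 0.85, the fixed iteration count, the de-duplication of links that occur in several language versions, and the uniform redistribution of rank from pages without outgoing links are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pagerank(edges: Iterable[Tuple[int, int]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[int, float]:
    """Power-iteration PageRank over (source, target) pairs."""
    out_links = defaultdict(set)
    nodes = set()
    for source, target in edges:
        # Assumption: a link present in several language versions counts once.
        out_links[source].add(target)
        nodes.update((source, target))

    n = len(nodes)
    if n == 0:
        return {}

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in out_links.items():
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

This in-memory version is only suitable for small subsets of the graph. The full dataset would need a disk-based or otherwise memory-efficient approach.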
Example entries:
bzcat 2022-11-10.allwiki.links.bz2 | head
1 1001051 zhwiki-20221101
1 1001 azbwiki-20221101
1 10022 nds_nlwiki-20221101
1 1005917 ptwiki-20221101
1 10090 guwiki-20221101
1 10090 tawiki-20221101
1 101038 glwiki-20221101
1 101072 idwiki-20221101
1 101072 lvwiki-20221101
1 101072 ndswiki-20221101
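As an illustration of how such entries can be aggregated, the following sketch tallies the number of links per language version. The function name `links_per_wiki` is hypothetical. The layout of the attached statistics file is not described here, so this is not claimed to reproduce it.

```python
from collections import Counter
from typing import Iterable


def links_per_wiki(lines: Iterable[str]) -> Counter:
    """Count links per language version from 'source target wiki-dumpdate' lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        # Everything before the last hyphen is the language version.
        wiki, _, _dump = fields[2].rpartition("-")
        counts[wiki] += 1
    return counts


# A few of the example entries shown above, written with tab separators.
sample = [
    "1\t1001051\tzhwiki-20221101",
    "1\t10090\tguwiki-20221101",
    "1\t10090\ttawiki-20221101",
    "1\t101072\tidwiki-20221101",
]
print(links_per_wiki(sample).most_common())
```

The same function accepts any iterable of lines, for example a file handle opened with `bz2.open(path, "rt")`.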
Files
Total size: 11.3 GB
Name | Size | MD5 |
---|---|---|
2022-11-10.allwiki.links.bz2 | 11.3 GB | bba0a7f9ab4ed172c8eb89a9632d1f6f |
2022-11-10.allwiki.links.stats.txt | 9.7 kB | 27d073ee6cc994c0ceed35ccabae380c |
Additional details
Related works
- Is compiled by
- Software: 10.5281/zenodo.7163272 (DOI)
- Is supplemented by
- Conference paper: 10.1007/978-3-319-47602-5_41 (DOI)