Cross-language Wikipedia link graph
Description
Wikipedia articles use Wikidata to list links to the same article in other language versions. Each Wikipedia language edition therefore stores the Wikidata Q-id of each of its articles.
This dataset constitutes a Wikipedia link graph in which all article identifiers are normalized to Wikidata Q-ids. It contains the normalized links from all Wikipedia language versions. Detailed link count statistics are attached. Note that articles with neither incoming nor outgoing links are not part of this graph.
The format is as follows:
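Q-id of linking page (outgoing) <tab> Q-id of linked page (incoming) <tab> language version-dump date (e.g., lawiki-20241101)
Q-ids are written in numeric form, without the leading "Q", as the example entries show.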
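To make the line format concrete, the following is a minimal parsing sketch, not code taken from the danker repository. The names `Link`, `parse_qid`, and `parse_link` are hypothetical. It assumes that the third field splits at its last hyphen into language version and dump date, and it accepts Q-ids with or without a leading "Q".

```python
from typing import NamedTuple


class Link(NamedTuple):
    source: int      # numeric part of the linking page's Q-id
    target: int      # numeric part of the linked page's Q-id
    wiki: str        # language version, e.g. "lawiki"
    dump_date: str   # dump date, e.g. "20241101"


def parse_qid(field: str) -> int:
    """Accept both '107' and 'Q107' and return the integer 107."""
    return int(field.lstrip("Qq"))


def parse_link(line: str) -> Link:
    """Parse one line of the link file into a Link record."""
    # split() without arguments tolerates tabs as well as runs of spaces.
    source, target, tag = line.split()
    # Split on the last hyphen so names such as "bat_smgwiki" stay intact.
    wiki, _, dump_date = tag.rpartition("-")
    return Link(parse_qid(source), parse_qid(target), wiki, dump_date)
```

Splitting on whitespace rather than strictly on tabs is a deliberate choice here, so that the sketch also works on copies of the data in which tabs were converted to spaces.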
This dataset was used to compute Wikidata PageRank. More information can be found in the danker repository (https://github.com/athalhammer/danker), which hosts the source code for both the link extraction and the PageRank computation.
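For readers unfamiliar with PageRank, here is a small in-memory sketch of the textbook power-iteration method on an edge list. It is not danker's implementation. The damping factor of 0.85, the fixed iteration count, the collapsing of duplicate edges (the same link reported by several language versions), and the uniform redistribution of rank from pages without outgoing links are all assumptions made for illustration. The function name `pagerank` is hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return a dict mapping each node to its PageRank score.

    `edges` is an iterable of (source, target) pairs. Duplicate pairs
    are collapsed, which is an assumption of this sketch.
    """
    out_links = {}
    nodes = set()
    for source, target in edges:
        nodes.update((source, target))
        out_links.setdefault(source, set()).add(target)
    if not nodes:
        return {}

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in out_links.items():
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

This version keeps the whole graph in memory, so it only suits small samples of the data. A file of this size would likely need a disk-based or streaming approach.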
Example entries:
$ bzcat 2024-11-06.allwiki.links.bz2 | head
1 107 ckbwiki-20241101
1 107 lawiki-20241101
1 107 ltwiki-20241101
1 107 tewiki-20241101
1 107 wuuwiki-20241101
1 111 hywwiki-20241101
1 11379 bat_smgwiki-20241101
1 11471 cdowiki-20241101
1 150 ckbwiki-20241101
1 150 lowiki-20241101
Files (12.6 GB)
Name | Size | MD5 |
---|---|---|
2024-11-06.allwiki.links.bz2 | 12.6 GB | 933531f38a62297d38660a66166cf37c |
2024-11-06.allwiki.links.stats.txt | 10.3 kB | 837a53dc23c17b0587c4d60822bb39a6 |
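The MD5 checksums listed above can be used to verify a download. The following is a minimal sketch using the standard-library `hashlib` module. The function names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum, with or without an 'md5:' prefix."""
    return md5_of_file(path) == expected.split(":")[-1].strip().lower()
```

For example, `matches_checksum("2024-11-06.allwiki.links.bz2", "md5:933531f38a62297d38660a66166cf37c")` should return `True` for an intact copy of the 12.6 GB file.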
Additional details
Related works
- Is compiled by: Software, 10.5281/zenodo.7163272 (DOI)
- Is supplemented by: Conference paper, 10.1007/978-3-319-47602-5_41 (DOI)
Software
- Repository URL: https://github.com/athalhammer/danker
- Programming language: Python, Shell
- Development Status: Active