Published November 19, 2022 | Version 1
Dataset
Open
Wikidata 3 Topical Subsets (Gene Wiki, Music, Ships) and 4 Random Subsets
Description
This dataset contains the N-Triples files of three Wikidata topical subsets, corresponding to three Wikidata WikiProjects (Gene Wiki, Music, and Ships), along with four random subsets of different sizes: two of 100K items, one of 500K items, and one of 1M items. The subsets were extracted from the 3 January 2022 dump. All subsets were extracted with WDumper using these JSON specification files. The files are:
- GeneWiki.zip: contains 25 `.nt.gz` RDF files, each of which corresponds to one of the main Gene Wiki WikiProject classes, e.g. protein, gene, chemical compound, etc.
- music.nt.gz: the RDF file corresponding to the Music WikiProject.
- ships.nt.gz: the RDF file corresponding to the Ships WikiProject.
- Random100K_1.zip: contains 2 `.nt.gz` RDF files, each of which includes about 50,000 random Wikidata items, 100,000 items in total.
- Random100K_2.zip: contains 2 `.nt.gz` RDF files, each of which includes about 50,000 random Wikidata items, 100,000 items in total.
- Random500K.zip: contains 10 `.nt.gz` RDF files, each of which includes about 50,000 random Wikidata items, 500,000 items in total.
- Random1M.zip: contains 20 `.nt.gz` RDF files, each of which includes about 50,000 random Wikidata items, 1,000,000 items in total.
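As a rough way to check the item counts quoted above, the following minimal sketch streams one `.nt.gz` file and counts the distinct items that appear as a triple subject. It assumes that items are written as `<http://www.wikidata.org/entity/Q…>` IRIs, as in the standard Wikidata RDF export; the function name `count_items` is illustrative only and is not part of the dataset or of WDumper.

```python
import gzip
import re

# Assumption: items appear as subjects with the standard Wikidata entity IRI.
# Statement nodes and other non-item subjects do not match this pattern.
ITEM_SUBJECT = re.compile(r"^<http://www\.wikidata\.org/entity/(Q\d+)>\s")


def count_items(path):
    """Count distinct Q-items that occur as a triple subject in an .nt.gz file."""
    items = set()
    # N-Triples is line-oriented, so the file can be streamed without
    # decompressing it to disk or loading it into memory at once.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            match = ITEM_SUBJECT.match(line)
            if match:
                items.add(match.group(1))
    return len(items)
```

Calling `count_items("ships.nt.gz")` on a downloaded file would return the number of distinct subject items in it. Items that occur only as objects of other triples are deliberately not counted.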
Files
(15.6 GB)
| Checksum (MD5) | Size |
|---|---|
| 75e5495d40254b992243038a14347dbb | 11.5 GB |
| 08a13dd93304fbdad55010712e5f8a01 | 1.6 GB |
| 74219e62a9e2fa86ca85f626ea125451 | 144.5 MB |
| a05947dcec4c5d24e3d8aa4dc0b4e5e6 | 144.8 MB |
| 95e8c20f31db7e15fed540a69d7bd338 | 1.4 GB |
| 97e9687740e5f3551d7883657e807ef1 | 694.3 MB |
| e4757b68aff981b8a8dc7205f4f91e25 | 131.9 MB |
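Since the listing gives an MD5 checksum for each file, a download can be checked against it before use. The sketch below is one way to do that with Python's standard library; the helper names `md5_of` and `md5_matches` are hypothetical, and the expected checksum has to be copied from the listing entry for the file in question.

```python
import hashlib


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, expected):
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

For example, `md5_matches("music.nt.gz", "md5:<checksum from the listing>")` returns `True` only if the downloaded file is byte-for-byte identical to the published one. Chunked reading matters here because the largest archive is over 11 GB.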