Published November 19, 2022 | Version 1
Dataset Open

Wikidata 3 Topical Subsets (Gene Wiki, Music, Ships) and 4 Random Subsets

  • 1. Heriot-Watt University

Description

This dataset contains the N-Triples files of three Wikidata topical subsets, corresponding to three Wikidata WikiProjects (Gene Wiki, Music, and Ships), along with four random subsets of different sizes: two of 100K items, one of 500K items, and one of 1M items. The subsets were extracted from the 3 January 2022 dump. All subsets were extracted with WDumper using these JSON specification files. The files are:

  • GeneWiki.zip: contains 25 `.nt.gz` RDF files each of which corresponds to one of the main Gene Wiki WikiProject classes, e.g. protein, gene, chemical compound, etc.
  • music.nt.gz: the RDF file corresponding to the Music WikiProject.
  • ships.nt.gz: the RDF file corresponding to the Ships WikiProject.
  • Random100K_1.zip: contains 2 `.nt.gz` RDF files, each with about 50,000 random Wikidata items, for 100,000 items in total.
  • Random100K_2.zip: contains 2 `.nt.gz` RDF files, each with about 50,000 random Wikidata items, for 100,000 items in total.
  • Random500K.zip: contains 10 `.nt.gz` RDF files, each with about 50,000 random Wikidata items, for 500,000 items in total.
  • Random1M.zip: contains 20 `.nt.gz` RDF files, each with about 50,000 random Wikidata items, for 1,000,000 items in total.
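
The `.nt.gz` files listed above can be read as a stream without decompressing them to disk. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `iter_triples` and `count_item_subjects` are hypothetical, the splitting is a simplification that assumes one well-formed triple per line rather than a full N-Triples parser, and it assumes that item subjects use the standard `http://www.wikidata.org/entity/Q…` URI form.

```python
import gzip

# Assumption: Wikidata items appear as subjects with this URI prefix.
ENTITY_PREFIX = "<http://www.wikidata.org/entity/Q"


def iter_triples(path):
    """Yield (subject, predicate, object) strings from a gzipped N-Triples file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Subject and predicate never contain whitespace; the object may.
            subject, predicate, rest = line.split(None, 2)
            obj = rest.rstrip()
            if obj.endswith("."):
                obj = obj[:-1].rstrip()  # drop the terminating " ."
            yield subject, predicate, obj


def count_item_subjects(path):
    """Count distinct item URIs (Q-identifiers) that occur as a subject."""
    return len({s for s, _, _ in iter_triples(path) if s.startswith(ENTITY_PREFIX)})
```

Calling `count_item_subjects` on one of the random-subset files should give a figure near 50,000 if the assumptions hold. Statement nodes and other non-item subjects in the dump are ignored by the prefix filter.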

 

Files

Files (15.6 GB)

Checksum Size
md5:75e5495d40254b992243038a14347dbb 11.5 GB
md5:08a13dd93304fbdad55010712e5f8a01 1.6 GB
md5:74219e62a9e2fa86ca85f626ea125451 144.5 MB
md5:a05947dcec4c5d24e3d8aa4dc0b4e5e6 144.8 MB
md5:95e8c20f31db7e15fed540a69d7bd338 1.4 GB
md5:97e9687740e5f3551d7883657e807ef1 694.3 MB
md5:e4757b68aff981b8a8dc7205f4f91e25 131.9 MB
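
A downloaded file can be checked against its listed MD5 checksum before use. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of` and `matches_listing` are hypothetical, and the checksum passed in must be the one listed for the specific file being checked.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's digest with a listed value such as 'md5:0123...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

Reading in 1 MiB chunks keeps memory use flat even for the multi-gigabyte archives. A `False` result from `matches_listing` suggests an incomplete or corrupted download, or that the checksum of a different file was supplied.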