Published April 18, 2025 | Version v1
Dataset Open

Sorted Unsigned Integer Datasets

Description

This page collects well-known datasets, plus a few original ones, of sorted unsigned 32-bit and 64-bit integers. Most come from the SOSD learned index benchmark (https://github.com/learnedsystems/SOSD); a few were generated by "flattening" the adjacency lists of graphs.

All binary datasets have a 64-bit preamble containing the dataset size. The filename always ends with uint32 or uint64, specifying the number of bits used for storing the integers. 
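
The layout described above can be read with a few lines of Python. This is a minimal sketch, not an official loader. It assumes that the 64-bit preamble is an unsigned little-endian count of integers (not a byte count) and that the integers follow immediately in little-endian order, as in the SOSD binary files. The function name read_dataset is hypothetical.

```python
import struct
import sys
from array import array
from pathlib import Path


def read_dataset(path):
    """Read one dataset file: an 8-byte count preamble, then the integers.

    Assumptions (not stated on this page): the preamble is an unsigned
    little-endian element count and the integers are little-endian.
    """
    path = Path(path)
    # The integer width is taken from the filename suffix.
    if path.name.endswith("uint32"):
        typecode, width = "I", 4
    elif path.name.endswith("uint64"):
        typecode, width = "Q", 8
    else:
        raise ValueError(f"filename must end with uint32 or uint64: {path.name}")

    values = array(typecode)
    if values.itemsize != width:
        raise RuntimeError("unexpected native integer size on this platform")

    with path.open("rb") as f:
        (count,) = struct.unpack("<Q", f.read(8))
        values.fromfile(f, count)  # raises EOFError if the file is too short

    # array.fromfile reads in native byte order; swap on big-endian hosts.
    if sys.byteorder == "big":
        values.byteswap()
    return values
```

For example, read_dataset("companynet_uint32") would return an array of about 1 million values if the assumptions hold. The larger files need several gigabytes of memory when loaded this way, so a memory-mapped reader may be a better fit for them.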

Details of the individual datasets follow:

  • books_200M_uint32: "amzn" dataset from SOSD.
  • books_800M_uint64: larger slice (and 64-bit version) of the "amzn" dataset from SOSD.
  • companynet_uint32: flattened adjacency list of a proprietary network of companies, 1 million items.
  • exponential_uint32: 50 million integers following an exponential distribution (z = 2, all items then multiplied by uint32_max/5).
  • fb_200M_uint64: "fb" dataset from SOSD.
  • friendster_50M_uint32: flattened adjacency list of the "friendster" network from https://snap.stanford.edu/data/com-Friendster.html
  • lognormal_uint32: 50 million integers following a lognormal distribution (mu = 0, sigma = 0.5, all items then multiplied by uint32_max/5).
  • normal_800M_uint32: 800 million integers following a normal distribution (mu = uint32_max/2, sigma = uint32_max/4).
  • normal_uint32: 50 million integers following a normal distribution (mu = uint32_max/2, sigma = uint32_max/4).
  • osm_cellids_800M_uint64: "osm" dataset from SOSD.
  • wiki_ts_200M_uint32: "wiki" dataset from SOSD, with all integers cast to 32 bits.
  • wiki_ts_200M_uint64: "wiki" dataset from SOSD.
  • zipf_uint32: 50 million integers following a Zipf distribution (q = 0.7, max_val = uint32_max/2).
  • books_50M_uint64: 50M slice of the 64-bit "amzn" dataset from SOSD.
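
As an illustration of how a synthetic dataset such as normal_uint32 could be produced, the sketch below draws values from a normal distribution with mu = uint32_max/2 and sigma = uint32_max/4, sorts them, and writes them with a 64-bit count preamble. It is not the generator used for the published files. The rounding, the rejection of out-of-range draws, the retention of duplicates, the seed, and the little-endian layout are all assumptions, and both function names are hypothetical.

```python
import random
import struct

UINT32_MAX = 2**32 - 1


def normal_uint32_sample(n, seed=0):
    """Return n sorted integers in [0, UINT32_MAX] drawn from a normal law.

    Assumption: draws outside the uint32 range are rejected and redrawn;
    the page does not say how the original generator handled them.
    """
    rng = random.Random(seed)
    mu, sigma = UINT32_MAX / 2, UINT32_MAX / 4
    values = []
    while len(values) < n:
        x = round(rng.gauss(mu, sigma))
        if 0 <= x <= UINT32_MAX:
            values.append(x)
    values.sort()  # duplicates, if any, are kept (assumption)
    return values


def write_uint32_dataset(path, values):
    """Write a count preamble (assumed little-endian uint64), then the values."""
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(values)))
        f.write(struct.pack(f"<{len(values)}I", *values))
```

Calling write_uint32_dataset("toy_normal_uint32", normal_uint32_sample(1000)) produces a file of 8 + 4 * 1000 bytes. Packing all values in one struct call is only practical for small samples; a 50 million item file would need chunked writes.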

Files (22.2 GB)

Checksum                               Size
md5:55845580be1554d82be1c0dda416005c   800.0 MB
md5:e63933fdc1c84d095aeef5ebb093d2cb   400.0 MB
md5:8708eb3e1757640ba18dcd3a0dbb53bc   6.4 GB
md5:6520621353ab6b9491b0d55e1bee9496   4.0 MB
md5:bc9113134f4fe0f666f97bd8046102fd   200.0 MB
md5:679eff3bfbc80572b30f6575b40b6918   1.6 GB
md5:bb54a2fd72355fca02ea9a7a20a5888e   200.0 MB
md5:51ceea9812d0002737ed7f393f1b01a9   200.0 MB
md5:6f801bf29d4265c551af4f378c9c2653   3.2 GB
md5:a681db7871719bebe555d20c111f439c   200.0 MB
md5:70670bf41196b9591e07d0128a281b9a   6.4 GB
md5:7f0921dd7c1fd096b9ba41c3cf7e6948   800.0 MB
md5:4f1402b1c476d67f77d2da4955432f7d   1.6 GB
md5:e44dec134041c31528bc5fdd908febe1   200.0 MB
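
A downloaded file can be checked against its listed md5 value with the Python standard library. The sketch below hashes the file in chunks so that multi-gigabyte files do not have to fit in memory. The function names and the chunk size are illustrative choices.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed value such as 'md5:5584...005c'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, matches_listed_md5(local_path, "md5:6520621353ab6b9491b0d55e1bee9496") returns True only if the file at local_path has exactly that digest.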