Published April 18, 2025 | Version v1
Dataset
Open
Sorted Unsigned Integer Datasets
Description
This page collects well-known datasets (and a few original ones) of sorted unsigned 32-bit and 64-bit integers. Most come from the SOSD learned index benchmark (https://github.com/learnedsystems/SOSD); a few were generated by "flattening" the adjacency lists of graphs.
Every binary dataset file begins with a 64-bit preamble containing the dataset size. Each filename ends with uint32 or uint64, which gives the number of bits used to store each integer.
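As a rough illustration of the layout described above, the following sketch reads one file using only the Python standard library. It assumes the preamble is a little-endian unsigned item count (the convention used by SOSD) and that the payload is little-endian as well; neither detail is stated on this page, and `read_sorted_dataset` is a hypothetical helper name.

```python
import struct
import sys
from array import array


def read_sorted_dataset(path):
    """Load one dataset file into an array of unsigned integers."""
    path = str(path)
    # The filename suffix selects the item width, as the description states.
    if path.endswith("uint32"):
        typecode, width = "I", 4
    elif path.endswith("uint64"):
        typecode, width = "Q", 8
    else:
        raise ValueError("filename must end with uint32 or uint64")

    values = array(typecode)
    if values.itemsize != width:
        raise RuntimeError("unexpected native integer width on this platform")

    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) != 8:
            raise ValueError("file is too short to hold the 64-bit preamble")
        # Assumption: the preamble is a little-endian count of items.
        (count,) = struct.unpack("<Q", header)
        values.fromfile(f, count)  # raises EOFError if the file is truncated

    if sys.byteorder == "big":
        values.byteswap()  # payload is assumed to be little-endian
    return values
```

For the larger files, a memory-mapped reader would avoid loading several gigabytes at once; the version here favors brevity.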
Details for each dataset:
- books_200M_uint32: "amzn" dataset from SOSD.
- books_800M_uint64: larger slice (and 64-bit version) of the "amzn" dataset in SOSD.
- companynet_uint32: flattened adjacency list of a proprietary network of companies. It contains 1 million items.
- exponential_uint32: 50 million integer dataset following an exponential distribution (z=2, all items then multiplied by uint32_max/5).
- fb_200M_uint64: "fb" dataset from SOSD.
- friendster_50M_uint32: flattened adjacency list of the "friendster" network from https://snap.stanford.edu/data/com-Friendster.html.
- lognormal_uint32: 50 million integer dataset following a lognormal distribution (mu = 0, sigma = 0.5, all items then multiplied by uint32_max/5).
- normal_800M_uint32: 800 million integer dataset following a normal distribution (mu = uint32_max/2, sigma = uint32_max/4).
- normal_uint32: 50 million integer dataset following a normal distribution (mu = uint32_max/2, sigma = uint32_max/4).
- osm_cellids_800M_uint64: "osm" dataset from SOSD.
- wiki_ts_200M_uint32: "wiki" dataset from SOSD, but integers are all cast to 32 bits.
- wiki_ts_200M_uint64: "wiki" dataset from SOSD.
- zipf_uint32: 50 million integer dataset following a Zipf distribution (q = 0.7, max_val = uint32_max/2).
- books_50M_uint64: 50M slice of the 64-bit "amzn" dataset from SOSD.
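To make the parameters of the synthetic entries above more concrete, here is a minimal sketch of how a dataset like normal_uint32 could be produced. It is not the generator actually used for these files: the function name, the seed, and the choice to reject out-of-range draws (rather than clamp them) are all assumptions made for illustration.

```python
import random

UINT32_MAX = 2**32 - 1


def normal_uint32_sample(n, seed=0):
    """Draw n sorted integers in [0, UINT32_MAX] from a normal distribution."""
    rng = random.Random(seed)
    # Parameters taken from the list: mu = uint32_max/2, sigma = uint32_max/4.
    mu, sigma = UINT32_MAX / 2, UINT32_MAX / 4
    values = []
    while len(values) < n:
        x = round(rng.gauss(mu, sigma))
        # Assumption: draws outside the uint32 range are rejected, not clamped.
        if 0 <= x <= UINT32_MAX:
            values.append(x)
    values.sort()
    return values
```

A call such as `normal_uint32_sample(1000)` returns a small sorted sample; the published files hold 50 million or 800 million items.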
Files (22.2 GB)
| MD5 checksum | Size |
|---|---|
| 55845580be1554d82be1c0dda416005c | 800.0 MB |
| e63933fdc1c84d095aeef5ebb093d2cb | 400.0 MB |
| 8708eb3e1757640ba18dcd3a0dbb53bc | 6.4 GB |
| 6520621353ab6b9491b0d55e1bee9496 | 4.0 MB |
| bc9113134f4fe0f666f97bd8046102fd | 200.0 MB |
| 679eff3bfbc80572b30f6575b40b6918 | 1.6 GB |
| bb54a2fd72355fca02ea9a7a20a5888e | 200.0 MB |
| 51ceea9812d0002737ed7f393f1b01a9 | 200.0 MB |
| 6f801bf29d4265c551af4f378c9c2653 | 3.2 GB |
| a681db7871719bebe555d20c111f439c | 200.0 MB |
| 70670bf41196b9591e07d0128a281b9a | 6.4 GB |
| 7f0921dd7c1fd096b9ba41c3cf7e6948 | 800.0 MB |
| 4f1402b1c476d67f77d2da4955432f7d | 1.6 GB |
| e44dec134041c31528bc5fdd908febe1 | 200.0 MB |
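Because the files are large, it can be worth checking a download against its listed md5 value. The sketch below computes the digest in fixed-size chunks so that a multi-gigabyte file never has to fit in memory; `md5_hex` and its `chunk_size` parameter are hypothetical names chosen for this example.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be compared directly with the md5 value listed for the downloaded file.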