Published October 22, 2022 | Version 1.0.0
Dataset Open

LPHash - datasets

  • 1. Ca' Foscari University of Venice
  • 2. Univ. Gustave Eiffel
  • 3. University of Lille and CNRS

Description

These datasets accompany the paper "Locality-Preserving Minimal Perfect Hashing of k-mers" by G. E. Pibiri, Y. Shibuya, and A. Limasset, Bioinformatics, 2023. DOI: https://doi.org/10.1093/bioinformatics/btad219.

We use genomes of increasing size, measured by the number of distinct k-mers; namely, the whole genomes of:

Saccharomyces cerevisiae (Yeast, 11.6 × 10^6 k-mers),

Caenorhabditis elegans (Elegans, 95 × 10^6 k-mers),

Gadus morhua (Cod, 0.56 × 10^9 k-mers),

Falco tinnunculus (Kestrel, 1.16 × 10^9 k-mers), and

Homo sapiens (Human, 2.77 × 10^9 k-mers).

For each genome, we obtain the corresponding SPSS (Spectrum-Preserving String Set) by first building the compacted de Bruijn graph with BCALM2 (Chikhi et al., 2016) and then running the UST algorithm (Rahman and Medvedev, 2020) on it. Our code repository (github.com/jermp/lphash) provides detailed instructions on how to build SPSS datasets like the ones available here from FASTA files.
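Because an SPSS contains every k-mer of the input exactly once, the number of distinct k-mers can be recovered from the string lengths alone: a string of length n contributes n - k + 1 k-mers. The following is a minimal sketch of that count, assuming the SPSS is stored as FASTA text and that `k` is the value used to build it; the function name `count_kmers` and the toy input are hypothetical and are not part of the lphash code.

```python
from typing import Iterable


def count_kmers(fasta_lines: Iterable[str], k: int) -> int:
    """Sum max(0, len(s) - k + 1) over all sequences in a FASTA stream.

    This equals the number of distinct k-mers only if the input really
    is an SPSS, i.e. no k-mer occurs twice (assumption, not checked here).
    """
    total = 0
    length = 0  # length of the record currently being read
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A header line closes the previous record, if any.
            if length >= k:
                total += length - k + 1
            length = 0
        else:
            # Sequence lines of one record may be wrapped over several lines.
            length += len(line)
    # Close the last record.
    if length >= k:
        total += length - k + 1
    return total


# Toy input: two records, "ACGTAC" and "GGTTA" (the second one is wrapped).
example = [">0", "ACGTAC", ">1", "GGT", "TA"]
print(count_kmers(example, 3))
```

For a real file, the same function can be fed an open file handle, for example `count_kmers(open(path), k)`, or a `gzip.open(path, "rt")` handle if the file is compressed.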

References

  • Rayan Chikhi, Antoine Limasset, and Paul Medvedev. Compacting de Bruijn graphs from sequencing data quickly and in low memory. Bioinformatics, 32(12):i201–i208, 2016.
  • Amatur Rahman and Paul Medvedev. Representation of k-mer sets using spectrum-preserving string sets. In International Conference on Research in Computational Molecular Biology, pages 152–168. Springer, 2020.

Files (12.0 GB)

Checksum                              Size
md5:42dc932bb96cedbae9b2aac6d63df2b6  27.8 MB
md5:acd38272e02487eebe004857bdacb775  27.8 MB
md5:1c3c5d9b501fa55fa0b15181bbdbdd49  27.7 MB
md5:24257ace373381430e63fe6071327046  27.7 MB
md5:3be91799d98b3dfc814fabda3f8364e0  27.7 MB
md5:3e31e43388c3e47388e308ce6888cc9f  27.7 MB
md5:fc72e4c40991b58a0423ed769d574102  27.7 MB
md5:3398c592b9a84e1bd1fb96c9e5b71a28  27.7 MB
md5:1e7217f6d7104ee8c43c50cc92df3a3c  27.7 MB
md5:8068f86d5f8d4b4bac81990ac6b1b1af  157.7 MB
md5:f862904d8ea4adcda117477027cbbedd  158.2 MB
md5:744b49db2d210f476691e18e9d8280cc  158.7 MB
md5:f71c58d62f92f749a37b95de201e61c4  159.1 MB
md5:c4a6668e76dcc83b919cb17e9c0939c5  159.6 MB
md5:818425230d7a2d3c59e07081bbb09e44  160.0 MB
md5:d7e03053304b99c143c3bcacad9d09de  160.5 MB
md5:8398688d2c1c56d69017fba00de44f4f  160.9 MB
md5:59cff3fba861e11c04895813ff0cd9d8  161.3 MB
md5:87dd8f5cf70255a469f3b394277c2818  803.8 MB
md5:1d54f2df01c5614604a8be22509db39a  808.2 MB
md5:f8956ce15b9a5d25f925b853820fe5c6  811.0 MB
md5:7f8902263df4444fc809d8c853e816a2  812.4 MB
md5:77568d45e6b41160e1233cf78b099430  812.6 MB
md5:a02390e72181f2e6b511de7aa554e5f0  812.0 MB
md5:8f77e7b988407500b88f58ef838563eb  810.3 MB
md5:3c3cbc286b812294dd44863a482ce0b3  808.6 MB
md5:198ff6b9887deb4bd3fc37765038042a  806.7 MB
md5:d4003c0a1010c9d643084790ca9b7a07  335.9 MB
md5:9f921bf7dc7a33e431131e48647495ef  335.4 MB
md5:9df5771e4dc6be52be19035098c243f3  335.1 MB
md5:7ca36072936334ad8da28d0ad5b036f0  334.8 MB
md5:c613fa4a8ced3e108292074444052db3  334.6 MB
md5:d012f4add2bd575298ec7e217d43fe24  334.5 MB
md5:251267ef46cb1ce4933533b4de68e522  334.6 MB
md5:2fd27cb76632d8418b98afddbcda1978  334.5 MB
md5:a36e092e0517fd2a1a32f1d52be15fe5  334.5 MB
md5:38a9e255653fdd227d603a9c8312e4e7  3.4 MB
md5:e191b5cf60ff20be0131420a0a8334d5  3.4 MB
md5:447b26e322ed6ce8b63a8ae910158931  3.4 MB
md5:c6add43d21011edefcaa3db29732241f  3.4 MB
md5:92f3fd01e1d4a2c6fc16d1a0a72321f5  3.4 MB
md5:d7c07f00ae5db7c711ac2dbc7763d7b8  3.4 MB
md5:7cbfb873fe64b7e4397bc4d604dca7b5  3.4 MB
md5:8ed894aa8f0a7daf9214e0c12ee6686c  3.4 MB
md5:9a24d09f0cd98d7ed13ad3d4c8f83332  3.4 MB
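
The md5 values listed above can be used to check that a download completed without corruption. The sketch below shows one way to do this with Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the local path in the usage note is an assumption, since the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that multi-hundred-megabyte
    downloads do not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:42dc93...'."""
    # Accept the value with or without the 'md5:' prefix.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call would look like `matches_listing("downloaded_file", "md5:38a9e255653fdd227d603a9c8312e4e7")`, where `downloaded_file` stands for wherever the corresponding file was saved. MD5 is adequate for detecting accidental corruption but should not be relied on as a defense against deliberate tampering.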