Published November 14, 2022 | Version v1
Dataset Open

K-mer collision statistics (BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis)

Authors/Creators

  • 1. ETH Zurich

Description

This dataset contains 1,077 FASTA files and CSV files. Each FASTA file contains 25-character sequences that are similar to each other.
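As a quick sanity check on the FASTA files, the following minimal Python sketch parses FASTA text and verifies that every sequence is 25 characters long. The helper names (`read_fasta`, `all_have_length`) and the example records are hypothetical and are not taken from the dataset; the sketch also assumes that headers within one file are unique.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines.

    Assumption: headers are unique within one file (a repeated header
    would overwrite the earlier record).
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def all_have_length(records: Dict[str, str], expected: int = 25) -> bool:
    """Return True if every sequence has exactly `expected` characters."""
    return all(len(seq) == expected for seq in records.values())


# Made-up example records, not taken from the dataset.
example = [
    ">seq1",
    "ACGTACGTACGTACGTACGTACGTA",
    ">seq2",
    "ACGTACGTACGTTCGTACGTACGTA",
]
records = read_fasta(example)
print(len(records), all_have_length(records))
```

To run the same check on a real file, pass an open file handle to `read_fasta`, since it accepts any iterable of lines.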

We provide one CSV file for each tool (i.e., minimap2 and BLEND) and configuration (i.e., a different number of neighbors in BLEND). Each CSV file lists the pairs of non-identical k-mers (16-mers) that generate the same hash value (i.e., collisions). These k-mers are extracted from sequences that are similar to each other. Each line shows the hash value of the k-mers, the pair of sequences that the k-mers are extracted from, the pair of k-mers that generate the same hash value, and the edit distance between these k-mers.
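A collision record of this kind can be checked with a short Python sketch. It rests on several assumptions that the description does not confirm: that each line has six comma-separated fields in the order hash value, first sequence, second sequence, first k-mer, second k-mer, edit distance; that there is no header row; and that "edit distance" means Levenshtein distance. The `Collision` type, the helper names, and the example row are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple


class Collision(NamedTuple):
    # Assumed field order; the real CSV layout may differ.
    hash_value: str
    seq_a: str
    seq_b: str
    kmer_a: str
    kmer_b: str
    edit_distance: int


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def parse_row(row: List[str]) -> Collision:
    """Turn one CSV row (assumed six fields) into a Collision record."""
    h, sa, sb, ka, kb, d = row
    return Collision(h, sa, sb, ka, kb, int(d))


def is_consistent(rec: Collision, k: int = 16) -> bool:
    """Check that both k-mers are k long, differ, and match the stated distance."""
    return (len(rec.kmer_a) == k
            and len(rec.kmer_b) == k
            and rec.kmer_a != rec.kmer_b
            and edit_distance(rec.kmer_a, rec.kmer_b) == rec.edit_distance)


# Made-up example line, not taken from the dataset.
text = ("12345,ACGTACGTACGTACGTACGTACGTA,ACGTACGTACGTTCGTACGTACGTA,"
        "ACGTACGTACGTACGT,ACGTACGTACGTTCGT,1\n")
record = parse_row(next(csv.reader(io.StringIO(text))))
print(record.hash_value, is_consistent(record))
```

The check deliberately does not test whether each k-mer is a substring of its sequence, because a tool may report a k-mer in a different orientation than it appears in the input.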

Files

Files (94.0 kB)

Size: 94.0 kB
md5:88110996af2a9466eda92b08c44ed88f