Published November 14, 2022
Version v1
Dataset
Open access
K-mer collision statistics (BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis)
Description
This dataset contains 1,077 FASTA files and CSV files. Each FASTA file contains 25-character sequences that are similar to one another.
There is one CSV file for each tool (i.e., minimap2 and BLEND) and each configuration (i.e., a different number of neighbors in BLEND). The CSV files list the non-identical k-mer pairs (16-mers) that generate the same hash value (i.e., collisions). These k-mers are extracted from sequences that are similar to one another. Each line shows the hash value of the k-mers, the pair of sequences from which the k-mers were extracted, the k-mer pair that generates that hash value, and the edit distance between the two k-mers.
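To illustrate what a CSV line represents, the sketch below groups k-mers by hash value, reports every pair of non-identical k-mers that share a hash (a collision), and computes their edit distance, assuming edit distance here means the standard Levenshtein distance. `toy_hash` is a hypothetical stand-in that simply ignores the last base; it is not the hash function used by minimap2 or BLEND. The output tuples only loosely mirror the CSV columns, since the source sequences are omitted.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Set, Tuple

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit nucleotide encoding


def toy_hash(km: str) -> int:
    """Hypothetical fuzzy hash: 2-bit encodes all but the last base, so
    k-mers differing only in their last base collide. NOT minimap2's or
    BLEND's hash. Raises KeyError on non-ACGT characters (e.g., N)."""
    v = 0
    for c in km[:-1]:
        v = (v << 2) | ENC[c]
    return v


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitution, insertion, deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def find_collisions(kmer_list: Iterable[str],
                    hash_fn: Callable[[str], int]) -> List[Tuple[int, str, str, int]]:
    """Return (hash, kmer_a, kmer_b, edit distance) for each pair of
    non-identical k-mers that hash_fn maps to the same value."""
    buckets: Dict[int, Set[str]] = defaultdict(set)
    for km in kmer_list:
        buckets[hash_fn(km)].add(km)  # the set drops identical k-mers
    rows = []
    for h, group in sorted(buckets.items()):
        for a, b in combinations(sorted(group), 2):
            rows.append((h, a, b, edit_distance(a, b)))
    return rows
```

For example, `find_collisions(["ACGTACGTACGTACGA", "ACGTACGTACGTACGC"], toy_hash)` reports one collision between the two k-mers at edit distance 1.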
Files (94.0 kB)
| Size | MD5 checksum |
|---|---|
| 94.0 kB | 88110996af2a9466eda92b08c44ed88f |
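A downloaded copy of the file can be checked against the listed MD5 checksum with Python's standard `hashlib` module. The helper name `md5_of` is hypothetical, and the file name is not given on this page, so pass whatever path the download was saved to.

```python
import hashlib
from pathlib import Path
from typing import Union

EXPECTED_MD5 = "88110996af2a9466eda92b08c44ed88f"  # checksum listed above


def md5_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: MD5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Union[str, Path]) -> bool:
    """True if the file at path matches the checksum listed on this page."""
    return md5_of(path) == EXPECTED_MD5
```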