K-mer collision statistics (BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis)

Firtina, Can

doi:10.5281/zenodo.7319786

Published November 14, 2022 | Version v1

Dataset Open

Firtina, Can¹

1. ETH Zurich

This dataset contains 1,077 FASTA files and CSV files. Each FASTA file contains 25-character-long sequences that are similar to each other.

There is one CSV file for each tool (i.e., minimap2 and BLEND) and configuration (i.e., different numbers of neighbors in BLEND). The CSV files list the non-identical k-mer pairs (16-mers) that generate the same hash value (i.e., collisions). These k-mers are extracted from sequences that are similar to each other. Each line shows the hash value of the k-mers, the sequence pair that the k-mers are extracted from, the k-mer pair that generates the same hash value, and the edit distance between the two k-mers.
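To make the row contents concrete, the following is a minimal sketch of how such collision records could be produced for one pair of 25-character sequences. It is not the method used to build the dataset: `toy_hash` is a hypothetical, deliberately lossy stand-in (it ignores the last two bases of a k-mer) and is neither the minimap2 nor the BLEND hash function. The function names and the order of the fields in each tuple are likewise assumptions for illustration.

```python
K = 16  # k-mer length stated in the dataset description

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit base encoding


def kmers(seq, k=K):
    """Return all k-mers of seq, in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_hash(kmer, ignored_suffix=2):
    """Hypothetical lossy hash: 2-bit encode all but the last bases.

    NOT the minimap2 or BLEND hash; it only exists to force collisions.
    """
    value = 0
    for base in kmer[:len(kmer) - ignored_suffix]:
        value = (value << 2) | CODE[base]
    return value


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def find_collisions(seq_a, seq_b, hash_fn=toy_hash):
    """Yield (hash, seq_a, seq_b, kmer_a, kmer_b, edit distance) for every
    pair of non-identical k-mers that share a hash value."""
    for kmer_a in kmers(seq_a):
        for kmer_b in kmers(seq_b):
            if kmer_a != kmer_b and hash_fn(kmer_a) == hash_fn(kmer_b):
                yield (hash_fn(kmer_a), seq_a, seq_b,
                       kmer_a, kmer_b, edit_distance(kmer_a, kmer_b))


if __name__ == "__main__":
    # Two made-up 25-character sequences differing by one substitution.
    seq_a = "ACGTTGCAAGCTTAGGCTAACCGTA"
    seq_b = seq_a[:15] + "T" + seq_a[16:]
    for row in find_collisions(seq_a, seq_b):
        print(row)
```

Passing a different `hash_fn` lets the same loop be reused with another hash function, which mirrors the idea of having one CSV file per tool and configuration.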

Files (94.0 kB)

Name	Size	MD5
kmer_collisions.tar.gz	94.0 kB	88110996af2a9466eda92b08c44ed88f
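
After downloading the archive, its integrity can be checked against the MD5 value listed above. The snippet below is a small sketch using only the Python standard library; the helper names `md5_of_file` and `verify` and the local file path are assumptions for illustration.

```python
import hashlib

# MD5 checksum taken from the file listing of this record.
EXPECTED_MD5 = "88110996af2a9466eda92b08c44ed88f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file at path matches the expected checksum."""
    return md5_of_file(path) == expected
```

For example, `verify("kmer_collisions.tar.gz")` should return `True` for an intact copy saved under that name in the current directory.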

Resource type: Dataset

Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 14, 2022
Modified: November 15, 2022
