Published February 10, 2024 | Version v1
Dataset Open

Examples of sequence alignment with contiguous, binary and ternary seeds

  • 1. University of Manchester
  • 2. University of Leeds

Description

Classical sequence alignment algorithms use contiguous chunks of symbols (contiguous seeds) to pre-align short sequences (reads) obtained from a studied organism to a long reference sequence. Spaced seeds, which ignore possible differences between the two sequences at some positions, allow researchers to improve the sensitivity of alignment algorithms.
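To illustrate the idea, a spaced seed can be pictured as a binary mask laid over a read and a reference window of the same length: positions marked 1 must agree, positions marked 0 are ignored. The sketch below is only an illustration; the mask encoding and the function name are assumptions and are not taken from the software behind this dataset.

```python
def matches_spaced_seed(read: str, window: str, mask: str) -> bool:
    """Return True if read and window agree at every position where mask is '1'.

    The '1'/'0' mask encoding is a hypothetical convention used for illustration.
    """
    if not (len(read) == len(window) == len(mask)):
        raise ValueError("read, window and mask must have the same length")
    # Positions with mask '0' are skipped, so a difference there is tolerated.
    return all(r == w for r, w, m in zip(read, window, mask) if m == "1")


if __name__ == "__main__":
    # A contiguous seed (all ones) rejects the pair, a spaced seed accepts it.
    print(matches_spaced_seed("ACGT", "ACTT", "1111"))
    print(matches_spaced_seed("ACGT", "ACTT", "1101"))
```

A contiguous seed corresponds to a mask consisting only of ones, which is why a single mismatch anywhere inside it prevents a hit.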

In genetics, different point mutations occur with different probabilities. Therefore, it may be reasonable to consider transitional (A <-> G, C <-> T) and transversional (all other) mutations separately.
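The distinction can be made concrete with a small classifier: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any other substitution is a transversion. This is a minimal sketch; the function name and the returned labels are chosen here for illustration only.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def classify_substitution(a: str, b: str) -> str:
    """Classify a pair of bases as 'match', 'transition' or 'transversion'."""
    a, b = a.upper(), b.upper()
    if a not in BASES or b not in BASES:
        raise ValueError("expected single bases from A, C, G, T")
    if a == b:
        return "match"
    pair = {a, b}
    # Both bases in the same chemical class: A <-> G or C <-> T.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


if __name__ == "__main__":
    for x, y in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("A", "A")]:
        print(x, y, classify_substitution(x, y))
```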

In perlotSeeds, we consider the alignment of paired-end reads (Han Chinese South, sequence data, ERR016118) with respect to the Human Reference Genome (Human genome assembly GRCh38.p14).

We consider various contiguous seeds, e.g. C32 for a seed of length 32, and ternary seeds for the given read length (76), e.g. T1V2, a seed that allows one transitional and two transversional mismatches. We then generate a library of records corresponding to the chosen seed. This library is used to find candidate alignments for all reads.
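The mismatch budget implied by a name such as T1V2 can be sketched as a check on a read and a reference window of equal length. This only illustrates the budget (here read as at most one transition and at most two transversions); it does not reproduce how the seeds themselves or the library records are built, which the description does not specify. All function names are hypothetical.

```python
# Ordered pairs of bases that differ by a transition (A <-> G, C <-> T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def count_mismatches(read: str, window: str) -> tuple[int, int]:
    """Return (transitions, transversions) between two equal-length sequences."""
    if len(read) != len(window):
        raise ValueError("read and window must have the same length")
    transitions = transversions = 0
    for r, w in zip(read.upper(), window.upper()):
        if r == w:
            continue
        if (r, w) in TRANSITIONS:
            transitions += 1
        else:
            # Assumption: any other difference (including N) counts as a transversion.
            transversions += 1
    return transitions, transversions


def within_budget(read: str, window: str, max_t: int, max_v: int) -> bool:
    """Check a TxVy-style budget: at most max_t transitions and max_v transversions."""
    t, v = count_mismatches(read, window)
    return t <= max_t and v <= max_v


if __name__ == "__main__":
    # One transition (A -> G): accepted under a T1V2 budget.
    print(within_budget("ACGT", "GCGT", max_t=1, max_v=2))
    # Two transitions: rejected under a T1V2 budget.
    print(within_budget("ACGT", "GTGT", max_t=1, max_v=2))
```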

For each generated library we provide statistics (InputStat.zip), i.e. the number of records having the same signature (generated by the seed). There are also output statistics for all reads and chosen seeds, e.g. outputStatT1V3.zip, which report for each read how many signatures are generated, how many successful alignments are found, and the best score. More detailed information on several groups of reads can be found in the ExampleOutput.zip file.
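The kind of input statistic described here, the number of records sharing a signature, can be sketched for the simplest case of a contiguous seed. The sketch assumes that the signature is the k-mer itself, which is an assumption made for illustration; the actual signature definition and the file layout inside the archives are documented by the dataset, not by this example.

```python
from collections import Counter


def signature_counts(reference: str, k: int) -> Counter:
    """Count how many reference positions produce each length-k signature.

    Assumption: for a contiguous seed the signature is simply the k-mer.
    """
    if k <= 0 or k > len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    return Counter(reference[i:i + k] for i in range(len(reference) - k + 1))


def multiplicity_histogram(counts: Counter) -> Counter:
    """Map 'number of records per signature' to 'number of such signatures'."""
    return Counter(counts.values())


if __name__ == "__main__":
    counts = signature_counts("ACGTAC", 2)
    print(counts)
    print(multiplicity_histogram(counts))
```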

 

Files

_OutputStatREADME.txt

Files (4.2 GB)

Checksum  Size
md5:07bb551ba8efd5be96ce0337e893ef26  990 Bytes
md5:02c72c9085451ded837d245e50c97c8d  1.2 GB
md5:9250c0676ffd70d8a4ff9ed28f365505  445.4 kB
md5:2c569506dfa6714f5a69c46a49a97586  143.3 MB
md5:86dd38b8cae79c43f687894846e9adce  133.8 MB
md5:94f35a947c47dfaf36d004d9b0d90c7b  125.7 MB
md5:9b8ae5f09239cedb7ca7ee6734b97888  119.1 MB
md5:ad3028685c297214ada7e40f982d1283  113.5 MB
md5:19484172b6ebd2671cbf632c72b46299  108.2 MB
md5:b57540c8b5ab2c697b91013ec33fafca  104.0 MB
md5:190cdac35edf1b94db6a52c62b8aa327  100.1 MB
md5:acf147af630b9bfa1353ceab5dcb6ebb  97.0 MB
md5:802ab52f468c364e6950bac30ce812b1  94.2 MB
md5:225688f2ef4bb0012d8b90e4fa9765c1  101.4 MB
md5:d1a7e748057b7bbb500807696a2c62c1  114.5 MB
md5:4e120107fcb693348e36e25b6a9202f0  125.9 MB
md5:3ba198ab11d1b6698451eec720025986  96.2 MB
md5:e1f8d88dd2b2d590bb2ea004341a41aa  107.7 MB
md5:18756f778e74ba39d789fb49e9d0a245  121.6 MB
md5:dd8e7102227322375eff210f54beb2f4  134.7 MB
md5:dba005000f30d6933fab7be7e1731070  103.4 MB
md5:e7fe7fbe79f9d26f7ffdd59fb75bc868  113.7 MB
md5:98488f30927bb77ff26744fb8dbe67e9  129.2 MB
md5:084fd3694d563b1af255a9a71bebe4c2  107.5 MB
md5:d2ebe63895ac8cdf5cb03d74faf98bec  115.0 MB
md5:1edcd2e8b8bb05d23a52020aed3103c8  131.9 MB
md5:7f23b891235d07f69c536b4401c77a76  125.3 MB
md5:ffe6f9123adcf9a834e0810933d4f35c  117.8 MB
md5:e04a40eaf0100380caf6c84a5f4faf99  129.0 MB
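
The md5 values listed above can be used to check that a downloaded archive is intact. The following is a minimal sketch using Python's standard hashlib module; the function names and the chunk size are illustrative choices, and the local file path is whatever name the file was saved under.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so multi-gigabyte archives are never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:07bb...' or a bare hex digest."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5(local_path, "md5:07bb551ba8efd5be96ce0337e893ef26")` should return True for an intact copy of the 990-byte file, where `local_path` is the location it was downloaded to.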