Supplementary files for the manuscript "Assembly of Long Error-Prone Reads Using Repeat Graphs"
Contents
---------
* `human_assemblies` - Flye assemblies of the human ONT sequencing data, plus QUAST benchmarking
of the Flye, Canu and MaSuRCA assemblies. Scripts for assembly graph analysis are also included.
* `nctc_assemblies` - Flye assemblies of the NCTC 21 bacterial dataset.
* `yeast_assemblies` - working directories of the Flye, Canu, Falcon, HINGE and Miniasm assemblies of
the yeast PB and ONT datasets, plus the final assemblies and the QUAST report. Some large files
(such as read alignments) were deleted.
* `worm_assemblies` - working directories of the Flye, Canu, Falcon, HINGE and Miniasm assemblies of
the C. elegans dataset, plus the final assemblies and the QUAST report. Some large files
(such as read alignments) were deleted. The `tandem_misassemblies` directory contains
the detailed analysis of nine tandem misassemblies. We recommend the Gepard dot-plotter for visualization.
* `metagenome_assemblies` - Flye and Canu assemblies of a PacBio mock metagenome dataset.
In addition to the metagenome assemblies, each bacterium was reassembled separately to
estimate the rate of divergence between the target genomes and the available references.
* `simulated_data` - two assemblies of the simulated data illustrating Figure 1 (from Appendix I),
as well as the simulated unbridged repeats benchmark.
Software versions and parameters
--------------------------------
* Flye - 2.3.5 (commit 20afeda)
* Canu - 1.7.1 (commit dfa60b8)
* Falcon - 0.3.0 (FALCON-Integrate commit 7498ef9)
* HINGE - 0.5.0 (commit 79fdf66)
* Miniasm - 0.2-r168-dirty (commit 40ec280) / Minimap2 2.8-r711-dirty (commit 8fc5f8d)
* QUAST - 5.0.0 (commit de6973bb)
Flye and Canu were run with the default parameters. The config files and scripts for
Falcon, HINGE and Miniasm can be found in the `asm_config` archive folder.
The HUMAN (but not the HUMAN+) assembly was generated with the earlier
Flye version 2.3.2 (released on Feb 20 2018) to provide a fair comparison
with the Canu and MaSuRCA assemblies (which have not been updated since the release of Flye 2.3.2).
We note that the HUMAN assembly produced with the latest Flye version 2.3.5 has
NGA50 = 7.3 Mb, an improvement over the Flye 2.3.2 assembly (NGA50 = 6.3 Mb).
HUMAN+ was assembled using the latest Flye and Canu versions (as of September 2018).
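
To put the two NGA50 values quoted above in relative terms, the following minimal sketch computes the gain of Flye 2.3.5 over Flye 2.3.2 on the HUMAN dataset. The function name `relative_gain` is an illustrative choice and is not part of Flye or QUAST; the two input values are the only numbers taken from this README.

```python
def relative_gain(old_value, new_value):
    """Return the fractional improvement of new_value over old_value."""
    return (new_value - old_value) / old_value


# NGA50 values (in Mb) quoted in this README for the HUMAN dataset.
nga50_flye_2_3_2 = 6.3
nga50_flye_2_3_5 = 7.3

gain = relative_gain(nga50_flye_2_3_2, nga50_flye_2_3_5)
print(f"NGA50 gain: {gain:.1%}")  # prints "NGA50 gain: 15.9%"
```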
The code for unbridged repeat resolution is currently available
in a separate `flye-trestle` branch (commit 6100d32).
Files (35.1 GB)

Checksum | Size |
---|---|
md5:9351cf45757945532a918d15e70e800f | 1.8 kB |
md5:d21910ee8bbf4fa7b6067ff3adbc37e7 | 6.6 GB |
md5:acc6f7098e73943466f09791e9f4b43d | 8.9 GB |
md5:fedb148b839eaebb5de3a141e8dd59ca | 836.2 MB |
md5:a8359d2acc5124dd62ea16456a07cd4c | 850.9 MB |
md5:890d12b801dab4732e05dd363dd76da3 | 1.2 GB |
md5:8201e91a84f0ed0c4ed2f602f1f4e982 | 2.6 kB |
md5:8e6ba2b433974f5d1e20e025e66ae012 | 6.7 GB |
md5:dc39f2a3a3cd918f74960b9ed0e8bcb2 | 6.3 GB |
md5:209534f5f6b27cdab3e01644546bee89 | 3.8 GB |
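
The md5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of` and `verify` are hypothetical, and because the file names are not given in the listing, the caller must pair each local file with its checksum.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Compare a file's MD5 digest with an expected value.

    The expected value may carry the "md5:" prefix used in the listing.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

For example, `verify("path/to/downloaded_file", "md5:d21910ee8bbf4fa7b6067ff3adbc37e7")` returns `True` only if the local file (a placeholder path here) matches the 6.6 GB entry in the listing.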