Published January 11, 2018 | Version 2.0
Dataset Open

Supplementary files for the manuscript "Assembly of Long Error-Prone Reads Using Repeat Graphs"

  • 1. UCSD
  • 2. ANU

Description

Contents
---------

* `human_assemblies` - Flye assemblies of the human ONT sequencing data + QUAST benchmarking
of Flye, Canu and MaSuRCA assemblies. Scripts for assembly graph analysis are also included.

* `nctc_assemblies` - Flye assemblies of the NCTC 21 bacterial dataset.

* `yeast_assemblies` - working directories of the Flye, Canu, Falcon, HINGE and Miniasm assemblies of 
the yeast PB and ONT datasets + final assemblies + QUAST report. Some large files 
(such as read alignments) were deleted.

* `worm_assemblies` - working directories of the Flye, Canu, Falcon, HINGE and Miniasm assemblies of 
the C. elegans dataset + final assemblies + QUAST report. Some large files 
(such as read alignments) were deleted. The `tandem_misassemblies` directory contains
the detailed analysis of nine tandem misassemblies. We recommend the "gepard" dot-plotter for visualization.

* `metagenome_assemblies` - Flye and Canu assemblies of a PacBio mock metagenome dataset.
In addition to the metagenome assemblies, each bacterium was reassembled separately to
estimate the rate of divergence between the target genomes and the available references.

* `simulated_data` - two assemblies of the simulated data illustrating Figure 1 (from Appendix I),
as well as the simulated unbridged repeats benchmark.


Software versions and parameters
--------------------------------

* Flye - 2.3.5 (commit 20afeda)
* Canu - 1.7.1 (commit dfa60b8)
* Falcon - 0.3.0 (FALCON-Integrate commit 7498ef9)
* HINGE - 0.5.0 (commit 79fdf66)
* Miniasm - 0.2-r168-dirty (commit 40ec280) / Minimap2 - 2.8-r711-dirty (commit 8fc5f8d)
* Quast - 5.0.0 (commit de6973bb)

Flye and Canu were run with the default parameters. The config files / scripts for
Falcon, HINGE and Miniasm can be found in the 'asm_config' archive folder.

The HUMAN (but not the HUMAN+) assembly was generated with the earlier 
Flye version 2.3.2 (released on Feb 20, 2018) to provide a fair comparison 
with the Canu and MaSuRCA assemblies (which have not been updated since the release of Flye 2.3.2).
We note that the HUMAN assembly produced with the latest Flye version 2.3.5 has 
NGA50 = 7.3 Mb, an improvement over the Flye 2.3.2 assembly (NGA50 = 6.3 Mb). 
HUMAN+ was assembled using the latest Flye and Canu versions (as of September 2018).
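
For readers unfamiliar with the metric quoted above: QUAST reports NGA50 over aligned blocks (contigs broken at misassemblies) relative to the reference genome length. The following is only a rough sketch of that calculation, assuming the aligned block lengths have already been extracted. The function name `ngx` and the toy numbers are illustrative and are not taken from QUAST or from this dataset.

```python
def ngx(block_lengths, genome_size, fraction=0.5):
    """Return the largest length L such that blocks of length >= L
    together cover at least `fraction` of `genome_size`.

    Returns 0 if the blocks never reach the target coverage.
    With aligned block lengths and fraction=0.5 this mirrors the
    idea behind NGA50; with raw contig lengths it mirrors NG50.
    """
    target = genome_size * fraction
    covered = 0
    # Walk through the blocks from longest to shortest.
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


# Toy, made-up block lengths (NOT values from the assemblies above).
toy_blocks = [40, 30, 20, 10]
print(ngx(toy_blocks, genome_size=100))  # prints 30
```

Because the target is a fraction of the reference length rather than of the assembly length, a fragmented or incomplete assembly can return 0 here.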

The code for unbridged repeat resolution is currently available 
in a separate 'flye-trestle' branch (commit 6100d32).

Files (35.1 GB)

| MD5 checksum | Size |
|---|---|
| 9351cf45757945532a918d15e70e800f | 1.8 kB |
| d21910ee8bbf4fa7b6067ff3adbc37e7 | 6.6 GB |
| acc6f7098e73943466f09791e9f4b43d | 8.9 GB |
| fedb148b839eaebb5de3a141e8dd59ca | 836.2 MB |
| a8359d2acc5124dd62ea16456a07cd4c | 850.9 MB |
| 890d12b801dab4732e05dd363dd76da3 | 1.2 GB |
| 8201e91a84f0ed0c4ed2f602f1f4e982 | 2.6 kB |
| 8e6ba2b433974f5d1e20e025e66ae012 | 6.7 GB |
| dc39f2a3a3cd918f74960b9ed0e8bcb2 | 6.3 GB |
| 209534f5f6b27cdab3e01644546bee89 | 3.8 GB |
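
The MD5 checksums listed above can be used to check that a download completed intact. Below is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` are illustrative, and the path in the usage comment is a placeholder because the archive file names are not shown in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches `expected_md5`.

    Accepts the checksum with or without the 'md5:' prefix.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (placeholder path; substitute the file you actually downloaded):
# verify("path/to/downloaded_file", "md5:9351cf45757945532a918d15e70e800f")
```

A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again.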