Published April 26, 2019 | Version v1.2.0
Dataset Open

Simulated Arabidopsis thaliana sequencing datasets for chloroplast assembler benchmarking

  • 1. Center for Computational and Theoretical Biology, University of Würzburg, Germany
  • 2. Fraunhofer Institute for Molecular Biology and Applied Ecology IME: Gießen

Description

Changes

  • Fixed non-circular sampling from chloroplast and mitochondrion in version 1.1.0
  • Fixed off-by-one error in reverse read in version 1.0.0
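
Both fixes concern how reads are cut from the reference sequences. The sketch below is only an illustration of the two ideas, not the code used to build this dataset: a fragment that starts near the end of a circular genome wraps around to the start, and the reverse read covers exactly the last bases of the fragment. The function names and the fragment and read lengths are hypothetical.

```python
# Illustrative sketch only; names and parameters are assumptions,
# not taken from the benchmark repository.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def circular_slice(seq: str, start: int, length: int) -> str:
    """Return `length` bases beginning at `start`, wrapping past the end of `seq`."""
    if not seq:
        raise ValueError("sequence must not be empty")
    if length < 0:
        raise ValueError("length must not be negative")
    n = len(seq)
    start %= n
    # Repeat the sequence often enough that the slice never runs off the end.
    repeats = (start + length) // n + 1
    return (seq * repeats)[start:start + length]


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def paired_reads(genome: str, start: int, fragment_len: int, read_len: int):
    """Cut one fragment from a circular genome and return (forward, reverse) reads."""
    if not 0 < read_len <= fragment_len:
        raise ValueError("read_len must be between 1 and fragment_len")
    fragment = circular_slice(genome, start, fragment_len)
    forward = fragment[:read_len]
    # The reverse read spans exactly the last `read_len` bases of the fragment;
    # shifting this window by one base is the kind of off-by-one slip to avoid.
    reverse = reverse_complement(fragment[-read_len:])
    return forward, reverse
```

With a linear slice, positions near the end of the chloroplast or mitochondrion sequence would yield truncated fragments or none at all, so the junction between the last and first base would be undersampled; the modulo and repetition in `circular_slice` avoid that.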

Purpose and Documentation

See: github.com/chloroExtractorTeam/benchmark

Original data

The original Arabidopsis thaliana sequences were downloaded from TAIR:

The Arabidopsis Information Resource (TAIR) on www.arabidopsis.org, Mar 22, 2019, available under the TAIR Terms of Use

Tanya Z. Berardini, Leonore Reiser, Donghui Li, Yarik Mezheritsky, Robert Muller, Emily Strait and Eva Huala. "The Arabidopsis Information Resource: Making and mining the 'gold standard' annotated reference plant genome." genesis 2015 doi:10.1002/dvg.22877

Programs used to generate this data

  • seqkit (v0.10.1): Shen W, Le S, Li Y, Hu F (2016) "SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation." PLOS ONE 11(10): e0163962. doi:10.1371/journal.pone.0163962

Notes

Full documentation: https://github.com/chloroExtractorTeam/benchmark/blob/master/03_representative_datasets.md

Files (8.4 GB)

md5:02beb6bbf352af68a67ec0bd7f32cc25
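
The MD5 checksum listed for the archive can be used to confirm that a download of this size completed without corruption. A minimal sketch, assuming the archive has been saved locally; the path passed to the functions is a placeholder, since the file name is not given here.

```python
import hashlib

# Checksum as published on this record.
EXPECTED_MD5 = "02beb6bbf352af68a67ec0bd7f32cc25"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for a multi-gigabyte archive.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected` with the local path of the downloaded archive returns `False` when the file is incomplete or altered, in which case the download should be repeated.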

Additional details

References

  • Berardini TZ, Reiser L, Li D, Mezheritsky Y, Muller R, Strait E, Huala E (2015) The Arabidopsis Information Resource: Making and mining the "gold standard" annotated reference plant genome. genesis doi:10.1002/dvg.22877
  • Shen W, Le S, Li Y, Hu F (2016) SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11(10): e0163962. doi:10.1371/journal.pone.0163962