Published May 26, 2024 | Version v2
Dataset Open

Variation of and associations with the depth and evenness of sequencing coverage in a sample of archived plastid genomes

  • 1. Freie Universität Berlin
  • 2. Fort Hays State University

Description

Depth and evenness of sequencing coverage are considered potential indicators of genome assembly quality. In plastid genomics, where new data generation has outpaced the development of suitable assembly quality indicators, these coverage metrics could offer insights into the quality of plastomes of different sizes, structures, or taxonomic origins. However, the typical variation of sequencing depth and evenness among archived plastid genomes, their variability between plastome partitions, and any association with methodological factors have yet to be evaluated. This study explores the variation of sequencing depth and evenness across a sample of publicly accessible plastid genomes and their potential associations with plastome structure, assembly accuracy, and the methodological provenance of the genome data using statistical tests. Our results indicate significant differences in sequencing depth across the four structural partitions as well as between the coding and non-coding sections of the genomes, a significant correlation between sequencing evenness and the number of ambiguous nucleotides, and a significant difference in sequencing evenness between several DNA sequencing platforms. These findings highlight that many publicly accessible plastid genomes are based on sequence data with highly variable sequencing depth and evenness and that this variation is influenced, at least partially, by genome structure and methodological factors.
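As a rough illustration of the two coverage metrics discussed above, the following sketch computes a mean sequencing depth and a simple evenness score from a list of per-position depths. The evenness definition used here (the share of total coverage that remains after capping each position at the mean depth) is only one possible formulation and is an assumption; it is not necessarily the definition applied in this study. The function names and the example depths are hypothetical.

```python
from statistics import mean


def mean_depth(depths):
    """Return the arithmetic mean of per-position sequencing depths."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    return mean(depths)


def coverage_evenness(depths):
    """Return an evenness score between 0 and 1.

    ASSUMPTION: evenness is taken as the fraction of coverage that remains
    when every position's depth is capped at the mean depth. A value of 1
    means perfectly uniform coverage; lower values mean coverage is
    concentrated in fewer positions.
    """
    avg = mean_depth(depths)
    if avg == 0:
        # No reads anywhere: treat evenness as 0 instead of dividing by zero.
        return 0.0
    capped_total = sum(min(d, avg) for d in depths)
    return capped_total / (avg * len(depths))


if __name__ == "__main__":
    # Hypothetical per-position depths for a short region.
    example = [0, 0, 20, 20]
    print(mean_depth(example))
    print(coverage_evenness(example))
```

Under this definition, uniform coverage scores 1, and a region in which half of the positions carry all of the reads scores 0.5.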

Files

bam_files.zip

Files (47.1 GB)

Checksum | Size
md5:d6f9bf6a73739486d432333c82abea60 | 46.6 GB
md5:95317e0a1a73abad9f8d3a4b7a9d066c | 17.1 MB
md5:2871e53eeb28c9ea27d36674575bcfcc | 2.4 MB
md5:b2223f542f362c8e8d2c38447b0fd20d | 435.2 MB
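The MD5 checksums listed for the files can be used to confirm that a download completed without corruption. The sketch below streams a file in chunks, which matters for an archive of tens of gigabytes, and compares the digest against an expected value. The function names and the local file path are hypothetical; which checksum belongs to which file has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file without loading it into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until the end of the file is reached.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Return True if the file's MD5 equals the expected value.

    The expected value may carry the 'md5:' prefix used in the file list.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum("downloaded_file.zip", "md5:...")`, with a hypothetical local path and the checksum copied from the file list, returns `True` only if the download is intact.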

Additional details

Funding

Deutsche Forschungsgemeinschaft
Plastid phylogenomics of water lilies, with a focus on partitioning strategies and the development of bioinformatic tools 418670221
Fort Hays State University
Kansas IDeA Network of Biomedical Research Excellence P20 GM103418
National Institutes of Health
Enabling data quality assessment of organelle genomes archived on GenBank through novel open-source software tools 1R01LM014506

Dates

Updated
2024-05-26
Record 11290119 updated to record 11322182
Submitted
2024-05-25
Initially uploaded as record 11290119

Software

Repository URL
https://github.com/michaelgruenstaeudl/PACVr
Programming language
R, Python
Development Status
Active