Published May 16, 2023 | Version v2023-05-16
Poster Open

Do Public Databases Need Higher Standards for Next-Generation Data Submissions?

  • 1. Department of Biochemistry and Biophysics, Texas A&M University
  • 2. Department of Biology, Texas A&M University

Description

  • Genomics, an extension of genetics, is a powerful tool for studying the function and evolution of genes and genomes. Applied to the human genome, it can play a key role in understanding the origin of many human diseases, such as cancer.
  • However, obtaining meaningful insights into any medical condition or pathological state requires high-quality input data. Observations and conclusions based on incomplete or low-quality data are not only hard to replicate and reproduce, but also highly questionable.
  • The vast majority of human Next-Generation Sequencing (NGS) datasets have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database.
  • This project started with the aim of re-analyzing a selected set of cancer-related NCBI-SRA datasets, using a set of newly developed in-house algorithms, to evaluate our ability to both reproduce and replicate previously published results.
  • To our surprise, we found that the overall quality of these selected datasets, and especially their genome coverage, was highly variable. Coverage was often low and, not uncommonly, the datasets contained contaminating sequences.
  • In our view, these observations call into question the reproducibility and replicability of work based on these datasets.
  • We conclude that, in order to guarantee replicability and reproducibility in science, public databases such as the NCBI-SRA need to set higher standards for data submission.
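
The genome coverage mentioned above is commonly estimated as the total number of sequenced bases divided by the genome size (C = N × L / G). The following is a minimal sketch of that arithmetic, not the in-house algorithms used for the poster, which are not described here. The function names, the dataset dictionary layout, and the `min_coverage` threshold are all hypothetical and chosen only for illustration.

```python
def mean_coverage(read_count: int, mean_read_length: float, genome_size: int) -> float:
    """Estimate mean depth of coverage as C = N * L / G.

    read_count        N, number of reads in the dataset
    mean_read_length  L, average read length in bases
    genome_size       G, size of the reference genome in bases
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * mean_read_length / genome_size


def flag_low_coverage(datasets: dict, genome_size: int, min_coverage: float) -> list:
    """Return the names of datasets whose estimated coverage is below min_coverage.

    `datasets` maps a dataset name to a (read_count, mean_read_length) tuple.
    The threshold is an assumption supplied by the caller, not a published standard.
    """
    return [
        name
        for name, (read_count, mean_read_length) in datasets.items()
        if mean_coverage(read_count, mean_read_length, genome_size) < min_coverage
    ]
```

This estimate assumes that every read maps to the genome, so it is an upper bound. Contaminating sequences or unmapped reads would lower the effective coverage, which is why a per-base depth computed from an alignment is usually the more reliable measure.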

Files

Genome_Coverage.pdf (493.5 kB)
md5:671b72be2f2be8468a2e8e93ee07d2d9