The Case for Data Provenance and Authenticity in Genomics
Description
Abstract
The exponential growth of publicly accessible genomic data over the last two decades has transformed the life sciences, yet it has also exposed a critical vulnerability. Weakly enforced requirements for data provenance, structured metadata, and material authentication have degraded the potential of these resources for interoperability and reuse in digital biology. The lack of traceability and verification in genomic data poses escalating risks to scientific reproducibility, biosecurity, and the integrity of AI-driven biological research (AIxBio). Examples from cancer and microbial genomics, infectious disease surveillance, public sequence archives, and emerging AI-enabled biology demonstrate how gaps in data provenance and metadata quality undermine trust, drive irreproducible results, and create opportunities for data fabrication and misuse. The manuscript further emphasizes that reproducibility alone is insufficient when shared reference data are contaminated, mislabeled, incompletely described, or biologically outdated. The unique role of biological repositories and international culture collections is presented as bridging the physical-to-digital divide and enabling the creation of trusted “digital twins” for biological research. Finally, proactive preservation of the physical reference materials underpinning genomic data and the treatment of “metadata as infrastructure” are presented as key ingredients for the future success and sustainability of artificial intelligence and machine learning across the life sciences.
Files
| Name | Size | Checksum |
|---|---|---|
| The Case for Data Provenance in Genomics 05JUN2026.pdf | 348.3 kB | md5:696a5d0c1d927905dbd3fc95cdd5db67 |
Additional details
Additional titles
- Subtitle: Building Trustworthy Foundations for Digital Biology