The GIAB genomic stratifications resource for human reference genomes
Authors/Creators
- 1. Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD 20899.
- 2. Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030
- 3. University of Applied Sciences Upper Austria - FH Hagenberg, Softwarepark 11, 4232 Hagenberg im Mühlkreis, Austria.
- 4. National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA, 20892.
- 5. Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-301, New York, NY, USA, 10029.
- 6. Department of Computer Science, College of Engineering, Rice University, 6100 Main St., Houston, TX 77005-1827.
- 7. Department of Bioinformatics, Pondicherry University, India, 605014.
- 8. DNAnexus, USA.
- 9. Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland.
Description
Stratifying the genome into distinct genomic contexts is useful when developing bioinformatics software such as variant callers, because it allows performance to be assessed separately in different types of difficult regions of the human genome. Here we describe a set of genomic stratifications for the human reference genomes GRCh37, GRCh38, and T2T-CHM13v2.0. Generating stratifications for the new complete CHM13 reference genome is critical to understanding improvements in variant caller performance when using this reference. The GIAB stratifications can be used when benchmarking variant calls to analyze difficult regions of the human genome in a standardized way. We present the CHM13 stratifications in comparison to those for GRCh37 and GRCh38, highlighting expansions in the hard-to-map and GC-rich stratifications, which provide useful insight into the accuracy of variant calls in these newly added regions. To evaluate the reliability and utility of the new stratifications, we used the stratifications for all three references to assess the accuracy of variant calls in diverse, challenging genomic regions. The code used to generate these stratifications is available as a Snakemake pipeline at https://github.com/ndwarshuis/giab-stratifications.
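As an illustration of how stratifications can be used when benchmarking, the following minimal Python sketch computes recall separately within each stratum. It is not part of the GIAB pipeline. It assumes each stratification is a BED file (0-based, half-open intervals) with non-overlapping intervals, that variants are represented as simple `(chrom, pos, ref, alt)` tuples with 1-based positions, and that a call counts as correct only on an exact match. The stratum names `lowmap` and `gc_rich`, the function names, and the toy data are all hypothetical. Real benchmarking tools also reconcile differing variant representations, which this sketch does not attempt.

```python
from bisect import bisect_right
from collections import defaultdict


def parse_bed(lines):
    """Parse BED lines into {chrom: sorted [(start, end), ...]} (0-based, half-open)."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return dict(regions)


def in_regions(regions, chrom, pos):
    """Return True if the 1-based position lies inside any interval.

    Assumes the intervals for each chromosome do not overlap.
    """
    intervals = regions.get(chrom, [])
    zero_based = pos - 1
    # Index of the last interval whose start is <= the query position.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def stratified_recall(truth, calls, strata):
    """Recall per stratum: fraction of truth variants in the stratum that were called.

    Returns None for a stratum that contains no truth variants.
    """
    result = {}
    for name, regions in strata.items():
        truth_in = {v for v in truth if in_regions(regions, v[0], v[1])}
        result[name] = len(truth_in & calls) / len(truth_in) if truth_in else None
    return result


# Toy data (hypothetical): two strata, four truth variants, three calls.
strata = {
    "lowmap": parse_bed(["chr1\t100\t200"]),
    "gc_rich": parse_bed(["chr1\t150\t160", "chr2\t0\t50"]),
}
truth = {
    ("chr1", 101, "A", "G"),
    ("chr1", 155, "C", "T"),
    ("chr1", 500, "G", "A"),
    ("chr2", 10, "T", "C"),
}
calls = {
    ("chr1", 155, "C", "T"),
    ("chr1", 500, "G", "A"),
    ("chr2", 10, "T", "C"),
}
results = stratified_recall(truth, calls, strata)
print(results)
```

A variant may fall in more than one stratum, as `chr1:155` does here, so per-stratum counts are not expected to sum to the genome-wide total.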