ELIXIR-EXCELERATE D8.6: Documentation on adequate quality reporting standards for genomics datasets
Creators
- 1. CNAG
Description
Rare disease (RD) research faces particular challenges because patient populations, clinical expertise, and research communities are small and highly fragmented, both geographically and by medical specialty. The scarcity of rare disease patients and of their corresponding (gen)omic data has made data sharing one of the fundamental pillars for accelerating and improving patient diagnosis, and for reaching the IRDiRC 2017-2027 vision of enabling all people living with a RD to receive an accurate diagnosis, care, and available therapy within one year of coming to medical attention.
Projects such as NeurOmics, EurenOmics, RD-Connect and, more recently, Solve-RD and EJP-RD, infrastructures such as BBMRI and ELIXIR, and initiatives such as GA4GH have been working towards this objective. Rare disease platforms such as RD-Connect GPAP enable controlled sharing of standardised phenotypic and genomic data. The HPO, OMIM and Orphanet (ORDO) ontologies are used to collect phenotypic data, while GATK best practices and GA4GH standards are followed to collect and process genomic data through a standardised pipeline (Laurie et al., 2016).
Collating genomic data from disparate centres across different countries has been shown to greatly improve our understanding of rare diseases (Lochmüller et al., 2018). However, to benefit fully from this unprecedented access to genomic data, care must be taken to determine the quality of these genomic datasets, especially when they are sequenced at different centres, under different protocols and using different technologies. Several metrics, such as depth of coverage, base quality and mapping quality, are already broadly used for NGS quality evaluation. Nevertheless, because of the rapid development of the genome sequencing field, comprehensive quality management considerations are still scarce and, although some efforts have been made, there are currently no standards for comparing genomic data quality (Endrullat et al., 2016; Mahamdallie et al., 2018).
In this context, one of the specific objectives of EXCELERATE WP8 was to establish a framework for quality assessment of genomic data (Task 8.1.2) to enable rare disease researchers to easily compare genomic datasets, starting with whole exome sequencing (WES) data. In this deliverable, we have explored a rating system based on 5 different quality metrics. This rating system could be used as a starting point for continuing work in the context of whole genome sequencing (WGS) data and the 1M genomes declaration, as federated systems across endorser countries will need to compare WES / WGS data of different origins.
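As a rough illustration of how a metric-based rating could work, the sketch below scores a sample by counting how many metrics reach a cutoff. It is not the rating system of the deliverable: the five metrics it uses are not listed in this description, so the sketch relies only on the three metrics named above (depth of coverage, base quality, mapping quality). The class `SampleMetrics`, the function `rate_sample` and all threshold values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SampleMetrics:
    """Per-sample quality metrics (hypothetical container for this sketch)."""
    mean_depth: float            # mean depth of coverage over targets (x)
    mean_base_quality: float     # mean Phred-scaled base quality
    mean_mapping_quality: float  # mean mapping quality (MAPQ)


# Hypothetical cutoffs, NOT taken from the deliverable.
THRESHOLDS = {
    "mean_depth": 60.0,
    "mean_base_quality": 30.0,
    "mean_mapping_quality": 50.0,
}


def rate_sample(metrics: SampleMetrics) -> int:
    """Return the number of metrics that meet or exceed their cutoff."""
    return sum(
        getattr(metrics, name) >= cutoff
        for name, cutoff in THRESHOLDS.items()
    )


if __name__ == "__main__":
    sample = SampleMetrics(
        mean_depth=80.0,
        mean_base_quality=32.0,
        mean_mapping_quality=55.0,
    )
    print(rate_sample(sample))
```

A simple count keeps ratings comparable across centres without weighting any single metric; a real scheme would need agreed metrics and cutoffs, which is the gap the deliverable addresses.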
Files
Name | Size | Checksum |
---|---|---|
EXCELERATE Deliverable D8.6.pdf | 419.8 kB | md5:3c282b205c7b229c69e617f9797eee90 |