Published February 28, 2024 | Version v1
Report Open

Human Genomes Platform Project: Data and Metadata Archiving Feasibility Report

Description

The generation of human genomics data is growing at an unprecedented rate. The retention and sharing of these data are important for a number of reasons, including:

  • Ensuring transparency and reproducibility of results.

  • Validating new discoveries in comparable datasets.

  • Reanalysing data with new methods to gain new insights.

  • Re-using data for purposes not imagined by the original data collectors.

  • Integrating datasets to gain more statistical power.

The availability of human genomics data for integration and reanalysis has the potential to contribute to improved health outcomes through biomarker discovery, clinical trials, and precision medicine. In addition, generating sequencing data for research is expensive and largely funded through government grants. It is therefore paramount to derive the maximum value from these data and to ensure that they benefit the wider community.

As a result, there is momentum within the human genomics community towards making genomics data Findable, Accessible, Interoperable and Reusable (FAIR). An increasing number of leading scientific journals require human genomics data to be submitted to a recognised archive before publication. In addition, the National Institutes of Health (NIH), which funds the vast majority of all health-related research in the US, mandates that all human genomic data it funds be shared. These requirements have led to large amounts of human genomic data being submitted to global archives. The value and utility of these data depend not only on the data being findable but also on the quality of the contextual metadata describing them. Typically, archives place minimal requirements on contextual metadata that may describe the clinical, demographic or phenotypic aspects of the samples, as well as the experiment or data acquisition methods used. Limiting these requirements can make the submission process easier but restricts how the data can be found and/or reused by others.

In Australia, there are no national mandates or standards for how human genomics research data should be managed, shared, and preserved long-term. The peak funding bodies, the National Health and Medical Research Council (NHMRC) and the Australian Research Council (ARC), give general guidance and recommendations, but these are not tailored to the human genomics domain. This guidance is aimed mainly at institutes, encouraging them to implement clear policies on the ownership, management, preservation and accessibility of data.

The Australian government has recognised the need for a national approach to managing human genomics data, the National Approach to Genomic Information Management (NAGIM). A blueprint for how this infrastructure could be established has been developed, and a technology piloting phase has been undertaken. Australian human genomics research data are often siloed within institutes or housed in difficult-to-access overseas archives, limiting the benefits that Australians could derive from these valuable data.

The Human Genomes Platform Project (hereafter referred to as ‘the HGPP’ or ‘the project’) is a collaborative research project aiming to overcome some of the challenges and limitations outlined above. Its goal is to enhance secure and responsible human genomic data sharing for research purposes. The project partners represent many of the largest human genome sequencing and analysis organisations in Australia. 

The goals of the data and metadata archiving sub-project within the HGPP are to: 

  • Understand the needs of stakeholders when submitting, downloading, and managing datasets in international repositories and the challenges they have faced.

  • Investigate the options and requirements for establishing national human genomics repositories in Australia.

  • Provide technical insight into implementation options.

  • Link to international communities, platforms, standards, and solutions.

Files

Data and Metadata Archiving_ Feasibility Report.pdf

Files (2.4 MB)
