Published November 29, 2016 | Version v1
Poster Open

A Framework for Metadata Management and Automated Discovery for Heterogeneous Data Integration

  • 1. Department of Biomedical Informatics, Center for Clinical and Translational Science, University of Utah, Salt Lake City, Utah, USA
  • 2. Center for Clinical and Translational Science, University of Utah, Salt Lake City, Utah, USA
  • 3. Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA

Description

Current approaches to metadata discovery depend on time-consuming manual curation. To realize the full potential of Big Data technologies in biomedicine, enhance research reproducibility, and increase efficiency in translational sciences, it is critical to develop automatic and/or semiautomatic metadata discovery methods, together with the infrastructure to deploy and maintain these tools and their outputs.
Towards such a discovery infrastructure:
We conceptually designed a process workflow for a Metadata Discovery and Mapping Service that automates metadata discovery. The framework for automation is based on the steps human experts take when discovering and mapping metadata from various biomedical data. It consists of a 3-step process: (1) identification of the data file's source and format; (2) detailed metadata characterization based on the result of (1); and (3) characterization of the file in relation to other files, to support harmonization of content as needed for data integration. The framework discovers and leverages administrative, structural, descriptive, and semantic metadata, and consists of metadata and semantic mappers, along with uncertainty characterization and provision for expert review. As next steps, we will develop and evaluate this framework using workflow platforms (e.g. Swift, Pegasus).
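The 3-step process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework itself: the format rules, the `FileMetadata` fields, the confidence values, and the `review_threshold` parameter are all hypothetical, and step 3 is reduced to finding column names shared between files.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    name: str
    file_format: str = "unknown"                  # result of step 1
    columns: list = field(default_factory=list)   # result of step 2
    confidence: float = 0.0                       # uncertainty of step 1
    needs_review: bool = True                     # flag for expert review


def identify_format(name, text):
    """Step 1: guess the file format from extension and content (hypothetical rules)."""
    lowered = name.lower()
    if lowered.endswith(".csv") and "," in text:
        return "csv", 0.9
    if lowered.endswith((".fasta", ".fa")) and text.startswith(">"):
        return "fasta", 0.9
    return "unknown", 0.0


def characterize(meta, text):
    """Step 2: extract structural metadata, guided by the format found in step 1."""
    if meta.file_format == "csv":
        header = next(csv.reader(io.StringIO(text)), [])
        meta.columns = [c.strip().lower() for c in header]
    elif meta.file_format == "fasta":
        meta.columns = ["sequence_id", "sequence"]
    return meta


def relate(metas):
    """Step 3: list column names shared between files as harmonization candidates."""
    shared = {}
    for i, a in enumerate(metas):
        for b in metas[i + 1:]:
            common = sorted(set(a.columns) & set(b.columns))
            if common:
                shared[(a.name, b.name)] = common
    return shared


def discover(files, review_threshold=0.8):
    """Run steps 1-3 over a mapping of file name -> file text."""
    metas = []
    for name, text in files.items():
        fmt, conf = identify_format(name, text)
        meta = FileMetadata(name=name, file_format=fmt, confidence=conf)
        # Low-confidence identifications are routed to a human expert.
        meta.needs_review = conf < review_threshold
        metas.append(characterize(meta, text))
    return metas, relate(metas)
```

Calling `discover` on a small mapping of file names to contents returns one `FileMetadata` record per file plus the shared-column candidates; files whose format cannot be identified keep `needs_review` set, mirroring the provision for expert review.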
To store discovered metadata about digital objects, we enhanced OpenFurther’s Metadata Repository (MDR). We configured the bioCADDIE metadata specifications (DatA Tag Suite (DATS) model) as assets in the MDR for harmonizing metadata of individual datasets (e.g. different protein files) for data integration. This method of metadata management provides a flexible storage system for data resource metadata that supports versioned metadata (e.g. DATS 1.0 to 2.1) with data files mapped to different versions, enhanced descriptors of resources (DATS) with descriptions of content within resources, and translations to other metadata specifications (e.g. schema.org). The metadata stored in the MDR is also available to various data services, including data integration.
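As a rough illustration of versioned metadata storage with translation to another specification, the following Python sketch keeps dataset descriptions keyed by specification and version. It is not the OpenFurther MDR API: the `Asset` and `MetadataStore` classes, the `translate` helper, and the field mapping are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    dataset_id: str
    spec: str          # metadata specification, e.g. "DATS"
    spec_version: str  # specification version, e.g. "2.1"
    fields: dict       # metadata values keyed by specification field name


class MetadataStore:
    """In-memory stand-in for a metadata repository (hypothetical)."""

    def __init__(self):
        self._assets = {}  # (dataset_id, spec, spec_version) -> Asset

    def put(self, asset):
        key = (asset.dataset_id, asset.spec, asset.spec_version)
        self._assets[key] = asset

    def get(self, dataset_id, spec, spec_version):
        return self._assets[(dataset_id, spec, spec_version)]

    def versions(self, dataset_id, spec):
        """Return the stored versions for a dataset, ordered numerically."""
        found = [v for (d, s, v) in self._assets if d == dataset_id and s == spec]
        return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


# Illustrative field mapping only; a real translation would be far more complete.
ILLUSTRATIVE_MAPPING = {"title": "name", "description": "description"}


def translate(asset, mapping):
    """Rename fields according to a mapping, dropping fields that have no target."""
    return {mapping[k]: v for k, v in asset.fields.items() if k in mapping}
```

Keying each record by dataset, specification, and version lets the same data file remain mapped to an older version while a newer mapping is added alongside it, which is the property the paragraph above describes.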

Files

MetadataMang_Frame_BD2K.pdf (1.8 MB)
md5:19f93ac4398593c036eb6ae4b802c032