Published November 14, 2016 | Version v1
Poster Open

A Conceptual Architecture for Reproducible On-demand Data Integration for Complex Diseases

  • 1. Department of Biomedical Informatics, Center for Clinical and Translational Science, University of Utah, Salt Lake City, Utah, USA
  • 2. Department of Biomedical Informatics, Center for Clinical and Translational Science, College of Nursing, University of Utah, Salt Lake City, Utah, USA
  • 3. Department of Gastroenterology, University of Utah, Salt Lake City, Utah, USA
  • 4. Center for Clinical and Translational Science, University of Utah, Salt Lake City, Utah, USA
  • 5. Center for Clinical and Translational Science, Department of Endocrinology, University of Utah, Salt Lake City, Utah, USA

Description

Eosinophilic esophagitis is a complex, emerging condition characterized by poorly defined phenotypes and associated with both genetic and environmental conditions. Understanding such diseases requires researchers to navigate seamlessly across multiple scales (e.g., metabolome, proteome, genome, phenome, exposome) and models (sources using different stores, formats, and semantics), interrogate existing knowledge bases, and obtain results in their format of choice to answer different types of research questions. All of these activities must be performed in a way that supports the reproducibility and sharability of the methods used for selecting data sources, designing research queries, executing queries, and understanding results and their quality.
We present higher-level formalizations for building multi-source data platforms on demand, based on the principles of meta-process modeling, that provide reproducible and sharable data query and interrogation workflows and artifacts. A framework based on these formalizations consists of a layered abstraction of processes to support administrative and research end users:

  • Top layer (meta-process): An extendable library of computable generic process concepts (PC), stored in a metadata repository (MDR) [1], that describe steps/phases in the translational research life cycle.
  • Middle layer (process): Methods to generate on-demand queries by assembling instantiated PC into query processes and rules. Researchers design query processes using PC and evaluate their feasibility and validity by leveraging metadata content in the MDR.
  • Bottom layer (execution): Interaction with a hyper-generalized federation platform (e.g., OpenFurther [1]) that performs complex interrogation and integration queries that require consideration of interdependencies and precedence across the selected sources.
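
The three layers can be illustrated with a minimal Python sketch. It is not the poster's implementation or the OpenFurther API: every class, field, concept, and source name below is hypothetical and chosen only for illustration. A plain dictionary stands in for the MDR, feasibility checking is reduced to simple lookups, and a topological sort stands in for the federation platform's handling of interdependencies and precedence.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class ProcessConcept:
    """Top layer: a generic, reusable step description (hypothetical shape)."""
    name: str
    description: str


@dataclass
class Step:
    """Middle layer: a process concept instantiated against one data source."""
    step_id: str
    concept: ProcessConcept
    source: str
    depends_on: tuple = ()


@dataclass
class QueryProcess:
    """Middle layer: instantiated steps assembled into one query process."""
    steps: dict = field(default_factory=dict)

    def add(self, step: Step) -> None:
        self.steps[step.step_id] = step

    def validate(self, mdr: dict, known_sources: set) -> list:
        """Return a list of problems; an empty list means the checks passed."""
        problems = []
        for step in self.steps.values():
            if step.concept.name not in mdr:
                problems.append(f"{step.step_id}: unknown concept {step.concept.name}")
            if step.source not in known_sources:
                problems.append(f"{step.step_id}: unknown source {step.source}")
            for dep in step.depends_on:
                if dep not in self.steps:
                    problems.append(f"{step.step_id}: missing dependency {dep}")
        return problems

    def execution_order(self) -> list:
        """Bottom-layer stand-in: order steps so that dependencies run first."""
        graph = {s.step_id: set(s.depends_on) for s in self.steps.values()}
        return list(TopologicalSorter(graph).static_order())


# Hypothetical MDR content and source names, for illustration only.
mdr = {
    "select_cohort": ProcessConcept("select_cohort", "Identify the study cohort"),
    "link_exposures": ProcessConcept("link_exposures", "Attach exposure records"),
    "integrate": ProcessConcept("integrate", "Merge results across sources"),
}
sources = {"ehr", "exposome_db"}

qp = QueryProcess()
qp.add(Step("cohort", mdr["select_cohort"], "ehr"))
qp.add(Step("exposures", mdr["link_exposures"], "exposome_db", ("cohort",)))
qp.add(Step("merge", mdr["integrate"], "ehr", ("cohort", "exposures")))

problems = qp.validate(mdr, sources)  # design-time feasibility check
order = qp.execution_order()          # precedence-respecting run order
```

In this sketch a researcher would call `validate` while designing the query process and hand the ordered steps to an execution platform only once the list of problems is empty.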

This framework can be implemented using process exchange formats (e.g., DAX, BPMN) and scientific workflow systems (e.g., Pegasus [2], Apache Taverna [3]). All content (PC, rules, and workflows) and the assembling and executing mechanisms are sharable. The content, design, and development of the framework are informed by user-centered design methodology, and the framework consists of researcher-centric and integration-centric components to provide robust and reproducible workflows.
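
One way to picture what sharable and reproducible could mean for a workflow artifact is sketched below. This is an assumption for illustration, not the framework's mechanism: it uses canonical JSON as a stand-in exchange format (not DAX or BPMN), and the workflow fields are hypothetical. The idea is that a recipient can recompute a fingerprint and confirm that the workflow received is the one that was shared.

```python
import hashlib
import json


def canonical_json(workflow: dict) -> str:
    """Serialize with sorted keys and fixed separators so equal content gives equal text."""
    return json.dumps(workflow, sort_keys=True, separators=(",", ":"))


def fingerprint(workflow: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_json(workflow).encode("utf-8")).hexdigest()


# Hypothetical workflow description, for illustration only.
workflow = {
    "name": "example-cohort-query",
    "steps": [
        {"id": "cohort", "concept": "select_cohort", "source": "ehr"},
        {"id": "merge", "concept": "integrate", "depends_on": ["cohort"]},
    ],
}

shared_text = canonical_json(workflow)  # what one researcher would send
received = json.loads(shared_text)      # what a collaborator would load

same_artifact = fingerprint(received) == fingerprint(workflow)
```

Because the keys are sorted before hashing, two copies of the same workflow produce the same fingerprint regardless of the order in which their fields were written.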

References
1. Gouripeddi R, Facelli JC, et al. FURTHeR: An Infrastructure for Clinical, Translational and Comparative Effectiveness Research. AMIA Annual Fall Symposium. 2013; Washington, DC.
2. Pegasus. The Pegasus Project. 2016; https://pegasus.isi.edu/.
3. Apache Software Foundation. Apache Taverna. 2016; https://taverna.incubator.apache.org/.

Notes

OpenFurther is supported by NCRR/NCATS UL1TR001067, UL1RR025764, 3UL1RR025764-02S2, AHRQ R01 HS019862, DHHS 1D1BRH20425, U54EB021973, UU Research Foundation, NIBIB, NIH U54EB021973

Files

OF3.0.pdf (1.2 MB)
md5:d0464b455ac6bd61ea98af411cb4e0cd