Published November 18, 2019 | Version v1
Proposal Open

A Common Dialect for Infrastructure and Services in Translator

  • 1. Oregon State
  • 2. Oregon State University
  • 3. RENCI
  • 4. LBNL

Contributors

  • 1. Oregon State University
  • 2. LBNL

Description

1) We will build a robust, dynamic Translator Standards and Reference Implementations Component (SRI) that integrates the collaborations and investments that the NCATS Translator has made to date. This component will consist of a suite of standards and products, a model for their governance, and processes to coordinate integration and shared implementation:

  • Community governance coordination will be developed with community buy-in to ensure an effective collaborative environment, and drive consortium-wide consensus on the other components.

  • Architecture and API specifications will drive community efforts to define details of project architecture and communication protocols across Translator Knowledge Providers (KPs), Autonomous Relay Agents (ARAs), and the Autonomous Relay System (ARS).

  • The BioLink model will define the standard entity types, relationship types, and schema shared by all Translator components. This includes related utility libraries and a novel approach to accommodating multiple alternative data-modeling perspectives.

  • Integrated reference ontologies will provide BioLink-compliant terms and relationships. We will draw on the ROBOKOP Ubergraph framework [1], the Monarch integrated ontologies, and other ontologies from Open Biological and Biomedical Ontologies (OBO) [2].

  • A continually-updated knowledge graph and data lake will provide Translator with a standardized and integrated global view of the whole information landscape.

  • Next-generation Shared Translator Services will integrate features of ROBOKOP [3], Monarch [4], BioLink [5], and the reasoner APIs to remove integration barriers. These services will provide validation, lookup, and mapping functionality for use across Translator.

  • A registry of Translator KPs, ARAs, and shared services will increase efficiency, eliminate duplication of effort, and promote collaboration.
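As an illustration of what "standard entity types, relationship types, and a schema" plus a shared validation service could mean in practice, here is a minimal sketch in Python. It is not the BioLink model or any Translator API: the type names, the predicates, the `Node`/`Edge` classes, and the `validate_edge` function are all hypothetical and chosen only to show the idea of checking an assertion against a shared schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy schema. These names are illustrative assumptions,
# NOT taken from the actual BioLink model.
ENTITY_TYPES = {"Drug", "Disease", "Gene"}

# predicate -> (required subject type, required object type)
RELATIONSHIP_TYPES = {
    "treats": ("Drug", "Disease"),
    "associated_with": ("Gene", "Disease"),
}


@dataclass(frozen=True)
class Node:
    id: str        # made-up CURIE-style identifier, e.g. "EX:1"
    category: str  # must be one of ENTITY_TYPES


@dataclass(frozen=True)
class Edge:
    subject: Node
    predicate: str
    object: Node


def validate_edge(edge: Edge) -> List[str]:
    """Return a list of problems found; an empty list means the edge is valid."""
    problems = []
    # Check that both endpoints use a known entity type.
    for role, node in (("subject", edge.subject), ("object", edge.object)):
        if node.category not in ENTITY_TYPES:
            problems.append(
                f"{role} {node.id} has unknown category {node.category!r}"
            )
    # Check that the predicate exists and connects the expected types.
    expected = RELATIONSHIP_TYPES.get(edge.predicate)
    if expected is None:
        problems.append(f"unknown predicate {edge.predicate!r}")
    elif (edge.subject.category, edge.object.category) != expected:
        problems.append(
            f"predicate {edge.predicate!r} expects {expected[0]} -> {expected[1]}, "
            f"got {edge.subject.category} -> {edge.object.category}"
        )
    return problems


good = Edge(Node("EX:1", "Drug"), "treats", Node("EX:2", "Disease"))
bad = Edge(Node("EX:3", "Gene"), "treats", Node("EX:2", "Disease"))
print(validate_edge(good))
print(validate_edge(bad))
```

In a sketch like this, every component that produces or consumes edges can run the same check, which is the sense in which a shared schema removes integration barriers.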

2) Our proposed SRI will address the problem of connecting disparate components and data/information sources at scale, with community buy-in and a plan for sustainability.

3) To develop the standards component of the SRI, we will begin with accepted Translator standards and work with the ARS, ARAs, and KPs to identify gaps. We will establish a community process for contributing to the standards, using GitHub pull requests and voting, so that everyone can contribute effectively and fairly with clear attribution. We will ensure rigorous documentation and testing. For the reference implementation component, we will stand up core Translator services and will include additional services if they are useful to more than one Translator component.

4) Consensus-building is hard. Our team has proven expertise and resources to identify needs, refine solutions, and find agreement, thereby successfully bringing infrastructure to fruition. Our team also has the technical and biological expertise to design and test the necessary standards, having been at the forefront of multiple ontology, data standards, and large enterprise software initiatives.

5) The Translator infrastructure is by nature heterogeneous, distributed, and growing; consequently, the most significant data and infrastructure challenge is managing the validity, currency, equivalency, and typing of entities (diseases, phenotypes, drugs, etc.). Our group has developed several innovative algorithms for managing these and related problems; these algorithms are in use in other integration projects and will be modified to suit Translator needs.
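To make the entity-equivalency problem concrete, the sketch below groups identifiers that different sources assert to be equivalent and picks one preferred identifier per group. This is a generic union-find approach substituted for illustration; the proposal does not describe its algorithms, and this is not one of them. The function names, the `PREFIX_PRIORITY` list, and the identifiers with prefixes `A`, `B`, and `C` are all made-up assumptions.

```python
from collections import defaultdict


def build_cliques(equivalences):
    """Group identifiers into sets connected by pairwise equivalence assertions."""
    parent = {}

    def find(x):
        # Locate the representative of x, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in equivalences:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    cliques = defaultdict(set)
    for identifier in list(parent):
        cliques[find(identifier)].add(identifier)
    return list(cliques.values())


# Hypothetical preference order over identifier prefixes (an assumption).
PREFIX_PRIORITY = ["B", "A", "C"]


def canonical_id(clique):
    """Pick the identifier whose prefix ranks highest; break ties alphabetically."""
    def rank(curie):
        prefix = curie.split(":", 1)[0]
        if prefix in PREFIX_PRIORITY:
            position = PREFIX_PRIORITY.index(prefix)
        else:
            position = len(PREFIX_PRIORITY)  # unknown prefixes rank last
        return (position, curie)

    return min(clique, key=rank)


# Made-up equivalence assertions from three imaginary sources.
pairs = [("A:1", "B:7"), ("B:7", "C:3"), ("A:2", "C:9")]
canonical = {canonical_id(c): sorted(c) for c in build_cliques(pairs)}
print(canonical)
```

Transitive merging of this kind is simple, but it propagates any incorrect assertion through the whole group, which is one reason validity and typing checks matter alongside equivalency.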

Files

FY20 SRI Translator Proposal Redacted for Zenodo.pdf (327.4 kB)