Enabling machine-actionable semantics for comparative analyses of trait evolution
Collaborative grant proposal submitted to the US National Science Foundation (NSF), Advances in Biological Informatics (ABI) program, Innovation track. The proposal was awarded on August 30, 2017 as awards 1661529, 1661456, 1661356, and 1661516.
This document differs from the original version submitted to NSF: figure captions that had originally been omitted were restored to aid reader comprehension, and the text spacing was adjusted accordingly.
The following text is the public (technical) summary.
The treasure trove of morphological data published in the literature holds one of the keys to understanding the biodiversity of phenotypes. However, exploiting these data fully through modern computational data science analytics remains severely hampered by the steep barriers to connecting them with the accumulated body of morphological knowledge in a form that machines can readily act on. This project aims to address this barrier by creating a centralized computational infrastructure that gives comparative analysis tools the ability to compute with morphological knowledge through scalable online application programming interfaces (APIs). Developers of comparative analysis tools, and therefore their users, will thereby be able to tap into machine reasoning-powered capabilities and data with machine-actionable semantics. By shifting all the heavy lifting to this infrastructure, tools can programmatically obtain, with only a few lines of code, answers to knowledge-based questions that would otherwise require careful study by a human expert, such as objectively and reproducibly assessing the relatedness, independence, and distinctness of characters and character states. To accomplish this, the project will adapt key products and know-how developed by the Phenoscape project, including an integrative knowledgebase of ontology-linked phenotype data, metrics for quantifying the semantic similarity of phenotype descriptions, and algorithms for synthesizing morphological data from published trait descriptions.
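To make the idea of a semantic similarity metric concrete, the sketch below computes one widely used measure: the Jaccard index of the ancestor ("subsumer") sets of two ontology terms. This is an illustrative assumption, not a description of the project's actual metrics or APIs. The toy is_a hierarchy, its term labels, and the function names subsumers and jaccard_similarity are all hypothetical and are not taken from any real ontology.

```python
from typing import Dict, List, Set

# Hypothetical toy is_a hierarchy: each term maps to its direct parents.
# Labels are illustrative only and are not copied from any real ontology.
IS_A: Dict[str, List[str]] = {
    "anatomical structure": [],
    "appendage": ["anatomical structure"],
    "fin": ["appendage"],
    "paired fin": ["fin"],
    "median fin": ["fin"],
    "pectoral fin": ["paired fin"],
    "pelvic fin": ["paired fin"],
    "dorsal fin": ["median fin"],
}


def subsumers(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with all of its ancestors (its subsumers)."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a[current])  # walk up to the direct parents
    return seen


def jaccard_similarity(a: str, b: str, is_a: Dict[str, List[str]]) -> float:
    """Jaccard index of two terms' subsumer sets: shared / combined ancestors."""
    sa, sb = subsumers(a, is_a), subsumers(b, is_a)
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    print(jaccard_similarity("pectoral fin", "pelvic fin", IS_A))  # 4 / 6
    print(jaccard_similarity("pectoral fin", "dorsal fin", IS_A))  # 3 / 7
```

In this toy hierarchy, pectoral and pelvic fins share four of six combined subsumers (about 0.67), whereas pectoral and dorsal fins share only three of seven (about 0.43), so the measure ranks the two paired fins as more similar. A production system would draw subsumers from a reasoner over real ontologies rather than from a hand-written dictionary.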
To drive development of the computational infrastructure and to demonstrate its enabling value, the project's objectives focus on three concrete, long-standing needs for which the difficulty of computing with domain knowledge is the major impediment: (1) computationally synthesizing, calibrating, and assessing morphological trait matrices from across studies; (2) objectively and reproducibly incorporating the morphological domain knowledge provided by ontologies into models of trait evolution; and (3) generating testable hypotheses for adaptive diversification by incorporating semantic phenotypes into ancestral state reconstruction and by identifying domain ontology concepts that are linked to evolutionary changes in a branch or clade more frequently than expected by chance. In addition, to better prepare evolutionary biologist users and developers of comparative analysis tools for adopting these new capabilities, a domain-tailored short course on the requisite knowledge representation and computational inference technologies will be developed and taught. More information on this project can be found at http://cate.phenoscape.org/.
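Objective (3) asks whether an ontology concept is linked to changes on a branch or clade "more frequently than expected by chance." One standard way to frame such a question is a one-sided hypergeometric (over-representation) test, sketched below as an assumption; the summary does not specify which statistic the project uses. The function name enrichment_p_value and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(total_changes: int, concept_changes: int,
                       branch_changes: int, branch_concept_changes: int) -> float:
    """One-sided hypergeometric p-value, P(X >= branch_concept_changes).

    total_changes (N): all inferred character-state changes across the tree.
    concept_changes (K): changes annotated to the concept or any subclass of it.
    branch_changes (n): changes inferred on the branch or clade of interest.
    branch_concept_changes (k): how many of those n are annotated to the concept.
    """
    if not (0 <= concept_changes <= total_changes
            and 0 <= branch_changes <= total_changes):
        raise ValueError("counts must satisfy 0 <= K <= N and 0 <= n <= N")
    upper = min(branch_changes, concept_changes)
    # Sum the probabilities of drawing i = k..upper concept-linked changes
    # when n changes are sampled without replacement from all N changes.
    tail = sum(
        comb(concept_changes, i)
        * comb(total_changes - concept_changes, branch_changes - i)
        for i in range(branch_concept_changes, upper + 1)
    )
    return tail / comb(total_changes, branch_changes)


if __name__ == "__main__":
    # Hypothetical counts: 20 changes in total, 5 linked to the concept,
    # 4 on the clade of interest, 3 of which are linked to the concept.
    print(enrichment_p_value(20, 5, 4, 3))  # 155 / 4845
```

With these made-up counts the p-value is 155/4845, about 0.032. Because the counts for a concept include changes annotated to its subclasses, the test is where ontology reasoning enters. A real analysis testing many concepts would also need to correct for multiple comparisons.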