# README for data archive from:

"Toward synthesizing our knowledge of morphology: Using ontologies and machine reasoning to extract presence/absence evolutionary phenotypes across studies"
T.A. Dececchi, J.P. Balhoff, H. Lapp, P.M. Mabee.

Data for these reports is generated by scripts or SPARQL queries found within the [Ontotrace source repository](https://github.com/phenoscape/ontotrace). Data used in the manuscript were generated using a copy of the Phenoscape Knowledgebase from 2014-6-26. Periodic data dumps from the Phenoscape Knowledgebase can be obtained from [DataHub](http://datahub.io/dataset/phenoscape-kb).

The following ontologies relevant to this study are used within the Phenoscape Knowledgebase:

* [Uberon anatomy ontology](http://purl.obolibrary.org/obo/uberon/ext.owl), version http://purl.obolibrary.org/obo/uberon/releases/2014-06-26/ext.owl
* [Biospatial ontology](http://purl.obolibrary.org/obo/bspo.owl), version http://purl.obolibrary.org/obo/bspo/releases/2014-02-03/bspo.owl
* [Phenotype and trait ontology](http://purl.obolibrary.org/obo/pato.owl), version http://purl.obolibrary.org/obo/pato/releases/2014-04-09/pato.owl
* [OBO relations ontology](http://purl.obolibrary.org/obo/ro.owl), version http://purl.obolibrary.org/obo/ro/releases/2014-05-16/ro.owl
* [Vertebrate taxonomy ontology](http://purl.obolibrary.org/obo/vto.owl), version http://purl.obolibrary.org/obo/vto/2014-5-14/vto.owl

## `sarcop-presence-absence-all.xml`

NeXML character matrix, generated using these input expressions to Ontotrace:

* anatomy: ` some ( or ) or some ( or )`
* taxonomy: ``

## `sarcop-presence-absence-variable.xml`

NeXML character matrix retaining only variable columns from `sarcop-presence-absence-all.xml`

## `sarcop-presence-absence-variable.nex`

NEXUS-formatted translation of `sarcop-presence-absence-variable.xml`

## `uberon_presences.owl`

OWL ontology containing a "presence class" corresponding to each anatomical structure from the Uberon anatomy ontology.
Ontotrace code in `GeneratePresenceClasses.scala`.

## `Supplementary Materials.docx`

Collected supplementary tables in more readable form.

## `supplementary_table_1.txt`

List of publications used in constructing the synthetic supermatrix. For each publication: focal group, number of taxa, and number of fin, limb, and girdle characters, states, and phenotype annotations. Studies focused explicitly on the fin-to-limb transition are denoted by an asterisk. Generated using the SPARQL query `count_relevant_data_expanded.rq` found within the Ontotrace source code.

## `supplementary_table_2.txt`

List of taxa, totaling 136 OTUs, present in the variable-only synthetic supermatrix based on inferred data alone. Ontotrace code in `Report.scala::taxaWithOnlyInferredStates`.

## `supplementary_table_3.txt`

Conflicting characters. Characters with conflicting states in the variable-only supermatrix, listed by taxon. The conflict type (between direct assertions, direct vs. inferred, or inferred vs. inferred) is indicated in the right-most column. Ontotrace code in `ConflictReport.scala`.

## `supplementary_table_4.txt`

Correlated characters. Clusters (93) of fully correlated characters across the variable-only synthetic supermatrix, arranged from largest (10) to smallest (2). A character name followed by [inferred] indicates that the character has only inferred data in the matrix. Ontotrace code in `ClusterCharacters.scala`.

## `supplementary_table_5.txt`

Taxon sampling. The number of source matrices/publications (right-most column) from which taxa at various taxonomic ranks (VTO identifier number, left-most column) were sampled. Generated using the SPARQL query `count_pubs_per_relevant_taxon.rq` found within the Ontotrace source code.

## `supplementary_table_6.txt`

The number of published character states that entail the presence or absence for selected sets of anatomical entities and taxa. Ontotrace code in `DataCoverageFigureReport.scala`.
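
As an illustration of what "retaining only variable columns" means for `sarcop-presence-absence-variable.xml`, the following minimal Python sketch filters a toy presence/absence matrix. It is not the Ontotrace implementation: the taxon and character names are hypothetical, the state coding (`"1"` present, `"0"` absent, `"?"` no data) is an assumption, and so is the criterion that a column counts as variable only when both `"0"` and `"1"` are observed among the taxa.

```python
# Toy presence/absence matrix: rows are taxa, columns are characters.
# Assumed coding: "1" = present, "0" = absent, "?" = no data.
# All taxon and character names below are hypothetical.
characters = ["pectoral fin", "digit", "humerus"]
matrix = {
    "Taxon A": ["1", "0", "1"],
    "Taxon B": ["1", "1", "1"],
    "Taxon C": ["?", "1", "?"],
}


def variable_columns(matrix, n_chars):
    """Return indices of columns showing both "0" and "1" across taxa.

    Assumption: missing data ("?") is ignored when deciding variability.
    """
    keep = []
    for i in range(n_chars):
        observed = {row[i] for row in matrix.values()} - {"?"}
        if {"0", "1"} <= observed:
            keep.append(i)
    return keep


keep = variable_columns(matrix, len(characters))
variable_characters = [characters[i] for i in keep]
variable_matrix = {
    taxon: [row[i] for i in keep] for taxon, row in matrix.items()
}

print(variable_characters)
print(variable_matrix)
```

In this toy matrix only the second column has both states observed, so it is the only one retained; the other two columns are constant once missing data is set aside.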