Published August 22, 2018 | Version 1.0.1
Dataset Open

Gold standard corpus, ontologies, and Entity-Quality ontology annotations for evolutionary phenotypes

  • 1. University of South Dak
  • 2. University of South Dakota
  • 3. University of Chicago
  • 4. University of Arizona
  • 5. Renaissance Computing Institute
  • 6. University of North Carolina at Chapel Hill
  • 7. Duke University


This data set includes a gold-standard corpus of evolutionary phenotype descriptions (in the form of character state descriptions drawn from a variety of phylogenetic systematics studies) and their corresponding expert-curated annotations with ontology terms in the form of Entity-Quality (EQ) statements. EQ annotations allow machine reasoning (through the semantics encoded in the ontologies from which the terms are drawn), and machine reasoning in turn enables computing metrics that quantify the semantic similarity between different phenotype descriptions as represented by their EQ annotations.
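To make the idea of reasoning-based similarity concrete, the following minimal Python sketch represents an EQ statement as an entity/quality pair and scores two statements by the Jaccard index of their shared superclasses. Everything in it is an assumption made for illustration: the toy `PARENTS` hierarchy and its term labels are hypothetical and not taken from the data set, and Jaccard similarity is only one possible metric, not necessarily the one used in the associated manuscript.

```python
from dataclasses import dataclass

# Toy is-a hierarchy (child -> direct parents). Hypothetical labels for
# illustration only; these are not terms taken from the data set.
PARENTS = {
    "pectoral fin": {"paired fin"},
    "pelvic fin": {"paired fin"},
    "paired fin": {"fin"},
    "fin": set(),
    "elongated": {"shape"},
    "round": {"shape"},
    "shape": set(),
}


@dataclass(frozen=True)
class EQ:
    """An Entity-Quality statement: an entity term plus a quality term."""
    entity: str
    quality: str


def ancestors(term):
    """Return the term together with all of its superclasses."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumers(eq):
    """Approximate the subsumers of an EQ statement as all pairs of
    (entity superclass, quality superclass)."""
    return {(e, q) for e in ancestors(eq.entity) for q in ancestors(eq.quality)}


def jaccard(a, b):
    """Jaccard similarity of two EQ statements over their subsumer sets."""
    sa, sb = subsumers(a), subsumers(b)
    return len(sa & sb) / len(sa | sb)


a = EQ("pectoral fin", "elongated")
b = EQ("pelvic fin", "elongated")
c = EQ("pelvic fin", "round")
print(jaccard(a, b))  # shares both the fin and the shape lineage
print(jaccard(a, c))  # shares less, so it scores lower
```

In this sketch, two statements about different paired fins with the same quality score higher than two statements that also differ in quality, which is the kind of graded comparison that ontology semantics make possible.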

Also included are the ontologies, and the human expert-generated and Semantic Charaparser (i.e., machine) generated EQ annotations used to assess Semantic Charaparser performance relative to inter-curator variation and to the effect of having access to external knowledge. The ontologies include those used as input, the "augmented" ontologies created by human curators in each experiment round, and the merged ontology used to maximize Semantic Charaparser's performance.

The production of the gold standard corpus, annotation experiments, and evaluation of the results are described in detail in the following manuscript:

Dahdul et al. (2018) Annotation of phenotypes using ontologies: a Gold Standard for the training and evaluation of natural language processing systems. bioRxiv. Submitted to Database.

The analysis code for evaluating the gold standard corpus, along with its input data and ontologies, is available separately from the following:

Manda et al. (2018) Code and data for analysis of evolutionary phenotype ontology annotations and gold standard corpus. Zenodo.

In comparison to the previous version (v1.0.0), this record includes a file of MD5 checksums of the Gold Standard data files. The data files themselves are unchanged.
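A downloaded copy of the data files can be checked against the checksum file with a few lines of Python. The sketch below is a minimal example under stated assumptions: it assumes the checksum file uses the common `md5sum` layout (one `<hex digest>  <file name>` pair per line) and that the data files sit in the same directory as the checksum file. The record does not state the file's name or format, so the `checksums.md5` name in the usage note is hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksum_file):
    """Check each listed file against its expected MD5 digest.

    Assumes md5sum-style lines: '<hex digest>  <file name>'.
    Returns a dict mapping file name -> True (match) or False
    (mismatch or missing file).
    """
    checksum_path = Path(checksum_file)
    base = checksum_path.parent
    results = {}
    for line in checksum_path.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # md5sum marks binary mode with '*'
        target = base / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

For example, `verify("checksums.md5")` returns a dictionary in which any `False` value flags a file that is missing or whose contents differ from the published checksum.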


Author-surveys and instructions.pdf

Files (47.0 MB)
