Published August 22, 2018 | Version 1.0.1
Dataset Open

Gold standard corpus, ontologies, and Entity-Quality ontology annotations for evolutionary phenotypes

  • 1. University of South Dak
  • 2. University of South Dakota
  • 3. University of Chicago
  • 4. University of Arizona
  • 5. Renaissance Computing Institute
  • 6. University of North Carolina at Chapel Hill
  • 7. Duke University

Description

This data set includes a gold-standard corpus of evolutionary phenotype descriptions (character state descriptions drawn from a variety of phylogenetic systematics studies) and their corresponding expert-curated annotations with ontology terms in the form of Entity-Quality (EQ) statements. EQ annotations allow machine reasoning through the semantics encoded in the ontologies from which the terms are drawn. Machine reasoning in turn enables computing metrics that quantify the semantic similarity between phenotype descriptions as represented by their EQ annotations.
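To make the idea of a reasoning-based similarity metric concrete, the sketch below computes a Jaccard similarity over the subsumers (a term plus all of its ancestors) of two EQ annotations. It is only an illustration: the toy hierarchy, the term names, and the function names are hypothetical, and the metric is not necessarily one of those used in the manuscript. As a further simplification, an EQ statement is treated as the union of its entity and quality subsumers rather than as a composed class expression handled by a reasoner.

```python
# Toy is_a hierarchy mapping each term to its direct parents.
# All term names are hypothetical and not taken from the data set's ontologies.
TOY_PARENTS = {
    "dorsal fin": {"fin"},
    "pectoral fin": {"fin"},
    "fin": {"anatomical structure"},
    "anatomical structure": set(),
    "elongated": {"shape"},
    "round": {"shape"},
    "shape": {"quality"},
    "quality": set(),
}


def subsumers(term, parents=TOY_PARENTS):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def eq_subsumers(eq, parents=TOY_PARENTS):
    """Union of the entity's and the quality's subsumers (a simplification)."""
    entity, quality = eq
    return subsumers(entity, parents) | subsumers(quality, parents)


def jaccard_similarity(eq_a, eq_b, parents=TOY_PARENTS):
    """Shared subsumers divided by all subsumers of the two EQ annotations."""
    a = eq_subsumers(eq_a, parents)
    b = eq_subsumers(eq_b, parents)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Two annotations that differ in both entity and quality.
    print(jaccard_similarity(("dorsal fin", "elongated"), ("pectoral fin", "round")))
```

Annotations that share more ancestors in the ontology score closer to 1, which is how differing but related annotations from two curators can still be recognized as similar.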

Also included are the ontologies and the EQ annotations, both human expert-generated and machine-generated (by Semantic Charaparser), that were used to assess Semantic Charaparser performance relative to inter-curator variation and to the effect of having access to external knowledge. The ontologies include those used as input, the "augmented" ontologies created by human curators in each experiment round, and the merged ontology used to maximize Semantic Charaparser's performance.

The production of the gold standard corpus, annotation experiments, and evaluation of the results are described in detail in the following manuscript:

Dahdul et al (2018) Annotation of phenotypes using ontologies: a Gold Standard for the training and evaluation of natural language processing systems. BioRxiv https://doi.org/10.1101/322156. Submitted to Database.

The analysis code for evaluating the gold standard corpus, along with its input data and ontologies, is available separately from the following:

Manda et al (2018) Code and data for analysis of evolutionary phenotype ontology annotations and gold standard corpus. Zenodo. https://doi.org/10.5281/zenodo.1218010

In comparison to the previous version (v1.0.0), this record includes a file of MD5 checksums of the Gold Standard data files. The data files themselves are unchanged.
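Downloaded files can be checked against the MD5 checksums with a short script. The following is a minimal sketch that assumes the checksum file uses the common `md5sum` layout (one `<hex digest>  <filename>` pair per line); the actual layout of the checksum file in this record may differ, and the manifest name and function names used here are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(manifest_path, data_dir):
    """Map each file named in the manifest to True (match) or False.

    Assumes md5sum-style lines: "<hex digest>  <filename>".
    Files that are missing from data_dir are reported as False.
    """
    results = {}
    for line in Path(manifest_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = Path(data_dir) / name
        results[name] = target.is_file() and md5_of_file(target) == expected.lower()
    return results


# Example call (hypothetical manifest name and download directory):
# for name, ok in verify_checksums("checksums.md5", "downloads").items():
#     print("OK      " if ok else "MISMATCH", name)
```

Splitting each line only once keeps file names that contain spaces intact.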

Files

Author-surveys and instructions.pdf

Files (47.0 MB)

Checksum  Size
md5:3201a9006c7a5fe3c9ed5c6a69aa4d11  11.0 MB
md5:fe3981abea67d37fe8aa377a8e0779dd  228.4 kB
md5:2ffe31340baf6e6257d7e950c3a3573d  137.2 kB
md5:21565f642337374f28a001e52bff0397  612.9 kB
md5:b878e451fe57696884b4327df1b077db  148.9 kB
md5:0b3c3ec607503fcff9e9e02748001b1a  183.3 kB
md5:a57c33fc88bdaaad82c15334ea5534d0  437 Bytes
md5:cd597312ad534821390b8844b2aa4632  33.7 MB
md5:b9cfbad067a82c83e1bf775bc991e13c  11.9 kB
md5:dcdcf4af6591142cb855d7de3caad0f4  1.0 MB