Published May 16, 2013 | Version v1
Dataset Open

Data from: Utilizing descriptive statements from the Biodiversity Heritage Library to expand the Hymenoptera Anatomy Ontology

  • 1. American Museum of Natural History
  • 2. University of Szeged
  • 3. Species File, Prairie Research Institute, Champaign, Illinois, United States of America*
  • 4. North Carolina State University
  • 5. Pennsylvania State University

Description

Hymenoptera, the insect order that includes sawflies, bees, wasps, and ants, exhibits an incredible diversity of phenotypes, with over 145,000 species described in a corpus of textual knowledge since Carolus Linnaeus. Without specialized training, which often spans decades, however, these articles can be challenging to decipher. Much of the vocabulary is domain-specific (e.g., Hymenoptera biology), has historically lacked a comprehensive glossary, and contains many homonyms and synonyms. The Hymenoptera Anatomy Ontology (HAO) was developed to surmount this challenge, to aid future communication about hymenopteran anatomy, and to support domain experts so that they may actively benefit from the ontology's development. As part of HAO development, an active learning, dictionary-based, natural language recognition tool was implemented to facilitate Hymenoptera anatomy term discovery in the literature. We present this tool, referred to as the 'Proofer', as part of an iterative approach to growing phenotype-relevant ontologies, regardless of domain. The process of ontology development results in a critical mass of terms that is applied as a filter to the source collection of articles in order to reveal term occurrence and biases in natural language species descriptions. Our results indicate that taxonomists use domain-specific terminology that follows taxonomic specialization, particularly at superfamily- and family-level groupings, and that the Proofer tool is effective for term discovery, facilitating ontology construction.
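
The dictionary-based matching step described above can be illustrated with a minimal sketch. This is not the Proofer's implementation, which is not shown in this record; the function name `find_terms`, the sample vocabulary, and the sample sentence are all hypothetical and chosen only to show how a list of known anatomy terms can act as a filter over a species description.

```python
import re


def find_terms(text, vocabulary):
    """Count whole-word, case-insensitive occurrences of each vocabulary term.

    Hypothetical helper for illustration only; terms that do not occur
    in the text are left out of the result.
    """
    counts = {}
    for term in vocabulary:
        # \b anchors keep "gaster" from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    return counts


# Illustrative input, not taken from the dataset.
description = "Mesosoma smooth; propodeum with carina. Mesosoma short."
vocabulary = ["mesosoma", "propodeum", "gaster"]
print(find_terms(description, vocabulary))
```

A real pipeline would also need to handle plurals, hyphenation, and multi-word variants, and words that match no known term are the natural candidates to hand to a curator for review.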

Files

g2.txt

Files (53.1 MB)

Checksum                                Size
md5:a3412ca1da606d76a5e1bbd6f1189bf7    5.2 kB
md5:40d9bd2b84dd8fe04cb2a9c207a0b24e    146.1 kB
md5:e772b151292c97e62b9ee983a0ce241e    1.3 kB
md5:80f25a1a5a703dfdb479ea5e3f043aa0    399.6 kB
md5:776400a6ab4304f9324ba5eeebbffc7b    52.2 MB
md5:4dbf934b4c92e5acd017fc12e605b1f5    3.0 kB
md5:49349018213e5859be47fcbf8d494366    338.1 kB
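
The md5 values listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are hypothetical, and the local file path is whatever name the file was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's digest with a listed value such as 'md5:a3412c...'."""
    # Drop an optional 'md5:' prefix and normalise case before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:a3412ca1da606d76a5e1bbd6f1189bf7")` should return `True` for an uncorrupted copy of the first listed file, where `local_path` is an assumed local filename. Reading in chunks keeps memory use flat even for the 52.2 MB file.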

Additional details

Related works