Published July 2, 2019 | Version v1
Journal article | Open access

From Field Observations and Plant Specimens to a Trans-continental Knowledge Base: Efficient, semantically rich integration of highly heterogeneous plant phenological data

  • 1. Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
  • 2. University of California at Berkeley, Berkeley, United States of America
  • 3. Bio5 Institute, University of Arizona, Tucson, AZ, United States of America; CyVerse, Tucson, AZ, United States of America

Description

Ideally, an information system that automates the integration of disparate datasets should minimize the loss of information from any one dataset, scale computationally to large datasets, be flexible enough to incorporate new data sources easily, and produce output that data users can readily analyze and understand. Achieving all of these goals within highly heterogeneous and highly complex data domains is a major challenge. In this talk, we present the results of our recent efforts to develop such a system for plant phenology data. Our data integration system, which is built around the Plant Phenology Ontology, currently supports semantically fine-grained integration of phenological data from both field observations and herbarium specimens. We show that even with a heavily axiomatized ontology and sophisticated data analysis based on machine reasoning, it is possible to implement a high-throughput data integration pipeline that processes millions of individual records in a matter of minutes on modest server-class hardware. Success requires careful ontology design and judicious application of machine reasoning techniques. We also discuss some of the many challenges that remain in designing efficient, general-purpose data integration systems.

Files

BISS_article_37614.pdf
