From Field Observations and Plant Specimens to a Trans-continental Knowledge Base: Efficient, semantically rich integration of highly heterogeneous plant phenological data
- 1. Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
- 2. University of California at Berkeley, Berkeley, CA, United States of America
- 3. Bio5 Institute, University of Arizona, Tucson, AZ, United States of America; CyVerse, Tucson, AZ, United States of America
Description
Ideally, an information system that automates the integration of disparate datasets should minimize the loss of information from any one dataset, scale computationally to large datasets, be flexible enough to incorporate new data sources easily, and produce output that data users can readily analyze and understand. Achieving all of these goals in highly heterogeneous, highly complex data domains is a major challenge. In this talk, we present the results of our recent efforts to develop such a system for plant phenology data. Our data integration system, built around the Plant Phenology Ontology, currently supports semantically fine-grained integration of phenological data from both field observations and herbarium specimens. We show that even with a heavily axiomatized ontology and sophisticated data analysis based on machine reasoning, it is possible to implement a high-throughput data integration pipeline that processes millions of individual records in minutes on modest server-class hardware. Success requires careful ontology design and judicious application of machine reasoning techniques. We also discuss some of the many challenges that remain in designing efficient, general-purpose data integration systems.
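To make "semantically fine-grained integration" and "judicious application of machine reasoning" more concrete, here is a minimal, hypothetical sketch in Python. It is not the authors' pipeline and does not use the real Plant Phenology Ontology: the trait labels, the hierarchy in `PARENTS`, the source codes in `SOURCE_MAPPINGS`, and the functions `ancestor_closure` and `integrate` are all illustrative assumptions. It shows one way to keep reasoning cheap: the transitive implications of a small trait hierarchy are computed once, and each record from a differently coded source (a field observation or a herbarium specimen) is then mapped to the full set of implied shared traits using only lookups and set unions.

```python
from collections import deque

# Hypothetical trait hierarchy (child -> direct parents). The labels are loosely
# modeled on phenological trait names; they are NOT actual PPO terms or IRIs.
PARENTS = {
    "open flowers present": {"flowers present"},
    "flower buds present": {"flowers present"},
    "flowers present": {"reproductive structures present"},
    "ripe fruits present": {"fruits present"},
    "fruits present": {"reproductive structures present"},
    "reproductive structures present": set(),
}

# Hypothetical per-source mappings from local codes to shared trait terms.
SOURCE_MAPPINGS = {
    "field_observation": {"open_flowers": "open flowers present",
                          "ripe_fruit": "ripe fruits present"},
    "herbarium_specimen": {"FL": "flowers present", "FR": "fruits present"},
}


def ancestor_closure(parents):
    """Precompute, once, every trait implied by each trait (reflexive, transitive)."""
    closure = {}
    for term in parents:
        seen = {term}
        queue = deque([term])
        # Breadth-first walk up the hierarchy, collecting all ancestors.
        while queue:
            for parent in parents[queue.popleft()]:
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        closure[term] = frozenset(seen)
    return closure


CLOSURE = ancestor_closure(PARENTS)


def integrate(record):
    """Map one source record to the full set of implied shared trait terms."""
    mapping = SOURCE_MAPPINGS[record["source"]]
    traits = set()
    for code in record["codes"]:
        # Per-record work is only dictionary lookups and set unions.
        traits |= CLOSURE[mapping[code]]
    return traits


if __name__ == "__main__":
    print(integrate({"source": "herbarium_specimen", "codes": ["FL"]}))
    print(integrate({"source": "field_observation", "codes": ["open_flowers"]}))
```

Under these assumptions, a specimen coded `FL` and an observation coded `open_flowers` both end up annotated with "flowers present", so the two sources can be queried together. Because the closure is computed once up front, the per-record cost grows linearly with the number of records rather than requiring a reasoner call for every record; whether the actual system uses this strategy is not stated here.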
Files
BISS_article_37614.pdf
Total size: 69.4 kB
Size | md5 |
---|---|
60.4 kB | d7088533ff264810ae217b232ab80ca7 |
9.0 kB | c27c6c81ccfb0e7b2935fb9767a25a6d |