Published May 6, 2021 | Version 1.0.0
Software Open

A Convolutional Neural Network based high-throughput image classification pipeline - code and documentation to process plankton underwater imagery using local HPC infrastructure and NSF's XSEDE

  • 1. Hatfield Marine Science Center, Oregon State University, Newport, OR, USA
  • 2. Center for Genomic Research and Biocomputing, Oregon State University, Corvallis, OR, USA

Description

Abstract

Scientific imaging (e.g., satellites looking at ocean color, medical imaging) can produce vast quantities of data that need to be processed on time frames similar to those of data collection. While satellite imaging has many advantages, a satellite’s sensors cannot penetrate the ocean’s surface by more than a few meters. To that end, underwater imaging systems have been developed over the last 40+ years that can image organisms in situ in hundreds of meters of water. Underwater imaging systems include those designed for benthic studies (e.g., corals) as well as instruments that document the pelagic realm (e.g., plankton and fish). As an example, we use the In Situ Ichthyoplankton Imaging System (ISIIS), which collects upwards of 14 million images per hour of deployment; in highly productive waters this number can increase up to ten-fold. A typical cruise consisting of 70 hours of ISIIS deployment can yield upwards of 1 billion images of plankton and particles. This big data problem can only be solved with a high-throughput processing pipeline that can be scaled down or up depending on the available resources. Thus, we designed a modular Python-based pipeline that can be deployed on local high-performance computing (HPC) infrastructure, such as a university’s HPC cluster, as well as on cloud providers. The code provided with this documentation was optimized for Oregon State University’s Center for Genomic Research and Biocomputing (CGRB) as well as for the National Science Foundation’s Extreme Science and Engineering Discovery Environment (XSEDE), but can easily be adapted to the user’s needs. This code and documentation enable 1) training a sparse Convolutional Neural Network (sCNN) and 2) applying the sCNN in a processing pipeline to classify all remaining images in an automated fashion. Standard size measurements of the plankton and particles on the segmented images are also taken as part of the pipeline.
The pipeline is optimized for speed and can classify upwards of 30 million images per hour on XSEDE Comet GPU compute nodes. End-to-end processing of 1 hour’s worth of raw imagery data (ca. 14 million images) using XSEDE CPU and GPU nodes takes ca. 2.4 hours, including data upload, segmentation, classification, and obtaining standard length measurements. This enables us to process a typical cruise of ten 7-hour transects in about a week. A training library of images as well as a video test dataset are supplied with the code. While the pipeline was built for ISIIS images, imagery from other underwater systems and other areas of science can be used with the pipeline.
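The throughput figures quoted above can be cross-checked with a short back-of-the-envelope calculation. The following is a minimal sketch and not part of the published pipeline: the function name `cruise_estimate` and its parameters are hypothetical, and it assumes that processing time scales linearly with deployment hours and that hours of imagery are processed one after another.

```python
# Figures taken from the abstract; everything else is an illustrative assumption.
IMAGES_PER_HOUR = 14_000_000           # ISIIS images per hour of deployment (lower bound)
PROCESSING_HOURS_PER_DATA_HOUR = 2.4   # end-to-end processing time per hour of raw imagery
TRANSECTS = 10                         # transects in a typical cruise
HOURS_PER_TRANSECT = 7                 # duration of each transect in hours


def cruise_estimate(transects=TRANSECTS,
                    hours_per_transect=HOURS_PER_TRANSECT,
                    images_per_hour=IMAGES_PER_HOUR,
                    processing_ratio=PROCESSING_HOURS_PER_DATA_HOUR):
    """Estimate image count and processing time for one cruise.

    Assumes linear scaling and strictly sequential processing
    (hypothetical simplification, not the authors' scheduling model).
    """
    deployment_hours = transects * hours_per_transect
    total_images = deployment_hours * images_per_hour
    processing_hours = deployment_hours * processing_ratio
    return {
        "deployment_hours": deployment_hours,
        "total_images": total_images,
        "processing_hours": processing_hours,
        "processing_days": processing_hours / 24,
    }


estimate = cruise_estimate()
print(f"Deployment: {estimate['deployment_hours']} h")
print(f"Images:     {estimate['total_images']:,}")
print(f"Processing: {estimate['processing_days']:.1f} days")
```

Under these assumptions, 70 hours of deployment correspond to 980 million images (close to the quoted 1 billion) and about 168 hours, or roughly seven days, of processing, which is consistent with the "about a week" stated above.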

 

Cite as

Schmid MS, Daprano D, Jacobson KM, Sullivan CM, Briseño-Avena C, Luo JY, Cowen RK. 2021. A Convolutional Neural Network based high-throughput image classification pipeline - code and documentation to process plankton underwater imagery using local HPC infrastructure and NSF’s XSEDE. [Software]. Zenodo. http://dx.doi.org/10.5281/zenodo.4641158

 

Notes

This project was funded by the National Science Foundation under grant numbers OCE-1737399 and OCE-1419987, the National Aeronautics and Space Administration under grant number 80NSSC20M0008, the Belmont Forum (through NSF grant number 1927710), as well as the Extreme Science and Engineering Discovery Environment (XSEDE) under grant number OCE170012.

Files

Schmid et al. 2021. A Convolutional Neural Network based high-throughput image classification pipeline__zenodo_v1.0.0.pdf

Additional details

Related works

Cites
Journal article: 10.4319/LOM.2008.6.126 (DOI)
Journal article: 10.1002/lom3.10285 (DOI)
Journal article: 10.1038/s41598-020-57879-x (DOI)

References

  • Cowen RK, Guigand C. 2008. In Situ Ichthyoplankton Imaging System (ISIIS): System design and preliminary results. Limnol Oceanogr Methods 6:126-132. https://doi.org/10.4319/LOM.2008.6.126
  • Luo JY, Irisson J-O, Graham B, Guigand C, Sarafraz A, Mader C, Cowen RK. 2018. Automated plankton image analysis using convolutional neural networks. Limnol Oceanogr Methods 16:814-827. https://doi.org/10.1002/lom3.10285
  • Schmid MS, Cowen RK, Robinson K, Luo JY, Briseño-Avena C, Sponaugle S. 2020. Prey and predator overlap at the edge of a mesoscale eddy: fine-scale, in-situ distributions to inform our understanding of oceanographic processes. Sci Rep 10:921. https://doi.org/10.1038/s41598-020-57879-x