A Convolutional Neural Network based high-throughput image classification pipeline - code and documentation to process plankton underwater imagery using local HPC infrastructure and NSF's XSEDE
Creators
- 1. Hatfield Marine Science Center, Oregon State University, Newport, OR, USA
- 2. Center for Genomic Research and Biocomputing, Oregon State University, Corvallis, OR, USA
Description
Abstract
Scientific imaging (e.g., satellite observations of ocean color, medical imaging) can produce vast quantities of data that need to be processed on time frames similar to those of data collection. While satellite imaging has many advantages, satellite sensors cannot penetrate the ocean’s surface by more than a few meters. For this reason, underwater imaging systems that can image organisms in situ in hundreds of meters of water have been developed over the last 40+ years. These systems include instruments designed for benthic studies (e.g., corals) as well as instruments that document the pelagic realm (e.g., plankton and fish). As an example, we use the In Situ Ichthyoplankton Imaging System (ISIIS), which collects upwards of 14 million images per hour of deployment; in highly productive waters this number can increase up to ten-fold. A typical cruise consisting of 70 hours of ISIIS deployment can yield upwards of 1 billion images of plankton and particles. A big data problem of this size can only be solved with a high-throughput processing pipeline that can be scaled up or down depending on the available resources. We therefore designed a modular Python-based pipeline that can be deployed on local high-performance computing (HPC) infrastructure, such as a university’s HPC cluster, as well as on cloud providers. The code provided with this documentation was optimized for Oregon State University’s Center for Genomic Research and Biocomputing (CGRB) and for the National Science Foundation’s Extreme Science and Engineering Discovery Environment (XSEDE), but can easily be adapted to the user’s needs. The code and documentation enable 1) training a sparse Convolutional Neural Network (sCNN) and 2) applying the trained sCNN in a processing pipeline to classify all remaining images automatically. Standard size measurements of the plankton and particles in the segmented images are also taken as part of the pipeline.
The pipeline is optimized for speed and can classify upwards of 30 million images per hour on XSEDE Comet GPU compute nodes. End-to-end processing of 1 hour’s worth of raw imagery data (ca. 14 million images) using XSEDE CPU and GPU nodes takes ca. 2.4 hours, including data upload, segmentation, classification, and standard length measurements. This enables us to process a typical cruise of ten 7 h transects (70 hours of deployment) in about a week. A training library of images and a video test dataset are supplied with the code. Although the pipeline was built for ISIIS images, it can also be used with imagery from other underwater systems and from other areas of science.
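The cruise-level figures quoted above follow from simple arithmetic, which the sketch below reproduces. It is only a back-of-the-envelope check, not part of the published pipeline: the function name `cruise_estimate` and the constants are illustrative, and it assumes that processing time scales linearly with hours of raw imagery at the quoted ratio of ca. 2.4 hours per hour.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# All names here are illustrative; none come from the published code.

IMAGES_PER_HOUR = 14_000_000   # ca. images per hour of ISIIS deployment
DEPLOYMENT_HOURS = 10 * 7      # ten transects of 7 h each
PROCESSING_RATIO = 2.4         # ca. hours of processing per hour of raw imagery


def cruise_estimate(images_per_hour=IMAGES_PER_HOUR,
                    deployment_hours=DEPLOYMENT_HOURS,
                    processing_ratio=PROCESSING_RATIO):
    """Return (total images, processing hours, processing days).

    Assumes processing time grows linearly with deployment hours.
    """
    total_images = images_per_hour * deployment_hours
    processing_hours = deployment_hours * processing_ratio
    processing_days = processing_hours / 24
    return total_images, processing_hours, processing_days


if __name__ == "__main__":
    total, hours, days = cruise_estimate()
    print(f"{total:,} images, {hours:.0f} h of processing (about {days:.1f} days)")
```

Under these assumptions, 70 hours of deployment gives roughly 980 million images and 168 hours of processing, consistent with "upwards of 1 billion images" and "about a week".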
Cite as
Schmid MS, Daprano D, Jacobson KM, Sullivan CM, Briseño-Avena C, Luo JY, Cowen RK. 2021. A Convolutional Neural Network based high-throughput image classification pipeline - code and documentation to process plankton underwater imagery using local HPC infrastructure and NSF’s XSEDE. [Software]. Zenodo. http://dx.doi.org/10.5281/zenodo.4641158
Files
Schmid et al. 2021. A Convolutional Neural Network based high-throughput image classification pipeline__zenodo_v1.0.0.pdf
Total size: 2.1 GB
MD5 checksum | Size |
---|---|
md5:dd93b4b26a0e893069f2bf15d8610b03 | 238.8 MB |
md5:7f4f388289bcab6afe9b8f122a38e249 | 1.8 GB |
md5:6f0e6be0fe7ddb9592a1ad2d35bb8702 | 72.5 MB |
md5:c03033135eb620f544852447bb2f5351 | 714.5 kB |
md5:e31bac6a22043339bba4e5bd927b158f | 885.0 kB |
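The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are illustrative and are not part of the published code, and the file names are not given in this listing, so any path passed in is the user's own.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a value such as 'md5:dd93...'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:dd93b4b26a0e893069f2bf15d8610b03")` should return `True` for an intact copy of the 238.8 MB file, where `local_path` is wherever that file was saved. Reading in chunks keeps memory use low, which matters for the 1.8 GB file.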
Additional details
Related works
- Cites
  - Journal article: 10.4319/LOM.2008.6.126 (DOI)
  - Journal article: 10.1002/lom3.10285 (DOI)
  - Journal article: 10.1038/s41598-020-57879-x (DOI)
References
- Cowen RK, Guigand C. 2008. In Situ Ichthyoplankton Imaging System (ISIIS): System design and preliminary results. Limnol Oceanogr Methods 6:126-132. https://doi.org/10.4319/LOM.2008.6.126
- Luo JY, Irisson J-O, Graham B, Guigand C, Sarafraz A, Mader C, Cowen RK. 2018. Automated plankton image analysis using convolutional neural networks. Limnol Oceanogr Methods 16:814-827. https://doi.org/10.1002/lom3.10285
- Schmid MS, Cowen RK, Robinson K, Luo JY, Briseño-Avena C, Sponaugle S. 2020. Prey and predator overlap at the edge of a mesoscale eddy: fine-scale, in-situ distributions to inform our understanding of oceanographic processes. Sci Rep 10:921. https://doi.org/10.1038/s41598-020-57879-x