LifeWatch observatory data: phytoplankton annotated trainingset by FlowCam imaging in the Belgian Part of the North Sea

Decrop, Wout; Lagaisse, Rune; Mortelmans, Jonas; Muyle, Julie; Amadei Martínez, Luz; Deneudt, Klaas

doi:10.5281/zenodo.10554845

Published January 23, 2024 | Version v1

Resource type: Dataset | Access: Open


1. Flanders Marine Institute

Contributors

Hosting institution:

VLIZ

Training dataset

The images were collected in the framework of the Belgian LifeWatch Research Infrastructure. During multidisciplinary campaigns, a set of fixed stations in the Belgian Part of the North Sea (BPNS) is visited monthly (onshore stations) or seasonally (offshore stations). Samples are taken with an Apstein net (55 µm mesh size) and fixed in Lugol's iodine solution. In the lab, the samples are processed on a FlowCam VS-4 at 4X magnification, targeting a particle size range of 55-300 µm. The images are identified with a convolutional neural network (CNN), followed by a manual validation step. Since May 2017, this dataset has provided micro- and phytoplankton observations for the BPNS, mainly covering diatoms, dinoflagellates and ciliates.

This dataset comprises a training data split of 337,613 images distributed across 95 classes, with each class containing between 100 and 10,000 images. To facilitate model training, the data are organized into a standard split: 80% for training, 10% for validation and 10% for testing. Because the split is fixed and published with the data, different models can be trained and evaluated on exactly the same images, which supports reproducible comparisons in subsequent analyses.

Technical details

Data preprocessing

Raw FlowCam output data is fully processed with in-house data pipelines; the VisualSpreadsheet software is used only for data acquisition during the lab run of the sample. Raw images and binary images are not saved during the FlowCam run; only the image collages saved at the end of the run are used. Individual images are cut from these collages with in-house Python code, using each image's coordinates, width and height as read from the .lst file. The image background is not removed. The images are then classified by the CNN and annotated in-house at VLIZ.
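The cropping step can be sketched as follows. This is a minimal illustration, not VLIZ's in-house code: it assumes the per-image coordinates (x, y of the top-left corner), width and height have already been read from the .lst file, because the .lst column layout depends on the VisualSpreadsheet export and is not described here. The function name `crop_images` is hypothetical.

```python
import numpy as np


def crop_images(collage, boxes):
    """Cut individual particle images out of a FlowCam collage (hypothetical helper).

    collage: 2-D (gray-scale) or 3-D (colour) NumPy array holding the whole collage.
    boxes:   iterable of (x, y, width, height) tuples in pixels, where (x, y) is
             assumed to be the top-left corner of each image inside the collage.
    """
    crops = []
    for x, y, w, h in boxes:
        # Reject boxes that would run past the collage edges instead of silently truncating
        if x < 0 or y < 0 or x + w > collage.shape[1] or y + h > collage.shape[0]:
            raise ValueError(f"box {(x, y, w, h)} falls outside the collage")
        # NumPy indexes rows (y) first, then columns (x); .copy() detaches the crop
        crops.append(collage[y:y + h, x:x + w].copy())
    return crops


# Usage with a synthetic collage containing one bright 20 x 10 px "particle":
collage = np.zeros((40, 60), dtype=np.uint8)
collage[5:15, 10:30] = 255
(crop,) = crop_images(collage, [(10, 5, 20, 10)])
print(crop.shape, crop.min())  # (10, 20) 255
```

Slicing rows before columns matters here: NumPy arrays are indexed as [row, column], i.e. [y, x], which is a common source of transposed crops.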

Data splitting

The training dataset is split into 80% for training, 10% for validation and 10% for testing.
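Users who want to re-split a subset of the data (for example after merging or dropping classes) could reproduce the 80/10/10 proportions with a per-class split like the one below. Stratifying by class is an assumption of this sketch; the description does not say how the published split was drawn, and the published split itself should be used when comparing against other work. `split_80_10_10` is a hypothetical helper.

```python
import random


def split_80_10_10(items_by_class, seed=0):
    """Return (train, val, test) lists of (item, label) pairs (hypothetical helper).

    Each class is shuffled and cut separately (a stratified split), so every class
    keeps roughly the 80/10/10 proportions. Stratification is an assumption here,
    not a documented property of the published split.
    """
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    train, val, test = [], [], []
    for label, items in sorted(items_by_class.items()):
        items = list(items)
        rng.shuffle(items)
        n = len(items)
        n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
        n_val = n // 10
        train += [(i, label) for i in items[:n_train]]
        val += [(i, label) for i in items[n_train:n_train + n_val]]
        test += [(i, label) for i in items[n_train + n_val:]]  # leftovers go to test
    return train, val, test
```

Integer division sends rounding leftovers to the test set: a class of 155 images yields 124/15/16 images.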

Classes, labels and annotations

The dataset comprises 337,613 images distributed across 95 classes, with each class containing between 100 and 10,000 images. The taxonomic coverage consists mainly of diatoms, dinoflagellates and ciliates and, to a lesser extent, zooplankton and other protists.
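After extracting the archive, the class counts stated above can be checked with a short script. The sketch assumes, hypothetically, one folder per class with images in common raster formats; if the archive separates the training, validation and test splits into their own folders, the per-class counts would need to be summed across splits before calling `check_counts`. Both helper names are hypothetical.

```python
from pathlib import Path

# Assumed image file extensions; adjust to whatever the extracted archive contains
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def count_per_class(root):
    """Count image files in each immediate sub-folder of root (one folder per class assumed)."""
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_counts(counts, n_classes=95, lo=100, hi=10_000, total=337_613):
    """Return a list of problems; an empty list means all stated figures match."""
    problems = []
    if len(counts) != n_classes:
        problems.append(f"expected {n_classes} classes, found {len(counts)}")
    for name, n in counts.items():
        if not lo <= n <= hi:
            problems.append(f"class {name!r} has {n} images (outside {lo}-{hi})")
    found = sum(counts.values())
    if found != total:
        problems.append(f"expected {total} images in total, found {found}")
    return problems
```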

Parameters

The images are read with OpenCV's cv2.imread, and the resulting pixel values are used as the model's input parameters.
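A minimal sketch of this step, assuming gray-scale reading (`cv2.IMREAD_GRAYSCALE`) and scaling of the 0-255 pixel values to [0, 1]; neither the read flag nor the scaling is specified in the description, and both helper names are hypothetical. OpenCV is imported inside `read_image` only, so the conversion helper can be used on arrays from any source.

```python
import numpy as np


def read_image(path):
    """Read one image file as a 2-D uint8 array with OpenCV (hypothetical helper)."""
    import cv2  # imported here so pixels_to_input works without OpenCV installed

    img = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    if img is None:  # cv2.imread signals a failed read by returning None
        raise OSError(f"could not read {path}")
    return img


def pixels_to_input(img):
    """Turn raw 0-255 pixel values into a float32 array of shape (H, W, 1) in [0, 1]."""
    arr = np.asarray(img)
    if arr.ndim == 3:
        # Assumption: 3-channel copies of gray-scale camera images have (near-)identical
        # channels, so averaging collapses them to one channel without meaningful loss
        arr = arr.mean(axis=2)
    return (arr.astype(np.float32) / 255.0)[..., np.newaxis]
```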

Data sources

Images are collected during the monthly monitoring of phytoplankton communities in the Belgian Part of the North Sea during the LifeWatch multidisciplinary campaigns, using a FlowCam VS-4 bench model (Fluid Imaging Technologies, Yarmouth, Maine, U.S.A.).

Data quality

All images are first classified by the CNN and then manually validated to ensure the quality of the training set.

Image resolution

The imaged size range is 55-300 µm. Images are acquired with a Sony XCD SC90 digital gray-scale camera. For CNN training, the images are resized to 100 × 100 px.
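The resize to 100 × 100 px could look like the sketch below. It uses nearest-neighbour sampling in plain NumPy, roughly what `cv2.resize(img, (100, 100), interpolation=cv2.INTER_NEAREST)` does; the interpolation method used for the actual training is not stated, so this choice is an assumption, and `resize_nearest` is a hypothetical helper.

```python
import numpy as np


def resize_nearest(img, size=100):
    """Nearest-neighbour resize of an image array to size x size pixels.

    Works on (H, W) and (H, W, C) arrays; the aspect ratio is NOT preserved.
    """
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols[None, :]]


# Usage: a small 3 x 4 image is upscaled, a large 250 x 400 one downscaled
small = np.arange(12, dtype=np.uint8).reshape(3, 4)
print(resize_nearest(small).shape)  # (100, 100)
```

A direct resize does not preserve aspect ratio: an elongated particle such as a diatom chain is squeezed along its long axis, while small particles are upscaled. Padding to a square before resizing is a common alternative when shape proportions matter.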

Spatial coverage

The data comes from a number of fixed stations in the Belgian Part of the North Sea (BPNS).

Nine stations onshore are visited monthly:

Station	Longitude (°E)	Latitude (°N)
130	2.90535	51.27055
780	3.057283	51.471367
330	2.809083	51.434117
230	2.85035	51.308683
710	3.138283	51.441217
215	2.61075	51.274867
ZG02	2.500717	51.33515
120	2.702483	51.186083
700	3.221017	51.377

Eight additional offshore stations are visited seasonally:

Station	Longitude (°E)	Latitude (°N)
LW01	2.256	51.568667
LW02	2.556	51.8
435	2.790333	51.580667
W07bis	3.012517	51.588033
W08	2.35	51.458333
W09	2.7	51.75
W10	2.416667	51.683333
421	2.45	51.4805
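The station tables can also be handled programmatically, for example to derive the geographic extent of the sampled area for a map or a metadata record. The sketch below is illustrative only: the helper names are hypothetical, and the parser assumes the whitespace-separated layout of the tables above (decimal degrees, header row first). Applied to all 17 stations, it gives a bounding box of 2.256-3.221017 °E and 51.186083-51.8 °N.

```python
def parse_stations(table):
    """Parse whitespace-separated 'Station Longitude Latitude' rows into {station: (lon, lat)}.

    The first line is treated as a header and skipped.
    """
    stations = {}
    for line in table.strip().splitlines()[1:]:
        name, lon, lat = line.split()
        stations[name] = (float(lon), float(lat))
    return stations


def bounding_box(stations):
    """Return (min_lon, min_lat, max_lon, max_lat) in decimal degrees."""
    lons = [lon for lon, _ in stations.values()]
    lats = [lat for _, lat in stations.values()]
    return min(lons), min(lats), max(lons), max(lats)


# Usage with two rows of the offshore table:
offshore = parse_stations("Station\tLongitude\tLatitude\nLW02\t2.556\t51.8\n421\t2.45\t51.4805")
print(bounding_box(offshore))  # (2.45, 51.4805, 2.556, 51.8)
```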

Temporal coverage

Monitoring started in May 2017 and has run continuously since then: monthly at the onshore stations and seasonally at the offshore stations.

Contact information

For technical questions about training, you can contact wout.decrop@vliz.be.

For more information on the training dataset and FlowCam, you can contact rune.lagaisse@vliz.be.

Notes

The phytoplankton annotated dataset is a product of the "FlowCam plankton identification Use Case" within the "iMagine project", with funding from the European Union's Horizon Europe research and innovation programme. The authors thank the project managers and all partners involved for fostering the creation of open-access image repositories for AI-based image analysis services. Special thanks go to the researchers who contributed to the phytoplankton dataset, which forms the foundation for the phytoplankton annotated labels.

Files

Files (359.4 MB)

Name	Size	MD5 checksum
phytoplankton_images_and_datasplit.7z	359.4 MB	60f5bdc408c744635279a80da9dc415f
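The integrity of the download can be checked against the MD5 checksum listed for the archive before extracting it (for example with 7-Zip: `7z x phytoplankton_images_and_datasplit.7z`). A minimal Python sketch, reading the file in chunks so the 359.4 MB archive is never held in memory at once; the function names are hypothetical.

```python
import hashlib

# Checksum published for phytoplankton_images_and_datasplit.7z
EXPECTED_MD5 = "60f5bdc408c744635279a80da9dc415f"


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5sum(path) == expected.lower()
```

For an intact copy, `verify_download("phytoplankton_images_and_datasplit.7z")` should return True.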

Additional details

DOI: 10.14284/650

Is described by: Publication: 10.3897/bdj.10.e81208 (DOI); Publication: 10.5670/oceanog.2021.supplement.02-09 (DOI); Publication: 10.3897/BDJ.8.e57236 (DOI)

Funding

Ministerie van de Vlaamse Gemeenschap
Fonds Wetenschappelijk Onderzoek – Vlaanderen
European Commission: iMagine - Imaging data and services for aquatic science (grant agreement 101058625)

This work was supported by the iMagine project with funding from the European Union’s Horizon Europe research and innovation programme under grant agreement 101058625.

Available: 2024-01-25
