HAB detection within Aquaculture Industry: A Case Study in the Atlantic Area*

Fisheries and aquaculture industries contribute notably to animal-source protein production worldwide. Climate change is creating environmental conditions suitable for harmful algal blooms (HAB) on a global scale. Some phytoplankton species can also release toxins, which may cause large-scale marine mortality with knock-on effects on coastal economies. Reliable phytoplankton monitoring and early HAB detection are therefore essential in climate-resilient solutions for aquaculture applications. Currently, phytoplankton monitoring is primarily based on traditional microscopy, which is time-consuming and requires an experienced taxonomist. There is a need to expedite and automate phytoplankton monitoring to support aquaculture industries. Analytical instruments based on microscopy coupled with artificial intelligence (AI) models may be vital to monitoring applications. Digital plankton data sets are usually imbalanced and reflect natural environmental differences. The lack of data to represent minority species/genera prevents AI models from learning some taxa completely, which compromises system reliability for HAB monitoring applications. The present study investigates state-of-the-art models for class imbalance problems tailored for HAB monitoring within multi-trophic aquaculture farms from Brazil, South Africa, and Scotland. A unified benchmark database covering publicly available microscopic image-based datasets supported phytoplankton modelling. AI deep collaborative models and threshold moving techniques provided the best results compared to standard architectures, especially for low-abundant yet toxic organisms.


I. INTRODUCTION
Aquaculture is a significant and expanding industry that provides a sustainable source of seafood for people worldwide. In 2022, it produced 76.9 tons of animal-based protein for human consumption, making up 49% of total seafood production [1]. This highlights the increasing importance of aquaculture in meeting the growing demand for fish as a source of food. However, the aquaculture industry faces a number of challenges. Harmful Algal Blooms (HAB) have caused significant economic losses and compromised aquaculture production worldwide (e.g. the loss of thousands of salmon within the Scottish (£10 million loss) and Chilean (US$ 50 million) aquaculture industries). In this sense, providing AI solutions for phytoplankton monitoring and early HAB detection tailored for the aquaculture industry is of paramount importance.
*This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 863034.
HAB are characterized by the fast growth of phytoplankton biomass. They may exert many adverse effects, such as the proliferation of toxin-producing species and large-scale marine mortality, leading to economic impacts in coastal regions and serious consequences for aquaculture industries [2]. Climate change has increased the frequency and severity of HAB worldwide [3], making reliable HAB monitoring within aquaculture applications an imperative need. AI models coupled with microscopy image-based analytical instruments may best support early HAB detection within aquaculture applications.
Plankton databases are usually skewed and reflect natural distribution differences within the environment [4]. Plankton images from the same taxonomic species have inherent cell orientation, colour and size variability. There may not be enough data to represent minority HAB species properly. As a result, AI models cannot gain a complete understanding of low-abundant yet potentially toxic taxa [4]. Recent technological advances in machine learning have enabled in situ plankton image capture in real-time at low cost [5]. State-of-the-art discussions present deep learning as a crucial solution for early HAB detection [6], [7]. However, deep learning models still struggle with phytoplankton image classification within aquatic monitoring scenarios. For instance, high intraclass variability and interclass similarity prevent the practical identification of morphologically similar species [7], and imbalanced databases hinder the identification of low-abundant organisms [4].
Phytoplankton species associated with HAB events vary by region because of specific environmental conditions. Harmful algae can be found across many taxonomic groups. In particular, several dinoflagellates frequently develop HABs: Gymnodinoids (e.g. Gymnodinium), Gonyaulacoids (e.g. Alexandrium, Gonyaulax, Ostreopsis, Ceratium), Dinophysoids (Dinophysis), and Prorocentroids (Prorocentrum) [8]. Publicly available microscopic image databases (e.g. WHOI [9]) have been the basis for building most AI solutions for phytoplankton monitoring. However, they may not provide appropriate data to support AI models tailored for end-user needs in terms of diversity of species and number of images. Phytoplankton diversity makes communities highly heterogeneous in size, shape, and morphology within different areas. State-of-the-art models built upon representative image databases may best support aquaculture industry applications. It is imperative to consider target phytoplankton species, end-user needs and expectations when building deployable and reliable deep-learning solutions.
This work investigates early HAB detection and phytoplankton monitoring within Integrated Multi-Trophic Aquaculture (IMTA) farms based on state-of-the-art deep learning techniques tailored for class imbalance classification problems. A data integration pipeline builds a representative database from publicly available datasets considering end-user needs and constraints. The work aims to provide automatic phytoplankton monitoring with high throughput for rapid HAB detection. Open challenges for reliable phytoplankton monitoring and technology deployment within aquaculture applications are also discussed.

II. RELATED LITERATURE
Biased plankton databases reflect natural imbalances within the aquatic environment [4]. Many approaches can address class imbalance and phytoplankton classification challenges. However, Convolutional Neural Networks (CNNs) still face misclassification because of similarity in shape, size and texture among phytoplankton species. CNNs with repeated layers (e.g. ResNet, MobileNet), optimization functions, and ensemble techniques present promising research areas [10] to support HAB monitoring applications.
A heterogeneous ensemble of CNN models harnesses the limited understanding of individual models to provide a collective and more accurate classification of minority classes [4]. Two-phase learning allows the minority classes to contribute more to gradient descent during a pre-training stage [11]. New loss functions also address class imbalance problems. Other techniques include Cost Sensitive (CS) learning and Threshold Moving (TM) [12]. CS assigns higher weights to minority classes and minimizes misclassification cost [11]. Unlike other methods, TM may be quickly implemented on already trained models to improve classification results. It has outperformed baseline CNNs for different levels of class imbalance [12].
Several studies provide comparisons among classic CNN architectures. DenseNets [13], NASNets [13], ResNets [14], and VGGNets [14] are commonly described as prominent models to boost the classification of minority classes individually. The best performance varies with pre-processing strategies, datasets, training parameters, and other aspects. In this sense, testing different approaches for classifying phytoplankton is an essential primary task for reliable monitoring.

A. Dataset
The present work employs the data integration pipeline proposed by [15]. It considers publicly available image-based datasets and targets phytoplankton genera (Figure 1) within IMTA labs from Brazil, South Africa, and Scotland. Toxin producers (Figure 1) may provide additional threats to aquatic ecosystems even at very low cell abundance. The resulting unified benchmark database aims to support ASTRAL technology development and validation in industrially relevant environments.
The data integration pipeline uses the most comprehensive public dataset (WHOI, Woods Hole Oceanographic Institution [9]) as the basis for data processing. Output data are grey-scale, fixed-size images, considering expected size ranges within target phytoplankton genera. Grey-scale information is replicated into three image channels to address AI input requirements. The pipeline yielded a more representative database for the target IMTA applications. Unfortunately, the data integration also yielded a severely imbalanced database (imbalance ratio of 1:2400). Only 73% of target phytoplankton genera have at least 20 images (Table I) and were used for AI modelling. Figure 2 illustrates image examples of target phytoplankton genera. Intraclass variability and interclass similarity may pose challenges for accurate classification. The database is randomly split into training and testing sets (80% and 20% of images, respectively). Validation employed twenty per cent of the training images.
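The split described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `split_indices`, the fixed seed and the example image count are assumptions, and the sketch follows the plain random split stated in the text rather than any per-class (stratified) variant.

```python
import random

def split_indices(n_images, test_frac=0.2, val_frac=0.2, seed=0):
    """Randomly split image indices into train / validation / test lists."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    indices = list(range(n_images))
    rng.shuffle(indices)

    n_test = round(n_images * test_frac)
    test = indices[:n_test]            # 20% of all images
    remaining = indices[n_test:]       # the 80% used for training

    n_val = round(len(remaining) * val_frac)
    val = remaining[:n_val]            # 20% of the training images
    train = remaining[n_val:]
    return train, val, test

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 640 160 200
```

With a severely imbalanced database, a purely random split can leave very few test images for the rarest genera, which is worth keeping in mind when reading per-class results.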

B. Class imbalance approaches
Several deep learning architectures have shown promise in boosting the classification of minority classes in phytoplankton datasets, including DenseNets [14], NasNets [14], ResNets [13], [14], and VGGNets [13]. However, the best-performing model architecture can vary depending on factors such as pre-processing strategies, training parameters, and the specific dataset being used. Therefore, testing different approaches for phytoplankton classification is a crucial step in supporting early HAB monitoring in aquaculture industries. This study investigates several CNN architectures tailored for phytoplankton monitoring within industrial IMTA applications. The most effective CNN architecture will serve as a baseline for implementing state-of-the-art approaches and addressing the class imbalance classification problem. Table II summarizes hyper-parameters used to train and select the baseline model. Specifically, we investigate the following architectures:
• VGG16 introduces the use of smaller receptive fields (3x3) compared to conventional convolutional networks. As a result, this network achieves high performance by having more activation layers and fewer weight parameters than 5x5 and 7x7 models [16].
• InceptionV3 employs factorized convolutions and dimension reduction in a 48-layer deep learning model. This architecture reduces computational cost and may be over three times faster than similar networks [17].
• NASNetMobile is optimized for mobile and embedded vision tasks. NASNet focuses on searching for an optimal CNN architecture using reinforcement learning. NAS (Neural Architecture Search) proposes to search for a good architecture on a small dataset (CIFAR-10) and then transfer the learned architecture to a more extensive dataset (ImageNet).
• MobileNetV2 targets mobile and resource-constrained platforms. It uses depth-wise separable convolutions, a form of factorized convolution that significantly reduces computational cost and model size. MobileNetV2 also introduces the inverted residual with a linear bottleneck layer.
1) Focal loss: Focal Loss (FL) reshapes the Cross-Entropy (CE) loss to prevent large numbers of easily classified samples from the majority classes from overwhelming the training process [11], [18]. In the Focal Loss equation (Eq. (1)), α_t is a class-wise factor that increases the relevance of minority classes. The hyperparameter γ defines the rate at which easy examples are down-weighted [18].
FL is adapted for a multi-class problem (Eq. (2)) by summing the individual loss for each of the n classes [19]. y_t and p_t represent the expected and predicted probabilities for the class t, respectively.
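Because Eqs. (1) and (2) are not reproduced in this text, the following Python sketch assumes the commonly used focal loss form FL(p_t) = -α_t (1 - p_t)^γ log(p_t), weighted by y_t and summed over the n classes. The function name, the default γ = 2 and the example probabilities are illustrative assumptions, not the settings used in this study.

```python
import math

def focal_loss(y_true, y_pred, alpha, gamma=2.0, eps=1e-7):
    """Multi-class focal loss for a single sample (assumed standard form).

    y_true: expected probabilities per class (one-hot for hard labels)
    y_pred: predicted probabilities per class (e.g. a softmax output)
    alpha:  class-wise weighting factors
    gamma:  focusing parameter; gamma = 0 gives alpha-weighted cross-entropy
    """
    loss = 0.0
    for y_t, p_t, a_t in zip(y_true, y_pred, alpha):
        p_t = min(max(p_t, eps), 1.0 - eps)   # clip to avoid log(0)
        loss += -a_t * y_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return loss

alpha = [1.0, 1.0]                                   # hypothetical class factors
easy = focal_loss([1, 0], [0.9, 0.1], alpha)         # confidently correct sample
hard = focal_loss([1, 0], [0.1, 0.9], alpha)         # badly misclassified sample
ce_easy = focal_loss([1, 0], [0.9, 0.1], alpha, gamma=0.0)
print(easy < ce_easy < hard)  # True
```

The comparison shows the intended effect: the easy sample contributes far less than it would under plain cross-entropy, while the hard sample keeps most of its loss.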
2) Cost-Sensitive learning (CS): The CS approach applies different penalties to the learner, depending on the class of a misclassified sample [11]. Each instance contributes to the loss proportionally to its class weight. Therefore, the cost of a class is directly proportional to its importance in updating weights. The present work empirically defines the cost of each class. The costs are fed into the network through the class_weight parameter of the Keras fit method. The cost for minority classes (n < 100) is set ten times (10×) higher than for abundant classes.
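A minimal sketch of how such a cost dictionary could be built is shown below. The helper name `build_class_weights` and the image counts are hypothetical; only the threshold of 100 images and the 10× cost come from the description above.

```python
def build_class_weights(class_counts, minority_threshold=100, minority_cost=10.0):
    """Map class index -> cost; classes below the threshold get a higher cost."""
    return {
        idx: (minority_cost if count < minority_threshold else 1.0)
        for idx, count in enumerate(class_counts)
    }

# Hypothetical image counts per class, for illustration only.
counts = [48000, 5200, 75, 20]
class_weight = build_class_weights(counts)
print(class_weight)  # {0: 1.0, 1: 1.0, 2: 10.0, 3: 10.0}

# With a compiled Keras model, the dictionary would be passed as:
# model.fit(x_train, y_train, class_weight=class_weight)
```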
3) Two-phase learning: Two-phase learning usually combines Random Under Sampling (RUS) with transfer learning. The pre-training phase adjusts the model based on a balanced dataset. [20] experimentally defined a balanced database with 5000 images per class to support the pre-training stage. The present study employs a hybrid approach with RUS and Random Over Sampling (ROS) to build a balanced dataset (N = 5000) [11]. A final training phase employs the original imbalanced data for model fine-tuning.
RUS randomly selects N images from classes with over 5000 images. Then, ROS employs data augmentation techniques for the remaining classes. It artificially generates additional training images, considering data augmentation parameters adapted from [4]. The random augmentation aims to allow the model to better understand low-abundant phytoplankton genera. The pre-training phase runs until the monitored metric has not improved for more than five epochs. The second training stage comprises model fine-tuning with the original class distribution.
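The hybrid RUS/ROS balancing step can be sketched as an index-selection routine. This is an assumption-laden illustration: the function name `balance_indices` is hypothetical, and over-sampling is represented by repeated indices, on the understanding that each repeated image would receive a random augmentation when it is loaded.

```python
import random

def balance_indices(labels, n_per_class=5000, seed=0):
    """Return sample indices with exactly n_per_class entries for every class.

    Classes at or above the target are randomly under-sampled (RUS); classes
    below it are randomly over-sampled with replacement (ROS).
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    balanced = []
    for idxs in by_class.values():
        if len(idxs) >= n_per_class:
            balanced.extend(rng.sample(idxs, n_per_class))          # RUS
        else:
            extra = rng.choices(idxs, k=n_per_class - len(idxs))    # ROS
            balanced.extend(idxs + extra)
    rng.shuffle(balanced)
    return balanced

# Toy example: 8 images of class "a", 2 of class "b", target of 4 per class.
labels = ["a"] * 8 + ["b"] * 2
selected = balance_indices(labels, n_per_class=4)
print(len(selected))  # 8
```

The balanced index list would feed the pre-training phase, after which fine-tuning returns to the original, imbalanced training set.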
4) Dynamic Sampling (DS): DS [21] changes the class distribution of the training samples dynamically. The model iteratively focuses on classes with poor performance within the training process. DS splits the database into training, reference, and testing sets. Initially, the number of samples for each class is N*, which is the average number of samples. By the end of each training iteration, the F1-score assesses performance on the reference set. It is the basis for defining the number of samples from each class during the next training iteration. Equation (3) defines the number of samples N of a class c_k in iteration i.
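Since Eq. (3) is not reproduced in this text, the sketch below assumes one plausible update rule: each class receives a share of a fixed sample budget proportional to (1 - F1) on the reference set, with the budget taken as N* times the number of classes. Both the rule and the function name `update_sample_sizes` are assumptions for illustration and may differ from the formulation in [21].

```python
def update_sample_sizes(f1_scores, n_star):
    """Allocate next-iteration sample counts per class from reference-set F1.

    Assumed rule: class k gets a share of the total budget
    (n_star * number of classes) proportional to (1 - F1_k).
    """
    deficits = [1.0 - f1 for f1 in f1_scores]
    total_deficit = sum(deficits)
    n_classes = len(f1_scores)
    if total_deficit == 0:             # every class is perfect: stay uniform
        return [n_star] * n_classes
    budget = n_star * n_classes
    return [round(budget * d / total_deficit) for d in deficits]

# Hypothetical reference-set F1-scores for three classes.
sizes = update_sample_sizes([0.9, 0.5, 0.1], n_star=1000)
print(sizes)  # [200, 1000, 1800]
```

Under this rule, the class with the lowest F1-score receives the most samples in the next iteration, which matches the stated goal of focusing on poorly performing classes.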
The present work uses 20% of the training dataset as a reference set, leaving 80% for the training itself. Considering this split, the average number of samples in the training set of all eleven classes is 4735. However, the number of samples from eight of the eleven classes is lower than this value. Therefore, two dynamic sampling approaches are implemented.
The first approach uses the ROS method to ensure all classes have at least 4735 training samples. Alternatively, the other approach uses no over-sampling. Instead, the number of images of each class is defined by min(N_{i,c_k}, n_{c_k}), where N_{i,c_k} is calculated as in Eq. (3) and n_{c_k} is the number of samples of class c_k in the training set. The F1-score assesses class performances for both approaches according to the reference set. The next training iteration employs images randomly sampled from the training database (Eq. (3)).
5) Ensemble methods: Deep collaborative models have provided outstanding performance compared to individual CNNs [4], [13]. The work assembles different models tailored for class-imbalance applications to boost low-abundant phytoplankton genera classification. The deep learning collaborative model includes the two models with the highest performance compared to the baseline.
6) Threshold moving (TM): TM adjusts the decision threshold of a classifier during the inference phase by changing the output class probabilities. The most basic version compensates for prior class probabilities. Considering neural networks estimate Bayesian a posteriori probabilities, the output y for class i implicitly corresponds to y_i(x) = p(i|x) = p(i) · p(x|i) / p(x) for a data point x [22]. Thus, dividing the network output for each class by its estimated prior probability provides the correct class probabilities (Eq. (4)),
where |i| denotes the number of unique examples in class i. The present work applies the threshold moving technique to the resulting model with the highest performance within the target phytoplankton genera.
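The prior-compensation step can be illustrated with a short Python sketch. It assumes the prior of class i is estimated as |i| divided by the total number of training examples, and it renormalizes the rescaled outputs so they sum to one; the renormalization, the function name `threshold_moving` and the example numbers are illustrative assumptions.

```python
def threshold_moving(probs, class_counts):
    """Divide class probabilities by estimated priors and renormalize."""
    total = sum(class_counts)
    priors = [count / total for count in class_counts]   # p(i) = |i| / sum_k |k|
    scaled = [p / prior for p, prior in zip(probs, priors)]
    norm = sum(scaled)
    return [s / norm for s in scaled]

# Hypothetical softmax output and training counts for three classes.
probs = [0.70, 0.25, 0.05]
counts = [9000, 900, 100]

adjusted = threshold_moving(probs, counts)
before = max(range(len(probs)), key=probs.__getitem__)
after = max(range(len(adjusted)), key=adjusted.__getitem__)
print(before, after)  # 0 2
```

In this toy case the raw output favours the abundant class 0, while the prior-corrected output favours the rare class 2. Because the correction only rescales outputs at inference time, it needs no retraining and leaves model size unchanged.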

IV. EVALUATING MODEL PERFORMANCE
Accuracy is the ratio of correctly classified samples to the total number of tested samples. Although intuitive, overall accuracy is a misleading performance metric for imbalanced scenarios [12]. AI models may provide high accuracy levels and still achieve poor performance for low-abundant taxa. The F-score (Eq. (5)) is sensitive to the performance within minority classes [23]. It provides the harmonic mean between precision (Precision = TP / (TP + FP)) and recall (Recall = TP / (TP + FN)). TP, FP and FN are the numbers of true positives, false positives and false negatives in a classification process.
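As a worked illustration of these definitions, the following Python sketch computes precision, recall and their harmonic mean from confusion counts. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F-score (harmonic mean) from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical minority class: 16 test images, 8 found, 2 false alarms.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.5 0.615
```

Missing half of a rare class barely changes overall accuracy on a large test set, but it pulls the class-wise F-score down sharply, which is why the metric suits imbalanced data.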
Since precision provides a more accurate representation of a model performance in a skewed distribution [24], the evaluation metrics also include the area under the precision-recall curve (AUC-PR). The present work also assesses model size. It supports further model integration into embedded platforms and resource-constrained environments.

V. RESULTS AND DISCUSSIONS
The data integration pipeline provided a more representative benchmark database to support HAB monitoring within target aquaculture applications. However, the resulting severe class imbalance (ratio 1:2400) and the lack of data to properly represent key bloom-forming genera (Alexandrium, Anabaena, Nodularia and Lingulodinium) indicate that available databases may not support many end-user needs. The investigation of state-of-the-art models to deal with class imbalance issues, interclass similarity and intraclass variability still plays an important role in the practical and reliable deployment of HAB monitoring models. Table III summarizes performance within investigated CNN architectures. MobileNetV2 provided the best individual results and was selected as the baseline model. It achieved the best performance with a smaller model size, which may be helpful for embedded and resource-constrained applications. Table IV depicts MobileNetV2 performance within target phytoplankton genera. However, it still struggled with phytoplankton genus classification, especially for low-abundant yet toxin-producing genera (Alexandrium, Anabaena, Lingulodinium and Nodularia, with n ≤ 100). Therefore, improving model performance for low-abundant classes is imperative for reliable phytoplankton monitoring within aquaculture applications.
FL, TM, CS, DS, two-phase learning and deep collaborative approaches are investigated to address the class imbalance problem. Table V depicts the resulting performances within target phytoplankton genera. Overall performance and model sizes are summarized in Table VI. FL provided minor performance gains compared to the baseline model. The dynamic sampling approach performed better than FL and CS learning. It also provided a smaller model size, which may better support model integration into resource-constrained embedded systems. Although [12] indicated that two-phase learning may not provide performance gains compared to ROS and RUS data-level techniques, two-phase learning provided valuable performance gains compared to baseline, FL, and DS models. The deep collaborative approach combined DS and two-phase learning models, resulting in exceptional performance compared to the baseline and other models, with a 14% improvement in F-score over the best individual model. The resulting model size was 90% larger than the baseline CNN, but it was smaller than the two-phase learning model. [4] highlights that performance improves as long as each learner gains a unique insight about the classification task within a deep collaborative system. In this context, an ensemble of DL models specifically tailored for a target class-imbalance application through distinct training techniques may provide unique insights into the classification problem and effectively support deep collaborative strategies.
The TM technique was applied to the deep collaborative model and allowed further performance improvement, as suggested by [12], without affecting model size (Table VI). The proposed method provided promising performance gains to address phytoplankton monitoring within aquaculture applications. It allowed for building deep collaborative models tailored to aquaculture needs and requirements. The main results include outstanding classification improvement of low-abundant and toxin-producing genera. For instance, the F-score metric for identifying Alexandrium, Anabaena, Lingulodinium and Nodularia genera increased from 0.00, 0.27, 0.67 to 0.86, 0.67, 0.80 and 0.84, respectively. Two-phase learning has enabled performance gains for the classification of low-abundant genera. However, it increased model size by 90% compared to baseline MobileNetV2, which may affect AI usability within resource-constrained prototypes.
Deep collaborative modelling, DS and TM techniques have also allowed model optimization. TM may be employed upon the latest deep learning architectures without compromising model employability in embedded and resource-constrained applications. Potential trade-offs between model performance and size may play an important role in reliable embedded solutions for early HAB detection within aquaculture applications. This applies especially to low-cost, resource-constrained embedded systems.

VI. CONCLUSIONS
This study has presented the development of AI deep learning models specifically tailored for phytoplankton monitoring and early HAB detection within IMTA industries. It has provided a comprehensive discussion on the key challenges associated with supporting early HAB detection in industrially relevant scenarios. It is important to note that publicly available datasets may not fully meet the diverse needs of end-users, and addressing class-imbalance issues remains crucial for reliable phytoplankton monitoring.
The proposed methodology has demonstrated promising performance improvements in addressing phytoplankton monitoring requirements in aquaculture applications. It has built deep collaborative models customized to the specific needs and requirements of the aquaculture industry. Notably, the results have shown outstanding classification improvements for low-abundant and toxin-producing genera. This highlights the potential of deep collaborative models, DS and threshold moving as viable solutions for achieving reliable HAB monitoring. The ongoing work also encompasses the integration of these models into embedded platforms, aiming to assess their impact on prototype throughput and autonomy.

VII. ACKNOWLEDGEMENTS
This work was developed as part of the ASTRAL (All Atlantic Ocean Sustainable, Profitable and Resilient Aquaculture) project. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 863034. The authors would like to acknowledge Kati Michalek from the Scottish Association for Marine Science (SAMS) for phytoplankton sample provision and draft revision. We also want to thank Elisa Ravagnan, Luis Poersch and Wilson Wasielesky for their support.