RealForAll: real-time system for automatic detection of airborne pollen

ABSTRACT The aim of this paper is to describe a solution suitable for automating the standard pollen information service (EN 16868:2019). We describe the RealForAll integrated information system, developed for automatic airborne pollen detection and real-time data delivery to end-users. The solution is based on measurements from the Rapid-E airborne particle monitor. The system incorporates an AI-enabled subsystem based on a convolutional neural network that continuously retrieves raw data from Rapid-E and classifies airborne pollen. The main advantages of this system are real-time data delivery and independence from aerobiology experts during the pollen season.


Introduction
Pollen is one of the most common triggers of seasonal allergies, and around 30% of the world population suffers from some form of allergic disease (Akdis, Hellings, and Agache 2015). In most European countries, national organisations of various kinds provide information about pollen concentration in the air, publish pollen forecasts and issue warnings. The method for sampling and analysis of airborne pollen is standardised by EN 16868:2019 (Ambient air - Sampling and analysis of airborne pollen grains and fungal spores for networks related to allergy - Volumetric Hirst method). The standard prescribes the use of Hirst-type volumetric devices (Hirst 1952) for sampling airborne particles. The device draws in 10 l of air with suspended aerosols per minute. The particles impact on an adhesive-coated, transparent plastic tape that moves past the inlet at 2 mm per hour to give a time-related sample. The collected samples are analysed manually using a light microscope (Galán et al. 2014; Buters et al. 2018a). This method is tedious, labour-intensive and time-consuming, and measurement data are always delayed by a few days to a few weeks.
Real-time measurements of airborne pollen concentrations can improve the quality of life in a pollen sensitive population. Timely information can help people to prevent allergy symptoms and to better manage their allergic diseases (Bousquet et al. 2019). If patients had access to information about immediate exposure levels, they could take appropriate medication and plan their activities. Also, pollen forecasts play an integral role in the management of pollen allergies. Real-time measurements can be used for the improvement of short-term forecasts, particularly by enabling assimilation of measurement data and models (Sofiev et al. 2017). In addition, up-to-date information on the concentration of harmful airborne particles in the air (e.g. fungi and spores) can be very effective in agriculture and forestry (Garzia-Mozo 2011). For example, application of protective agents at the right moment can prevent crop damage and increase yields.
At present, a number of web portals and mobile applications provide either outdated information about airborne pollen concentration or estimates of current values obtained by coupling the previous day's observations with pollen forecasts (Pasyfo 2019; Poleninfo 2019; Norkko 2019). In order to improve the pollen information service and provide real-time measurements to end-users, it is necessary to automate the whole process of pollen detection. Automation of this process requires a measurement device (particle monitor) capable of producing data in a digital format that can be processed further automatically. Such a device should sample and characterise single airborne particles in sufficient detail to enable their identification. Advanced technologies have made real-time pollen monitoring possible only recently. Currently, two types of technologies seem to be the most suitable for detecting airborne pollen: automatic multi-stack image recording and laser-induced fluorescence (Alex et al. 2019). Another challenge in automatic pollen detection is to develop an information system that harvests raw data from particle monitors, processes it and disseminates classification results to end-users. Although suitable particle monitors are commercially available, there is no out-of-the-box system that integrates detection and classification of particles of interest. Such automatic integrated systems have to be developed with the needs of stakeholders in mind.
An interactive map (https://oteros.shinyapps.io/pollen_map/) visualises the distribution of pollen monitoring stations throughout the world (Buters et al. 2018a). Filtering the map for automatic stations only yields four types of operational solutions for pollen detection. However, they have some limitations regarding the time necessary to produce information or the number of pollen types that can be identified. Japan is the pioneer in automatic detection of airborne pollen using the KH-3000 particle monitor. Their solution is limited to monitoring concentrations of Japanese cedar pollen only (Hanakosan 2019; Kawashima et al. 2017).
The aim of this paper is to describe a solution for automating the pollen information service that overcomes the limitations of the previously mentioned solutions. This is achieved by developing the RealForAll integrated information system for automatic detection of various pollen types and real-time data delivery to end-users (RealForAll 2019). This is the first solution that integrates the Rapid-E particle monitor, a state-of-the-art technology in the field of automatic pollen monitoring (Plair 2019), with artificial intelligence (AI) techniques in order to classify the full spectrum of allergy-relevant airborne pollen types. Classified data are transferred to a subsystem responsible for storing airborne pollen concentrations and delivering them to end-users through web and mobile applications in real time.
The Rapid-E particle monitor is based on the laser-induced fluorescence technique, and this is the first study that evaluates its performance in automatic real-time pollen monitoring against the requirements given in EN 16868:2019. So far, only a proof-of-concept study has been conducted (Šaulienė et al. 2019), which set the basis for this full-season performance evaluation on a larger spectrum of pollen types in an operational environment.
Since automation is a prerequisite for fostering the mobile health concept in allergy care (Matricardi et al. 2019), the results of this work are expected to support the ongoing rapid change in pollen monitoring (Buters, Schmidt-Weber, and Oteros 2018).

Related work
Automation of a process related to airborne pollen detection may be positioned in the domain of environmental informatics. Environmental informatics applies computer science disciplines to environmental information processing (Hilty et al. 1995). Environmental data have a complex nature and their processing requires the application of advanced information technologies such as machine learning, deep learning, data analysis and data mining. There are many recent examples of AI techniques applied to environmental problems (Wang et al. 2019; McGovern et al. 2017; Manogaran and Lopez 2018), which support the development of integrated information systems for the monitoring and management of environmental data (Fang et al. 2014, 2015). There is a notable interest in real-time detection of bioaerosols, which is extensively reviewed by Alex et al. (2019). The authors gave an overview of major techniques and devices for real-time airborne particle detection. They emphasised laser-induced fluorescence as the most promising technology for automatic detection of bioaerosols of interest, including different allergenic pollen types. This technique uses monochromatic light to trigger scattering and fluorescence, which are then detected to analyse the chemical composition, size and morphology of individual particles. Compared to simple particle counters, the laser-induced fluorescence approach is more suitable for real-time pollen monitoring where identification of diverse pollen species is required, since it provides the diversity of data needed for precise classification of bioaerosols. The Rapid-E pollen monitor integrated into the RealForAll system records both scattering and fluorescence characteristics for each sampled airborne particle (Kiselev, Bonacina, and Wolf 2013). The result of that recording is a complex dataset that requires advanced machine learning tools to identify differences between pollen types.
The resulting measurements from bioaerosol monitors are further analysed for the purpose of discriminating between bacteria, fungal spores and pollen, and different AI-based strategies have been tested (Crawford et al. 2015; Pan, Huang, and Chang 2012; Robinson et al. 2013; Ruske et al. 2017, 2018; Swanson and Alex Huffman 2020). It was shown that, for the task of airborne particle classification, clustering in general performs slightly worse than supervised learning methods (Ruske et al. 2017). The same authors also noted that the use of neural networks may improve the accuracy of classification. Šaulienė et al. (2019) used three different architectures of convolutional neural networks to analyse the scattering and fluorescence properties of each particle reaching the Rapid-E device. That was the first analysis of the pollen monitoring capabilities of the Rapid-E pollen monitor. They found that the Rapid-E has the potential to identify pollen types in real time, but that the classification algorithms need to be improved to include more pollen types. Recently, Sauvageat et al. (2020) conducted research on utilising convolutional neural networks to classify data from the Swisens Poleno device. They applied a digital holography technique on fluorescence data to reconstruct images of airborne particles. These images are further processed by the neural network, and they succeeded in identifying up to 10 different pollen species.
However, a good classification is just one step towards the automation of the whole process of airborne pollen monitoring. None of the results from the previously mentioned studies have yet been implemented in an operational environment for real-time pollen monitoring. In order to develop a fully operational solution, it is necessary to incorporate all activities of that process in an integrated information system, from raw data preprocessing to presenting real-time measurements in a user-friendly manner.

RealForAll system
The RealForAll system is an integrated system for real-time monitoring of airborne allergens and dissemination of information about their concentration.
The system has been under development since 2018 and has already monitored one full pollen season. It currently provides pollen measurements from two locations: Novi Sad, Serbia and Osijek, Croatia. With the implementation of this system, the whole process of pollen detection has been successfully automated.
The software architecture of this system is presented in Figure 1. The system consists of several subsystems, each with a particular role. The role of the Rapid-E device, which is represented as a component in Figure 1, is to collect airborne particles and generate raw optical data. These data are further processed by the AI-enabled subsystem for classification in order to detect different types of particles (component Data classification). Classified data are sent to the RealForAllHub subsystem, whose role is to store classified data and to transform them into an appropriate format for the end-user applications. A detailed description of these subsystems is given below.

Rapid-E
The Rapid-E device is an airborne particle monitoring station. It is designed for automatic and real-time analysis of single particles suspended in air. The device aspirates ambient air with suspended particles that interact with the laser light sources (Šaulienė et al. 2019), resulting in scattered light and fluorescence that are combined to characterise each particle (Kiselev, Bonacina, and Wolf 2013). The scattered photons are captured from different angles by 24 time-resolving detectors (Kiselev 2019), resulting in an image whose size depends on the particle's morphology (i.e. size and shape). Chemical characteristics of detected particles are represented by their emission spectrum and fluorescence lifetime. After excitation by the deep-UV laser (337 nm), the emitted fluorescence is recorded at 32 measuring channels within a spectral range of 350-800 nm and in eight sequential acquisitions/bands with 500 ns retention. In addition, the rate of decrease of the fluorescence intensity (fluorescence lifetime) after double excitation by a laser beam is recorded at four spectral bands (350-400, 420-460, 511-572 and 672-800 nm) with 2 ns temporal resolution (Kiselev and Kiseleva 2019).
The Rapid-E device provides a JSON file containing the scattered light, fluorescence spectrum and lifetime properties of each particle sampled in a minute. The device has a LAN connector and provides secure shell access that can be used to retrieve the data in real time.
The RealForAll system currently incorporates two Rapid-E devices. One is installed in Novi Sad, Serbia and the other is in Osijek, Croatia. The devices are connected to a local network of institutions hosting those devices which provides a stable connection between devices and the subsystem for classification. Both devices are operational and generated data in real-time during pollen season, from February to October 2019.

Subsystem for classification
The AI-enabled subsystem for classification continuously retrieves raw data from Rapid-E and performs the classification of pollen particles. The subsystem detects and counts particles larger than 8 microns and identifies several different pollen types. Classification is based on artificial neural networks implemented in Python using PyTorch (details are given in the Classification section).
The subsystem classifies minute measurements in real-time. Classification is performed with a latency of a few minutes to ensure that raw data have been retrieved from the device. As an output of the classification, the subsystem generates JSON documents for each measurement device and time-related sample. The document contains a device's identifier, time of measurement, and measured values for each classified pollen type. JSON documents are sent to the RealForAllHub subsystem to be stored and further processed.
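As an illustration of the output described above, the following minimal sketch builds one such JSON document. The field names (`deviceId`, `measuredAt`, `values`), the device identifier and the counts are hypothetical; the text only states that the document holds a device identifier, a time of measurement and a value for each classified pollen type.

```python
import json
from datetime import datetime, timezone


def build_classification_document(device_id, measured_at, counts):
    """Build one JSON document for a single device and one-minute sample.

    All field names are hypothetical; only the three kinds of content
    (device identifier, measurement time, value per pollen type) come
    from the system description.
    """
    document = {
        "deviceId": device_id,
        "measuredAt": measured_at.isoformat(),
        # Sort pollen types so that the document layout is deterministic.
        "values": {name: counts[name] for name in sorted(counts)},
    }
    return json.dumps(document)


# Illustrative values only, not real measurements.
example = build_classification_document(
    "rapid-e-novi-sad",
    datetime(2019, 4, 12, 10, 31, tzinfo=timezone.utc),
    {"Betula": 14, "Alnus": 2},
)
```

A document of this shape could then be sent to the import service of the RealForAllHub subsystem; the actual schema is defined by that service and is not reproduced here.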

RealForAllHub subsystem
This subsystem is designed to store and maintain classified data and is implemented using Java EE technologies. Classified data are stored in a PostgreSQL relational database (PostgreSQL 2019). The subsystem provides a REST service for importing data into the database (ImportService component in Figure 1). This service is used by the subsystem for classification, but any other institution with real-time pollen measurements can join easily. The only requirement is to provide a continuous flow of classified data in the format defined by this REST service. Those measurements will then be accessible through our end-user applications.
The end-user applications show hourly pollen concentrations, but the RealForAllHub subsystem receives minute measurements. The measurement data therefore have to be aggregated. Aggregation runs every hour, postponed by several minutes to ensure that all data for the given hour have been received (Aggregation component in Figure 1). It calculates the average hourly value from the minute values within the last hour, and the aggregated data are also stored in the database. The system configuration specifies how many minute measurements are expected during an hour. A notification email is sent to the system administrator if some measurements are missing (Notification component in Figure 1).
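The aggregation and the completeness check can be sketched as follows. This is a minimal illustration in Python rather than the Java EE implementation used in the system; the function names, the default of 60 expected measurements and the rule that any missing measurement triggers a notification are assumptions.

```python
from statistics import mean

EXPECTED_MINUTES_PER_HOUR = 60  # hypothetical configuration value


def aggregate_hour(minute_values, expected=EXPECTED_MINUTES_PER_HOUR):
    """Return (hourly_average, missing_count) for one pollen type and one hour.

    minute_values holds the minute measurements received for that hour.
    The average is None when nothing was received.
    """
    missing = max(expected - len(minute_values), 0)
    average = mean(minute_values) if minute_values else None
    return average, missing


def needs_notification(missing_count):
    """Assumed rule: notify the administrator if any measurement is missing."""
    return missing_count > 0
```

For example, `aggregate_hour([10, 20, 30], expected=4)` gives an average of 20 with one missing measurement, which would trigger a notification under the assumed rule.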
This subsystem provides REST services to end-user applications. Mobile and web applications use the AppService component (Figure 1) to obtain and visualise data about pollen concentrations. There is also the AdminService component (Figure 1), used by the web application for system administration.

End-user applications
The main aim of the RealForAll system is to disseminate information about pollen concentration. Appropriate Android and iOS mobile applications, as well as the web application, have been developed for that purpose (Android app 2019; iOS app 2019; Web app 2019).
Mobile applications show real-time pollen measurements from available Rapid-E devices as well as hourly averages for a selected device (Figure 2). Presented measurements can be filtered by pollen type and compared to measurements from other devices. The applications also provide a forecast of pollen distribution over Europe generated by SILAM (SILAM 2019). In addition, the applications may be used to keep a personal allergy symptom diary in order to find a correlation between recorded symptoms and airborne pollen measurements. Information from this diary may be useful in the evaluation of prescribed treatments and for better management of allergic diseases. The web application for end-users has fewer features than the mobile applications: it only provides measured hourly average concentrations and the forecast.
The RealForAll system also has a web application for system administration. It is not exposed publicly and only authorised users have access. The application allows the management of the RealForAll system (i.e. adding new pollen types and devices as well as configuration of some system properties). It also allows the export of minute and hourly classifications for a selected period.

Classification
The output of the Rapid-E device is a JSON file containing the scattered light, fluorescence spectrum and fluorescence lifetime signals for every particle sampled in a minute. A detailed description of the structure of the output files is given in the earlier pilot study (Šaulienė et al. 2019). The character of the measurements (i.e. a temporal component for scattered light and fluorescence, and multiple wavelength bands for fluorescence lifetime) allows the light intensity signals to be transformed into a two-dimensional image format suitable for analysis using a convolutional neural network. This section provides details regarding the classification methodology used in the RealForAll system.

Data collection
Labelled data for training the classifier are obtained in calibration events, where a domain expert exposes the Rapid-E device to collected aerosol samples in a controlled environment. Each calibration resulted in JSON files belonging to the same aerosol class. Calibration was performed for the 24 most common pollen classes (Acer, Alnus, Ambrosia, Artemisia, Betula, Broussonetia, Carpinus, Corylus, Fraxinus excelsior, Fraxinus ornus, Juglans, Morus, Other pollen, Pinaceae, Plantago, Platanus, Poaceae, Populus, Quercus, Salix, Taxaceae, Tilia, Ulmus, Urticaceae). In addition, real-time measurements from a time when there was no pollen in the air are labelled as 'other' and 'starch' and used in training the classifier in order to prevent other bioaerosols (i.e. fungal spores) and starch from being confused with pollen.

Data preprocessing
Laser-induced data tend to be noisy, and using them in their raw form can result in poor generalisation. To avoid this, scattered light images are centred with respect to the time axis around the mean of the indices with maximum values over the 24 angle pixels and then cut or padded with zeros to fit the size of 20 × 120 (4 boundary angle pixels are removed due to device dependence). Fluorescence spectrum and lifetime signals are normalised into the 0-1 range. The signals of scattered light and fluorescence spectrum are smoothed with the Savitzky-Golay filter (Savitzky and Golay 1964) to reduce the noise further. Fluorescence spectrum signals are converted into a 4 × 32 pixel image by stacking the second to fourth acquisitions/bands. Similarly, fluorescence lifetime signals are converted into a 4 × 24 pixel image. A particle size approximation calculated from the scattered light image and lifetime weights calculated from the fluorescence lifetime signal are also added as features to the classification model.
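The centring and normalisation steps described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the code used in the system: the function names are hypothetical, two rows are assumed to be dropped at each edge (the text says only that 4 boundary angle pixels are removed), and the Savitzky-Golay smoothing is omitted.

```python
def centre_scattering_image(image, width=120, drop_edge_rows=2):
    """Centre a scattering image on the time axis and crop or zero-pad it.

    image is a list of 24 rows (one per detector angle), each a list of
    intensities over time. drop_edge_rows=2 removes two rows at each edge,
    which is an assumption about which 4 boundary pixels are discarded.
    """
    # Time index of the maximum in each angle row, then the mean of those.
    peak_indices = [row.index(max(row)) for row in image]
    centre = round(sum(peak_indices) / len(peak_indices))
    start = centre - width // 2

    kept_rows = image[drop_edge_rows:len(image) - drop_edge_rows]
    result = []
    for row in kept_rows:
        # Indices outside the recorded signal are filled with zeros.
        result.append([row[t] if 0 <= t < len(row) else 0
                       for t in range(start, start + width)])
    return result


def min_max_normalise(signal):
    """Scale a signal into the 0-1 range; a constant signal maps to zeros."""
    low, high = min(signal), max(signal)
    if high == low:
        return [0.0 for _ in signal]
    return [(value - low) / (high - low) for value in signal]
```

With 24 input rows, the result has 20 rows of 120 time steps each, with the mean peak position placed in the middle of the time axis.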
To ensure only high-quality records are analysed, detected particles with the scattering image width larger than 450 pixels, at least one of the four maximal spectral peaks lower than 408 nm or larger than 495 nm, maximum spectral intensity less than 2500 and lifetime maximum peak not detected between 20 ns and 88 ns are filtered out.
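The quality filter can be expressed as a single predicate. The sketch below assumes that the four criteria are independent, so that failing any one of them removes the record; the function and parameter names are hypothetical, and the thresholds are those quoted above.

```python
def passes_quality_filter(scatter_width, spectral_peaks_nm,
                          max_intensity, lifetime_peak_ns):
    """Return True if a particle record survives the filters described above.

    Assumption: the four criteria are independent, so failing any one of
    them removes the record. lifetime_peak_ns is None when no peak is found.
    """
    if scatter_width > 450:  # scattering image too wide (pixels)
        return False
    if any(peak < 408 or peak > 495 for peak in spectral_peaks_nm):
        return False  # one of the four maximal spectral peaks out of range
    if max_intensity < 2500:  # fluorescence signal too weak
        return False
    if lifetime_peak_ns is None or not (20 <= lifetime_peak_ns <= 88):
        return False  # lifetime peak missing or outside the 20-88 ns window
    return True
```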
The processed data contain 103593 pollen samples and 6285 samples from real-time measurements. The data are split into train and test datasets: the train set contains 90% of the samples from each class, and the remaining samples are used for testing.
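A per-class 90/10 split of this kind can be sketched as follows. The function name, the fixed random seed and the rounding down of the per-class train size are assumptions made for the illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.9, seed=0):
    """Return (train_indices, test_indices) with train_fraction of each class in train."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = int(len(indices) * train_fraction)  # rounds down per class
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test
```

Splitting within each class, rather than over the pooled data, keeps rare pollen classes represented in both the train and the test set.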

Neural network architecture
The classification algorithm is based on convolutional neural networks (CNN), which have so far shown better performance on similar problems in image processing and object classification than other machine learning classifiers (Krizhevsky, Sutskever, and Hinton 2012). A CNN allows automatic feature extraction, which is crucial when dealing with complex homogeneous data, as well as the combination of multiple inputs from the Rapid-E device to perform classification. The network processes each data type individually using a combination of 2-D convolutions, ReLU activations, batch normalisation, max pooling and dropout, together considered as a convolutional block (Figure 3). By doing so, it learns the most important features and reduces the dimensionality of the data provided by Rapid-E. The features are first equalised by passing each of them through one fully connected layer of the same size, so that each input has the same contribution to the feature vector. They are then concatenated, together with the additional features of size and lifetime weights, and passed to a fully connected layer consisting of 26 nodes, since there are 26 aerosol classes to identify. The classification is then performed with the log-softmax activation function. In this way, the network architecture allows the gradient to flow through the whole network, updating the weights for each distinct source based on the joint decision. The cost function used for training the network is the negative log-likelihood loss, and the optimiser is stochastic gradient descent with a learning rate of 0.001 and a momentum of 0.9. The batches used for training the model were created in such a way that each batch contains the same number of samples from each class, which resolved the unbalanced dataset problem. A detailed description of the network is given in Šaulienė et al. (2019).
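The class-balanced batching mentioned above can be sketched as follows. This is one possible way to obtain batches with the same number of samples from each class, not necessarily the procedure used for the RealForAll model; the function name and the choice to sample with replacement within a class are assumptions.

```python
import random
from collections import defaultdict


def balanced_batches(labels, per_class, num_batches, seed=0):
    """Yield batches of sample indices with per_class samples from every class.

    Sampling with replacement within a class is an assumption made here so
    that small classes can still fill their share of each batch.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    for _ in range(num_batches):
        batch = []
        for label in sorted(by_class):
            batch.extend(rng.choices(by_class[label], k=per_class))
        rng.shuffle(batch)  # avoid presenting the classes in a fixed order
        yield batch
```

Each yielded batch is a list of dataset indices that could be used to assemble the network inputs, so that a class with few calibration samples contributes as much to each gradient step as a class with many.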

Classification results
On the test dataset, the model yields an accuracy of 65.3% on 26 classes (Figure 4). The precision, recall and F1 score of the model are 59%, 69% and 61%, respectively. The number of classes involved in the test is rather high, which makes the task unrealistic for real-life monitoring. Therefore, the real performance is evaluated by comparison with standard monitoring of airborne pollen (EN 16868:2019) performed in Novi Sad in 2019. In order to neutralise losses from data preprocessing, RealForAll data were multiplied by a scaling factor (SF) corresponding to the relationship between the quantity measured by the standard method EN 16868:2019 and the quantity obtained by the RealForAll system. The performance was tested by analysing Pearson correlation coefficients (R) between the average daily pollen concentrations measured by the two systems, focusing on the periods when the standard method detects the pollen of interest. Good performance (R > 0.7) was confirmed for 11 pollen types (Figure 5), while the remaining classifications underperformed (Figure 6). Pearson's correlation coefficients (R) and scaling factors (SF) are given in both figures. For both good and underperforming classifications there is a notable amount of false-positive detections, which are eliminated by manually limiting the pollen season. Apart from further improvement of the classification model by introducing shortcut connections between network layers (Kaiming et al. 2016) and increasing the width of these networks (Zagoruyko and Komodakis 2016) for better feature extraction, the next developments will strive to automate the limitation of the season by introducing confidence thresholds for the classifications, below which the model will not deliver data to the RealForAllHub subsystem. Future development of the classification should involve additional separation systems for the classes that are mixed up, since some classes are very well separated while others are not (Figure 4).
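The comparison with the standard method can be illustrated with a short sketch. The definition of the scaling factor as the ratio of the total reference count to the total automatic count is an assumption (the text describes it only as the relationship between the two quantities), and the daily values below are illustrative, not measured data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def scaling_factor(reference, automatic):
    """Assumed definition: ratio of total reference to total automatic counts."""
    return sum(reference) / sum(automatic)


hirst = [10.0, 40.0, 80.0, 30.0]     # illustrative daily means, not real data
realforall = [2.0, 9.0, 15.0, 7.0]   # illustrative daily means, not real data
sf = scaling_factor(hirst, realforall)
scaled = [sf * value for value in realforall]
r = pearson_r(hirst, scaled)
```

Multiplying one series by a positive constant does not change the Pearson coefficient, so under this assumed definition the scaling factor affects only the reported concentration levels, not R.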
It should be noted that, for nearly all underperforming classifications, signals are characterised by low intensity. This is also characteristic of the standard method EN 16868:2019, but it is even more pronounced in automatic classification because of the strict filtering, which ensures that only high-quality detections are analysed but reduces the method's ability to detect low concentrations.

Discussion
By implementing the RealForAll system in a production environment, we succeeded in automating the pollen detection process and the dissemination of real-time measurements. Currently, our mobile applications are installed on more than 1700 mobile devices. The main advantages of this system in comparison with the standard method EN 16868:2019 are its extensibility, real-time data delivery and independence from aerobiology experts during the pollen season.
The RealForAll system can, with relative ease, be adapted to a wider user base. Namely, introducing new Rapid-E devices into the RealForAll system would not have a significant impact on overall system performance. In the case of the standard method EN 16868:2019, adding new Hirst devices requires additional manual effort, which directly influences operational time and cost. The system also enables interoperability with other systems for pollen detection. They can easily send their measurements to the RealForAll system for storage, with the advantage that their data will be efficiently disseminated through RealForAll end-user applications.
The irreplaceable step in the standard method EN 16868:2019 is the manual classification of data performed by aerobiology experts. Operation of the RealForAll system does not require this kind of expertise because the process of classification is automated by the AI module.
Finally, a significant difference between the two approaches is the time needed to provide relevant pollen measurements. Table 1 shows the approximate duration of the activities carried out in order to get hourly pollen measurements using the standard method and the RealForAll system, respectively. The Rapid-E device samples in real time, so the sample characteristics are measured every minute and the hourly sample is available after 60 minutes of measurements. In the case of the Hirst device, it is not economical to process samples every hour, so a sample is usually available after either 24 hours of sampling or, more commonly, after 1 week. A 24-hour sample obtained from the Hirst device requires a minimum of 2 hours for preprocessing and classification, while in the case of the RealForAll system that activity is done in 15 minutes. Taking everything into consideration, we can conclude that the RealForAll system can disseminate pollen information at least 20 times faster than the standard method.
Although the RealForAll system shows great results in performance and operability, there is still some room for improvement. Table 1 shows that the RealForAll system has latency in data provision, but this is not due to a long-lasting classification process, as it may seem. The delay is a consequence of batch data processing. The subsystem for classification downloads data from the Rapid-E device at periodic intervals. The classification process is not aware of whether the download is complete, and it is therefore postponed for 10 minutes to provide enough time for the download to finish. However, this does not guarantee that all data will be processed. The size of Rapid-E minute recordings can vary from 10 to 100 Mb, and those files may not be transferred in 10 minutes over a poor Internet connection. This problem can be solved by implementing streaming data processing. In that manner, we would have a continuous flow of raw data from Rapid-E devices, and data would be classified immediately as they arrive. However, a latency of several minutes is still inevitable to ensure that the RealForAllHub subsystem has received most of the classified data before it performs aggregation.
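The "at least 20 times faster" estimate in this discussion can be checked with a short calculation from the durations quoted in the text. The sketch assumes that the comparison adds sampling time and processing time for each method; the variable names are chosen for this illustration, and Table 1 itself is not reproduced here.

```python
MINUTES_PER_HOUR = 60

# Durations quoted in the discussion, in minutes.
hirst_sampling = 24 * MINUTES_PER_HOUR    # 24-hour sample
hirst_processing = 2 * MINUTES_PER_HOUR   # minimum preprocessing and classification
realforall_sampling = 60                  # one hourly sample
realforall_processing = 15                # preprocessing and classification

hirst_total = hirst_sampling + hirst_processing
realforall_total = realforall_sampling + realforall_processing
speed_up = hirst_total / realforall_total
print(f"{speed_up:.1f}")  # prints 20.8
```

With the more common weekly Hirst sample, the same calculation gives a considerably larger ratio, which is consistent with the "at least" wording.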

Conclusion
In this paper, we introduce an integrated system for real-time monitoring of airborne allergens and the dissemination of information about their concentration. This system automates the standard method EN 16868:2019 for pollen detection. The system provides hourly measurements with a latency of 15 minutes, which is a significant improvement in comparison with the standard method. The system also allows easy integration of new devices as well as pollen measurements from other systems, which benefits application users, who get a single point of access to real-time measurements from different locations.
Biological contaminants pose severe threats to the manufacturing processes of numerous industrial, food and pharmaceutical products. Additionally, microorganisms such as fungi, bacteria, and viruses can cause significant damage to workers' health and plant health in agriculture. The introduction of automatic bioaerosols monitoring in industrial enterprises is expected to minimise negative occupational health effects and maximise profit. The discrimination of bioaerosols is often a prerequisite for the successful implementation of mitigation measures. For example, for successful allergy management, it is not sufficient to know total pollen concentrations but the quantity of each allergen in the atmosphere so sensitive individuals could be selectively warned. Similarly, the presence of only specific fungal spores should be a trigger for fungicide spraying in glass houses indicating that discrimination of bioaerosols has more value than information on their bulk quantity in production enterprises.
The RealForAll system proved that AI enables automation of the monitoring of airborne allergens, which is part of routine environmental monitoring at about 700 stations worldwide (Map 2019). This opens possibilities for the application of aerobiology in a variety of industries, in particular in relation to human, animal and plant health. Although further improvement of classification models is needed to enable identification of the full spectrum of bioaerosols suspended in the atmosphere, the RealForAll system is an example of how automation of a tedious manual process that requires a notable amount of domain expertise (i.e. identification of pollen) is supported by advanced laser-induced fluorescence measurements and AI. Systems for real-time pollen identification are still under development, and considerable effort is still needed, especially regarding classification accuracy. Opening raw data from pollen monitors worldwide, as well as making classification models publicly available to other researchers, would be very beneficial for obtaining scientific feedback and speeding up further research in this field.
To sum up, compared with the Hirst method, the main drawback of implementing the RealForAll system is its initial investment cost. On the other hand, it provides real-time pollen measurements, which help allergic people to better manage their allergic disease and are essential for the improvement of forecasting models (Sofiev 2019).

Disclosure statement
No potential conflict of interest was reported by the authors.