Evaluation of low-cost sensors for quantitative personal exposure monitoring

Challenges associated with the robustness of low-cost sensors (LCS) are studied.
Field experiments are performed to analyse sensor performance in diverse conditions.
Pre- and post-deployment colocation experiments are performed for LCS and reference monitors.
Four calibration methods are investigated for SC kits: LR, ANN, SVR and RF.
The SVR model outperformed the other models, with an average RMSE of 3.39 for PM2.5 and 4.10 for PM10.

Abstract
Observation of air pollution at high spatio-temporal resolution has become easy with the emergence of low-cost sensors (LCS). LCS provide new opportunities to enhance existing air quality monitoring frameworks, but questions remain about the accuracy and quality of their data. In this study, we assess the performance of LCS against industry-grade instruments. We use linear regression (LR), artificial neural networks (ANN), support vector regression (SVR) and random forest (RF) regression to develop calibration models for the LCS, which were Smart Citizen (SC) kits developed in the iSCAPE project. Initially, outdoor colocation experiments are conducted in which ten SC kits are collocated with a GRIMM, which is an industry-grade instrument. A quality check is performed on the LCS data, and the data are used to develop the calibration models. The models are evaluated by testing them on nine SC kits. We observed that the SVR model outperformed the other three models for PM2.5, with an average root mean square error of 3.39 and an average R² of 0.87. The model is validated by testing it for PM10, where the SVR model shows similar results. The results indicate that SVR can be considered a promising approach for LCS calibration.


Introduction
Deterioration of air quality is an important challenge in most urban parts of the world (Kumar et al. 2015, 2016). Pollutants such as particulate matter (PM), carbon monoxide (CO) and nitrogen dioxide (NO2) can cause respiratory as well as cardiovascular diseases (Cascio and Long 2018; Kumar et al. 2015). Poor air quality adversely affects not only human well-being, health and productivity but also overall development and sustainability. Environmental damages are also enormous, and cities require intensive monitoring (Morawska et al. 2019) to understand the trends and sources responsible for particulate matter concentration (Hama et al. 2020; Shukla et al. 2020). Raising awareness among people is of utmost importance to efficiently monitor the changes in air quality and assess the harmful impact of air pollution on human health and the sustainability of cities (Ortolani and Vitale 2016).
Accepted for publication on 27.01.2020 in 'Sustainable Cities & Society' journal. Please see journal website for full citation details.
Traditionally, government agencies are considered the primary participants in air quality monitoring. Their main purpose is to perform regular inspections for air quality compliance and to inform policy-making. Monitoring sites are limited because industry-grade instruments are expensive and require regular maintenance (Rai et al. 2017; Kumar et al. 2015). The small number of monitoring sites also limits the spatial resolution of the data (Morawska et al. 2018). It has been observed that pollutant concentrations can show complex spatial short-term variations (Monn 2002). For example, depending on the meteorological conditions, different parts of a city are affected differently by emission sources (Mangia et al. 2013). In the case of streets, the concentration can vary over a short distance within minutes (Goel and Kumar 2015). This makes it necessary to have an efficient network of sensors that can generate large datasets and thereby improve the spatio-temporal resolution of air quality data (Mahajan et al. 2018). Knowledge extracted from such data can then be used by the public to take precautionary measures.
One of the driving forces behind efficient air quality monitoring at finer spatio-temporal resolution is the availability of low-cost sensors (LCS) for large-scale data sensing (Chen et al. 2018; Commodore et al. 2017; Kamel Boulos et al. 2011). These include portable sensors that are cost-effective as well as reliable enough to capture pollution peaks and reproduce the data. The use of such sensors not only helps to increase the spatial density of air pollution monitoring, which can lead to more information and services (Castell et al. 2015; Rai et al. 2017), but also provides an alternative for designing cost-effective air quality monitoring frameworks that can be easily deployed in different parts of the world.
Compared to the industry-grade instruments, these LCS are a convenient alternative for static as well as mobile sensing (Rai et al. 2017;Spinelle et al. 2017). This makes LCS easy to use and deploy in regions with limited monitoring facilities.
The downside of using LCS for large-scale deployment is the less accurate data they generate (Yi et al. 2015; Wei et al. 2018). If the relative humidity exceeds 75%, the error rate rises significantly (Masson, Piedrahita, and Hannigan 2015).
Although calibration-related research has been going on for many years, it still attracts a lot of interest for the following reasons: (i) the availability of new LCS that are cost-effective; and (ii) air quality sensing frameworks and applications that use crowdsourcing and crowd-sharing for personal exposure monitoring (Maag, Zhou, and Thiele 2018b).
[...] reference instruments pre- and post-deployment, and the performance of calibration algorithms when environmental conditions differ from those of the training set. We attempt to answer these questions by performing extensive colocation experiments and using the resulting dataset to assess the performance of various calibration algorithms. The idea is to develop a model that is not just accurate and efficient but can potentially be used [...]

Modelling approach
As described in Table 1, most of the existing methods use either linear regression (LR) or advanced neural network methods. The drawback of LR methods is that they can only capture linear relationships. Artificial neural network (ANN) models, on the other hand, do capture non-linear relationships, but their overall computational load in terms of memory and time is higher. Random Forest (RF) models are found to be a good option, but a large amount of data is needed to obtain an accurate and efficient model. In this study, we aim to improve the calibration models for LCS using the Support Vector Regression (SVR) method, whose performance is also compared with that of other statistical and machine learning algorithms.
We performed a pre-deployment field co-location experiment using ten low-cost air quality sensors collocated with a reference instrument (Section 2.3). To develop an efficient and accurate calibration model, the model was trained and tested for ten LCS with each measuring PM1, PM2.5 and PM10. Furthermore, after the initial co-location experiment, the sensors were deployed for Citizen Science experiments where they were used for hundreds of hours (Mahajan et al., 2020). Once the deployment during citizen science experiments was over, we carried out another post-deployment co-location experiment with a reference instrument to ensure the quality and accuracy of the LCS (Section 3.2). The calibration model is discussed in detail (Section 2.4) to elaborate on the ideal parameters for the model as well as the size of data-sets to improve the performance and lower over-training of the model. A comparative

Instruments
During the study period, outdoor air pollution concentrations in a typical UK town, Guildford, were recorded using Smart Citizen (SC) kits and a GRIMM EDM 107 (Supplementary Information, SI, Figure S1). The SC kit uses a data board and an urban sensor board (Camprodon et al. 2019).

Quality assurance of citizen sensors
An important step before any large-scale deployment study using LCS is to check sensor performance against reliable industry-grade instruments. Before distributing the sensors for citizen science studies, we performed a field study to validate the performance of the LCS through a colocation experiment with an industry-grade instrument at the same site. For this task, ten SC kits and a GRIMM (industry-grade instrument) were used. The field experiments were designed to identify errors that could occur when these sensors are deployed for monitoring PM1, PM2.5 and PM10 in real-world conditions. The sampling time for the GRIMM and the SC kits was 6 seconds and 30 seconds, respectively. To maintain consistency in the experiment, the GRIMM data were averaged over 30 seconds. Measurements were made for almost 30 hours from 29 October 2018 to 4 November 2018 at an iSCAPE project site in Guildford, UK (Abhijith and Kumar 2019). Some of the SC kits malfunctioned due to sudden battery discharge or produced anomalous data. After data pre-processing, five hours of continuous data were used for the comparative analysis between the GRIMM and the SC kits. Although the data for comparison were limited, they captured concentration trends for both peak and off-peak hours, which provided valuable information and a strong basis for comparison. Once the citizen science activities were concluded (Mahajan et al., 2020), a post-deployment colocation experiment was performed to further validate the performance of the SC kits. The second colocation campaign was conducted from 27 March 2019 to 1 April 2019. After cleaning the data, 15 hours of data were retrieved for the analysis.
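The time alignment described above (6-second reference samples averaged to the 30-second resolution of the SC kits) can be illustrated with a minimal Python sketch. The study itself used R; the function name, the sample values and the choice to drop an incomplete trailing window are assumptions made only for illustration.

```python
from statistics import mean


def average_to_window(samples, sample_period_s=6, window_s=30):
    """Average fixed-rate samples into non-overlapping windows.

    An incomplete trailing window is dropped (an assumption of this sketch).
    """
    if window_s % sample_period_s != 0:
        raise ValueError("window must be a whole multiple of the sample period")
    per_window = window_s // sample_period_s  # 5 samples per 30 s window
    n_full = len(samples) // per_window
    return [
        mean(samples[i * per_window:(i + 1) * per_window])
        for i in range(n_full)
    ]


# Hypothetical 6-second reference readings covering one minute
grimm_6s = [10.0, 12.0, 11.0, 13.0, 14.0, 20.0, 22.0, 21.0, 19.0, 18.0]
grimm_30s = average_to_window(grimm_6s)
print(grimm_30s)
```

Each output value can then be paired with the SC kit reading for the same 30-second interval before any comparison is made.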

Measurement site and instrument setup
The measurement site (Latitude 51.246495, Longitude -0.571539) was located close to a circular intersection and a traffic signal. Two co-location campaigns were organised at this site. For the first co-location campaign, we used the new SC kits. The instruments, including ten SC kits and a GRIMM, were mounted on a tripod stand and placed at a typical breathing height of 1.5 m (Figure 1). The instruments were placed close to the roadside with a regular inflow of vehicles. The dominant local emission source at the site is vehicle emissions, with occasional truck traffic.

Calibration models
R software is used for the statistical analysis (see Supplementary Information, SI, Section S1 for the source code). Initially, the data were pre-processed using data analysis tools (Mahajan and Kumar 2019). This is an important step to verify the reliability of the data by removing outliers and missing values. Once the data were cleaned, the dataset was divided into training and testing data. In this study, we used 80% of the data for training the models and 20% of the data for testing the models. The data split is based on the Pareto rule, which is also known as the 80/20 rule. The Pareto principle (Lipovetsky 2009) states that "for many events, roughly 80% of the effects come from 20% of the causes." This split gave consistent results with high accuracy on both training and testing data for all models. We used a dynamic data splitting approach, which splits the data randomly. This approach proved more useful for testing the robustness of the models, since it leaves no regular pattern in the data. In a fixed splitting approach, patterns can be learned and the models will usually show a higher accuracy. We intended to test the models in a scenario where the training and testing data vary in every iteration so that the model performance can be properly evaluated.
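The dynamic 80/20 split can be sketched as follows. This is an illustrative Python version rather than the R code used in the study; the function name `random_split` and the use of integer row indices as stand-in data are assumptions.

```python
import random


def random_split(rows, train_fraction=0.8, seed=None):
    """Shuffle the rows and split them into training and testing subsets."""
    rng = random.Random(seed)  # a different seed gives a different split
    shuffled = list(rows)      # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 paired sensor/reference records
for seed in (1, 2, 3):   # each iteration draws a new random split
    train, test = random_split(rows, seed=seed)
    print(seed, len(train), len(test))
```

Repeating the split with different seeds and averaging the resulting test metrics is one way to check that a model's accuracy does not depend on a particular partition of the data.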
In the next step, we focused on developing an efficient PM2.5 calibration model for the SC kits.
Linear Regression (LR): LR is a statistical method that uses one or more variables to predict the outcome of a particular variable. This method uses the value of $x$ (independent variable) to predict the value of a dependent variable $y$. For a simple linear regression, $x$ should have a linear relationship with $y$. LR finds the straight line, also known as the least-squares regression line, that best fits the observations in the dataset. If $x$ is the independent variable and $y$ is the dependent variable, then the regression line is given by Eq. (1):
$$y = \beta_0 + \beta_1 x \qquad (1)$$
In the above equation, $\beta_0$ represents a constant value, $\beta_1$ is the regression coefficient, $x$ is the independent variable, and $y$ is the dependent variable. Such methods have been widely used for understanding the relationships between environmental pollutants and also for forecasting [...]
Artificial Neural Network (ANN): [...] a non-linear operation is followed to predict the output (Yusaf, Yousif, and Elawad 2011). In mathematical terms, a neuron $k$ can be described using Eqs (2) and (3) (Kubat 1999):
$$u_k = \sum_{j=1}^{m} w_{kj} x_j \qquad (2)$$
$$y_k = \phi(u_k + b_k) \qquad (3)$$
Here $b_k$ denotes the bias that can alter the input of the activation function, $x_1, x_2, \ldots, x_m$ denote the inputs, and $w_{k1}, w_{k2}, \ldots, w_{km}$ denote the weights of neuron $k$. $u_k$ is the combined output, $\phi$ is the activation function and $y_k$ denotes the neuron's output. A very common learning algorithm in ANN is the backpropagation algorithm. In the backpropagation method, the data are processed from the input layer through the hidden layers and finally reach the output layer. For this study, we used a backpropagation neural network model with two hidden layers consisting of eight nodes. Multiple combinations were tested before selecting the best combination for the ANN model.
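As a small illustration of the two building blocks described above, the sketch below fits the least-squares line of Eq. (1) in closed form and evaluates a single neuron as in Eqs (2) and (3). It is written in Python for illustration only (the study used R); the function names, the toy data and the choice of tanh as activation function are assumptions, since the activation function is not specified here.

```python
import math


def fit_line(x, y):
    """Closed-form least-squares fit; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the inputs, shifted by the bias, then activated."""
    u = sum(w * x for w, x in zip(weights, inputs))
    return activation(u + bias)


# Toy data lying exactly on the line y = 1 + 2x (hypothetical values)
sensor = [1.0, 2.0, 3.0, 4.0]
reference = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_line(sensor, reference)
print(b0, b1)

# One neuron with two inputs; tanh is an assumed activation function
print(neuron_output([0.5, -1.0], [0.2, 0.4], bias=0.3))
```

A full ANN stacks many such neurons in layers and learns the weights and biases by backpropagation, which is beyond the scope of this sketch.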

Support Vector Regression (SVR)
SVR has been found useful while dealing with nonlinear problems (Balabin and Lomakina 2011). The idea behind using SVR is to identify a function $f(x)$ that shows a relationship between the predictors and the target (Cherkassky 1997). Let us assume that the training data is $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i$ is the $i$th sample of predictors and $y_i$ is the target variable. The total number of samples is given by $n$. If the training data has $d$ features, then $d$ is the dimensionality of $x_i$. The constructed SVR model is given by Eq. (4):
$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b \qquad (4)$$
where $K$ is the kernel function, and $\alpha_i$, $\alpha_i^{*}$ and $b$ are the variables obtained through optimization. While implementing the SVR algorithm, it is important that an appropriate kernel function is used. [...]
Random Forest (RF): [...] During the regression process, a tree is constructed by repetitive partitioning of the original sample into multiple nodes down to the terminal nodes (Grömping 2009). Each split is based on a variable value and depends on the splitting criterion. Once a tree has been constructed, the result for any observation can be forecasted by following the path from the root node to a terminal node. For this study, we initially tested the model with different numbers of trees, starting from a default value of 500. Error rates were compared for different numbers of trees and, based on the stabilization of the error rate, the final model was implemented using the number of trees with the lowest error value.
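To make the role of the kernel function in Eq. (4) concrete, the following Python sketch evaluates an already trained SVR model as a kernel expansion over its support vectors. The radial basis function (RBF) kernel, the function names and all numerical values are assumptions chosen for illustration; the kernel actually used in the study is not stated in this section, and fitting the coefficients requires an optimization step that is not shown.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Prediction = bias + sum of coefficient * kernel(support vector, x)."""
    return bias + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical fitted model with two support vectors in one dimension
support_vectors = [[0.0], [2.0]]
dual_coefs = [1.0, -0.5]  # assumed output of the optimization step
prediction = svr_predict([0.0], support_vectors, dual_coefs, bias=10.0, gamma=0.5)
print(prediction)
```

Because the kernel decays with distance, each support vector mainly influences predictions for inputs close to it, which is what allows the model to follow non-linear sensor responses.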

Metrics for performance evaluation
Model performances were analysed using three metrics: coefficient of determination (R²), root mean square error (RMSE) and covariance. The R² value provides an estimate of the variability in the dataset accounted for by each model. An R² of 1 indicates a perfect scenario with 100% correlation, whereas a value of 0 means no correlation. The RMSE quantifies the difference between the predicted values and the observed values. It has been widely used as a measure of accuracy (Shirsath and Singh 2010). RMSE is given by Eq. (5):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2} \qquad (5)$$
where $O_i$ is the observed PM2.5 value, $P_i$ is the predicted PM2.5 value and $n$ is the total number of observations. Covariance shows the amount by which two variables are linearly associated. Positive covariance values show a positive correlation between the variables and vice versa.
To get a better understanding of how the SC kits performed in comparison to the GRIMM, we plotted a correlation matrix between the observed values. Figure 2a shows a good correlation between the GRIMM measurements and the ten SC kits for PM2.5, with R² values ranging between 0.76 and 0.82. For PM10, the results showed a relatively low R², ranging between 0.51 and 0.56 (Figure 2b). Interestingly, the correlation among the SC kits themselves was >0.9 for both PM2.5 and PM10.
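The three evaluation metrics described in this section can be computed with a few lines of Python, shown here as an illustrative sketch rather than the R code used in the study. Two choices are assumptions: R² is computed as one minus the ratio of residual to total sum of squares, and covariance uses the sample (n - 1) denominator. The observed and predicted values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def covariance(a, b):
    """Sample covariance (n - 1 denominator)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


observed = [10.0, 12.0, 14.0, 16.0]   # hypothetical reference values
predicted = [11.0, 11.0, 15.0, 15.0]  # hypothetical calibrated values
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
print(covariance(observed, predicted))
```

A lower RMSE together with a higher R² and a positive covariance indicates that the calibrated values track the reference instrument more closely.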

Variations in data and data correlation
During the post-deployment colocation experiment after the citizen science workshops, a similar trend was observed between the SC kits and the GRIMM measurements. For PM2.5, the correlation was high, ranging between 0.88 and 0.90 (Figure 3a). For PM10, on the other hand, the correlation ranged from 0.52 to 0.53, similar to what was observed during the pre-deployment colocation experiment (Figure 3b).
Pre-deployment and post-deployment colocation experiments proved to be very useful for evaluating the performance of the SC kits. The correlation results clearly indicated that the SC kits performed better for PM2.5 than for PM10 and that the correlations remained similar during the pre- and post-deployment colocation experiments.
[...] Table 3 (Figure 4). In fact, the performance of the LR model was almost the same for all the SC kits, clearly demonstrating that this model is unable to capture the concentration variations.

Performance evaluation of ANN model
In the next step, we implemented the ANN model on the dataset. The ANN model implemented a backpropagation model with two hidden layers, as described in Figure S2.
Predictive performance of the model was initially tested on the training data. Average RMSE values for different sensors can be observed in Figure 5. The RMSE is low for most of the SC kits. RMSE values are generally lower for training data. This is because the training dataset is almost four times the size of the testing dataset and, over time, the statistical learning process finds the patterns in the data and drives the RMSE down to the smallest possible value. The important part is to see how the model performs on the testing dataset. We analysed the results of the ANN model and found that its performance was better than that of the LR model. The R² values and the RMSE for all SC kits can be observed in Table 5. The observed and predicted values are plotted for all SC kits as shown in Figure 6. This analysis indicates that the ANN model performed well and is able to capture the variations in the data in most cases.

Performance evaluation of SVR model
As a further step, we analysed the results obtained by the SVR model. Table 6 shows the values of R², RMSE and covariance after the implementation of the SVR model on the testing dataset. The R² values for all the SC kits are over 0.80, and the observed RMSE values are lower. Some of them are significantly lower than those of the other three models, with the lowest being 1.16. The covariance values are also higher, showing a high and positive correlation between observed and predicted PM2.5 values. This shows that the SVR model performs better than the other three models. To further verify the model performance, the observed and predicted values are plotted for all SC kits as shown in Figure 7. For all the kits, the predicted values follow the observed values very closely.

Performance evaluation of RF model
In the final step, we evaluated the performance of the RF model. Table 7 shows the R², RMSE and covariance values obtained after the implementation of the RF model on the testing dataset. A positive correlation is found between the observed and predicted values, but the RMSE is higher in most cases. To further analyse the model performance, the observed and predicted values are plotted for all SC kits as shown in Figure 8. It can be observed that there are peaks that are not captured by the RF model.
After testing all four models, the SVR model performed the best, followed by the ANN model, the LR model and then the RF model. To understand the overall performance of the developed models, we compared the average RMSE values obtained from testing the models on nine SC kits (Figure 9). The SVR model outperforms the other three models with an average RMSE of 3.39, compared with 5.56 for the ANN model, 7.43 for the LR model and 7.70 for the RF model (Figure 9a).
This indicates that the SVR model can be used efficiently to develop calibration models for LCS. To validate the performance of the SVR model, we compared it with the other three models for PM10. It was observed that for all SC kits, the SVR model performed the best, with low RMSE and high covariance (Table S1). The average RMSE observed using the SVR model was 4.10, compared with 21.80 (LR), 11.80 (ANN) and 13.20 (RF), as shown in Figure 9b. This observation made it clear that the SVR model could be effectively implemented for different fractions of particulate matter.
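To put the reported averages in perspective, the relative reduction in RMSE achieved by the SVR model over each of the other models can be worked out directly from the values quoted above. The short Python sketch below does only this arithmetic; the helper name `reduction_vs` is an illustrative assumption and no new measurements are introduced.

```python
# Average RMSE values as reported in the text for PM2.5 and PM10
avg_rmse = {
    "PM2.5": {"SVR": 3.39, "ANN": 5.56, "LR": 7.43, "RF": 7.70},
    "PM10": {"SVR": 4.10, "ANN": 11.80, "LR": 21.80, "RF": 13.20},
}


def reduction_vs(baseline, candidate):
    """Percentage by which `candidate` is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


for fraction, scores in avg_rmse.items():
    svr = scores["SVR"]
    for model, value in scores.items():
        if model != "SVR":
            pct = reduction_vs(value, svr)
            print(f"{fraction}: SVR RMSE is {pct:.1f}% lower than {model}")
```

For PM2.5 the SVR average is roughly 39% below that of the ANN model, and for PM10 the gap to every other model is wider still, which is consistent with the ranking reported above.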

Summary, conclusions and future work
Air pollution and its consequences have affected the majority of countries in the world. [...] data reliability of LCS as well as measures to improve the data sensing capabilities of these sensors by implementing efficient calibration algorithms. We used a GRIMM as the industry-grade instrument and SC kits as the LCS. Before developing the calibration models, we performed quality assurance of the LCS through two outdoor colocation campaigns. The first was the pre-deployment colocation, which was aimed at checking the quality of the SC kits before distributing the sensors to citizens during a citizen science workshop. We let the citizens use the sensors for three weeks and then performed a post-deployment colocation experiment to verify that the sensors were still working efficiently. Once the colocation campaigns were finished, data pre-processing was performed to check the data quality.
In the past, there have been several works that have studied regression techniques using LR (Spinelle et al. 2014), ANN (Spinelle et al. 2014) and RF (Zimmerman et al. 2018)