Machine Learning based Soft Sensing Tool for the Prediction of Leaf Wetness Duration in Precision Agriculture

Leaf wetness often emerges as the result of the exchange of atmospheric water-soluble gases between the Earth surface and the atmosphere. The importance of this feature resides in the relationship that exists between leaf wetness and various plant diseases. In order to measure this variable, there is a need for deploying physical sensors to capture wetness readings of a crop area. However, the installation and maintenance of these sensors is a hard task that involves qualified people, time and high economic costs. Moreover, the acquisition, storage and analysis of data must be taken into consideration to infer this information and issue countermeasures preemptively. This work presents a leaf wetness soft-sensing approach that relies on predictive machine learning models to estimate the wetness of the leaves of a specific crop. Specifically, among the learning algorithms that are evaluated for this purpose, we include Random Vector Functional Link (RVFL) networks, a family of neural networks that embrace randomization at their core to yield a highly efficient training process. By virtue of machine learning, physical sensors can be replaced by soft-sensors capable of providing the information related to the wetness of the leaves of the crop. In this way, human effort and costs are largely reduced, while ensuring a high precision of the wetness estimation as proven by experiments with real-world data.


Introduction
In short, leaf wetness can be defined as the existence of free water on the surface of a crop canopy due to precipitation, dew, fog, or irrigation [1]. Leaf wetness is a significant issue in precision agriculture, since it directly relates to plant diseases, insect activity, and the harvest and curing of crops. Given the moisture balance of arid and semi-arid regions of the world, the measurement of leaf wetness has been a subject of intense study throughout history. As a matter of fact, the first connection between one kind of plant infection and leaf wetness was due to DeBary in 1853, who unveiled the link between the infection of potatoes by Phytophthora infestans and the presence of free water on the plant canopy [1]. Since then, a variety of bacterial, fungal and other diseases have been associated with wetness under thermal conditions that are favorable to plant infection [2].
In this context, the lack of a standard method for calculating and measuring the Leaf Wetness Duration (LWD) is arguably the most concerning constraint for achieving rigorous data [3]. This causes many practical difficulties when comparing results, and may lead growers to make bad decisions about the diseases related to leaf wetness. As opposed to other variables (e.g., rainfall, temperature, or relative humidity), LWD is not a standardized measure. Thus, in order to properly assess the inherent value of the approach proposed in this work, it is convenient to briefly overview some devices that are available today for this purpose. According to [1], LWD sensors have evolved from passive (static) to mechanical and, more recently, to electronic devices. However, regardless of the sensors used for the measurement, certain problems associated with them must be considered, such as their robustness against corrosion, the noise induced in the captured signal, the contact area (in case of using clip sensors), or the height and angle of the sensor (which can cause turbulence during the drying process). All these issues can cause inaccuracies in the collection of measurement data [4]. Furthermore, it is of utmost importance to underscore the need for moving the sensors from time to time, or for removing them at harvest time (depending on the crop), not to mention the cost in time and resources (human and monetary) that the maintenance of those sensors requires.
Consequently, given the importance of the LWD and the numerous problems undergone by devices measuring this variable, in this manuscript we present a novel soft sensing tool based on Machine Learning (ML) techniques. After a data collection period using physical sensors, ML models are trained with the collected data together with meteorological information, so that they can predict the LWD value without any further need for physical sensors. Several ML variants have been considered to realize the data modeling part of our proposed tool, among which we include several avant-garde variants of Random Vector Functional Link (RVFL) networks [5]. RVFL-based models are characterized by highly efficient training processes by virtue of the random initialization of part of their trainable parameters, achieving a good balance between the precision of their outputs and the computational cost demanded by their training process. To the best of our knowledge, these RVFL-based models (including multi-layer, deep and ensemble deep approaches) have not been considered by the related literature to date, which constitutes another novel ingredient of this work. Experiments comprising real-world data will be presented and discussed on the basis of several indicators, evincing that RVFL variants perform competitively with respect to other approaches, yet at a dramatically reduced computational cost.
The rest of this paper is structured as follows: Section 2 expands preliminary concepts and related work in the existing literature. Section 3 describes the proposed approach, with emphasis on RVFL networks. Section 4 details the collected dataset, the algorithms considered for comparison and the experimental setup. Section 5 presents and discusses the results obtained from the experiments run over the aforementioned data, and Section 6 concludes this work by outlining future research directions stimulated by our findings.

Preliminaries and Related Work
As already introduced in Section 1, soft sensing allows estimating variables that are difficult to measure, using available data or other variables correlated to the target one [6]. In this case, ML techniques can be used to model this correlation in order to develop a fully autonomous application for leaf moisture estimation based on the use of low-cost sensors. The advantages of these methods are: 1) they do not require specific knowledge of the parametric equations that govern the relationships in the problem to be approached; 2) they offer high inference capacity in a highly nonlinear multi-parametric context; and 3) they provide, with a relatively low design cost, high generalization capacity, among other advantages. By contrast, their main requirement is the availability of data. For this reason, after an initial phase in which physical sensors are temporarily deployed on leaves to collect moisture data, ML models are learned from this training data, so that they allow issuing estimates without the need for the sensor itself. In this work, the input data to the learning algorithm comprises: 1) meteorological data, supplied by a network of meteorological stations; and 2) leaf wetness data, provided by the initially deployed set of physical sensors.
Once the models are trained, the set of leaf humidity sensors is removed, giving rise to a tool that predicts the degree of humidity in the crop depending solely on the data received by the network of meteorological stations. We stress our claim that this methodology is cost-effective, simple, easy to implement in any agricultural operation, and does not require expert knowledge for its use. It is hence a key part in the digital transformation processes of the problem under study, since it enables a monitoring system based on Internet of Things (IoT) core technologies: it uses communications to send the data to a centralized storage system; it embraces ML to infer correlations subject to the specificity of the location under study; and it can be easily utilized in other locations, with potential to either take advantage of the knowledge already captured by the model or specialize it for the new monitored area, should more data be available by virtue of an initial temporary deployment of IoT sensors.
We delve now into the relationship between LWD and crop diseases, which has attracted great interest in the scientific community. Indeed, a plethora of methods have been reported for the prediction [7], modeling and measurement of the LWD variable, which we now review in detail. To begin with, the work in [8] proposes the use of a physical explicit equation to model LWD, obtaining good results but acknowledging the difficulty of obtaining the data necessary for solving the equation. A comparison between different methods to predict and simulate LWD (including artificial neural networks, regression trees, sensors, logistic regression and physical models) was given in [9]. Shortly thereafter, [10] developed a generalized regression neural network to estimate LWD. Interestingly, two main conclusions were drawn from this study: 1) the relation between LWD and meteorological features is non-linear, and 2) the required time for training their modeling proposal was large given the computational resources available at the time. The research presented in [11] was inspired by these prior conclusions to propose an improved estimate of LWD, applying corrections to the features and using classification tree models for the sake of efficiency in the modeling stage. There are also other approaches that attempt to define surface wetness in physical terms, as well as reviewing methods for determining surface wetness considering solutions for the lack of surface wetness standardization from both measurement and simulation perspectives [12]. Interestingly, [13] recommends the inclusion of the grass temperature as a new predictor for the determination of LWD, comparing grass temperature to a measured dew point, with timely alerts. Fuzzy logic has also been used to estimate LWD [14].
A different approach, recently contributed in [15], resorts to temperature and humidity on a daily time basis to simulate LWD, without taking into account other hourly meteorological features, since on many occasions they are unavailable or incomplete. Also recently, [16] developed an LWD model using Extreme Learning Machines (ELMs), a family of randomized neural networks that has been the subject of controversy due to its strong structural resemblance to RVFL networks. Finally, [17] changes the formulation of the LWD characterization problem to a classification task for hourly leaf wetness, tackling it by means of a Random Forest (RF), a Support Vector Machine (SVM) and, again, an ELM.
The activity in LWD characterization reviewed above suggests that this field is eager for new modeling approaches. This is the main motivation for conducting this work, which incorporates the novel perspective of predicting LWD by using an economical, simple, and computationally efficient method that can be implemented in any agricultural crop. A further step is taken beyond the last reported advances by including new RVFL-based methods in the benchmark designed to determine which ML choices perform best for this regression problem.

Proposed approach
As anticipated in the previous section, the proposed LWD prediction tool requires historical wetness data and meteorological variables for performing the prediction. In particular, the set of daily meteorological data used as input comprises the air temperature (Celsius), the average relative humidity (%), the dew point (Celsius), the global solar radiation (W/m²), the wind flow speed (m/s), the precipitation (mm), and the daily evapotranspiration (mm). Once raw data is registered and stored, several cleaning and data preparation stages are necessary to create a robust data structure that can be fed to the learning algorithm of the ML model to learn the correspondence between the aforementioned variables and the LWD variable.
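As a rough illustration of the data structure mentioned above, the following sketch arranges one daily record per row, with the seven meteorological inputs as features and LWD as the target. The class name, field names, the rule of dropping incomplete records, and all numeric values are assumptions made for illustration only; the paper does not specify its cleaning stages, and the sample values are not taken from the dataset.

```python
from dataclasses import dataclass, astuple
from typing import List, Optional, Tuple


@dataclass
class DailyRecord:
    """One daily observation (hypothetical field names)."""
    air_temperature_c: Optional[float]
    relative_humidity_pct: Optional[float]
    dew_point_c: Optional[float]
    solar_radiation_w_m2: Optional[float]
    wind_speed_m_s: Optional[float]
    precipitation_mm: Optional[float]
    evapotranspiration_mm: Optional[float]
    lwd: Optional[float]  # target measured by the physical sensor


def build_dataset(records: List[DailyRecord]) -> Tuple[List[List[float]], List[float]]:
    """Drop incomplete records and split the rest into features X and targets y."""
    X, y = [], []
    for rec in records:
        values = astuple(rec)
        if any(v is None for v in values):
            continue  # assumed cleaning rule: skip days with any missing value
        X.append(list(values[:-1]))
        y.append(values[-1])
    return X, y


# Made-up sample values, for illustration only.
records = [
    DailyRecord(18.2, 64.0, 11.3, 210.0, 2.1, 0.0, 3.4, 5.0),
    DailyRecord(16.9, None, 12.0, 180.0, 1.7, 1.2, 2.8, 9.0),
    DailyRecord(21.5, 48.0, 10.1, 260.0, 3.0, 0.0, 4.6, 2.0),
]
X, y = build_dataset(records)
```

Once the physical sensors are removed, only the seven feature columns remain available, which is why the target is kept as a separate vector.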
Similarly to other works in the area, in this work LWD estimation is formulated as a supervised learning problem. Under this formulation, over the last years several models have been considered, for which parametric adjustment is required via cross validation. Interestingly, most contributions within this related literature focus on proposing new models. In this work we join this trend by evaluating different RVFL-based modeling flavors that have been proposed very recently to overcome the costly gradient backpropagation process that underlies the traditional neural training approximation. To this end, RVFL networks draw at random (as per a given probability density function) part of their trainable parameters, whereas the remaining ones are adjusted based on the learning task to be solved (classification or regression). RVFL, along with other ML approaches featuring randomization in their training phases (e.g., bagging and boosting ensembles or reservoir computing), constitute the wide family of randomization-based ML methods, which have been at the heart of several recent reports showcasing their performance and efficiency in diverse applications [18,19].
Figure 1 illustrates a basic single-layer RVFL network. It is a feed-forward neural structure wherein the connections between the input and hidden layer (highlighted in red) are initialized at random. By contrast, neural connections between the hidden and output layer, as well as direct input-output connections (both in blue), need to be trained. In the case of regression problems, this is done by solving a quadratic regularized minimization problem given by:
$$\min_{\mathbf{w}} \; \big\| [\mathbf{H}; \mathbf{X}]\,\mathbf{w} - \mathbf{Y} \big\|_2^2 + \lambda \, \|\mathbf{w}\|_2^2, \tag{1}$$
where $[\cdot\,;\cdot]$ stands for matrix concatenation, $\mathbf{w}$ is the vector of output weights, $\|\cdot\|_2^2$ denotes the squared $L_2$ norm, $\lambda \geq 0$ is the regularization parameter, $\mathbf{X}$ is the matrix of input examples to the model, $\mathbf{H}$ is the matrix of hidden features, and $\mathbf{Y}$ is the matrix containing the supervision of the aforementioned examples.
Different explicit solving procedures can be used for the above problem, depending on whether the problem is regularized (λ ≠ 0, e.g., Ridge regression) or not (λ = 0, Moore-Penrose matrix pseudoinversion). Starting from this seminal single-layered structure of an RVFL network, several variants have been proposed over the years. Among them, we focus on the recent work in [5], where multi-layered, deep and ensemble deep variants of these models have been presented. The nested plots in Figures 1.b to 1.d depict schematically how these variants relate to their single-layer counterpart. Multilayer RVFL (herein denoted as mRVFL) is essentially composed of a stacked arrangement of hidden layers, from which only the output of the last hidden layer is considered for its concatenation with the input matrix X and the computation of the output weights as per (1).
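The single-layer training procedure described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the uniform sampling range, the tanh activation, the default hyper-parameters and the function names `fit_rvfl` and `predict_rvfl` are all illustrative choices. The output weights follow the standard ridge solution $\mathbf{w} = (\mathbf{D}^\top \mathbf{D} + \lambda \mathbf{I})^{-1} \mathbf{D}^\top \mathbf{Y}$, with $\mathbf{D}$ the concatenation of hidden features and inputs.

```python
import numpy as np


def fit_rvfl(X, y, n_hidden=50, lam=1e-2, seed=0):
    """Fit a single-layer RVFL regressor in closed form (ridge solution)."""
    rng = np.random.default_rng(seed)
    # Input-to-hidden weights and biases are drawn at random and never trained
    # (uniform range is an assumption of this sketch).
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)          # hidden features (tanh is an assumption)
    D = np.hstack([H, X])           # hidden features plus direct input links
    # Ridge solution: w = (D^T D + lam * I)^{-1} D^T y
    A = D.T @ D + lam * np.eye(D.shape[1])
    w = np.linalg.solve(A, D.T @ y)
    return W, b, w


def predict_rvfl(X, W, b, w):
    """Predict with a fitted RVFL: same random projection, learned output weights."""
    H = np.tanh(X @ W + b)
    return np.hstack([H, X]) @ w


# Synthetic data with seven inputs, only to exercise the code.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 7))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1]

W, b, w = fit_rvfl(X_demo, y_demo)
y_hat = predict_rvfl(X_demo, W, b, w)
train_rmse = float(np.sqrt(np.mean((y_hat - y_demo) ** 2)))
```

For the unregularized case (λ = 0), the call to `np.linalg.solve` could be replaced by `np.linalg.pinv(D) @ y`, which corresponds to the Moore-Penrose pseudoinverse mentioned in the text.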

Experimental Setup
Several computer experiments with real data have been designed in order to 1) evaluate the performance of different ML models for estimating LWD from meteorological data, and more specifically, 2) ascertain whether RVFL models provide a competitive advantage with respect to other modeling methods, both in terms of the quality of the estimations and the complexity of their training process. For this purpose, data has been collected within the European AFarCloud project (http://www.afarcloud.eu/), whose general objective is to provide a distributed platform for autonomous farming that allows for the integration and cooperation of agriculture cyber-physical systems in real-time. The ultimate goal of the project is to increase efficiency, productivity, animal health and food quality, and to reduce farm labor costs. Specifically, meteorological data, and especially information on the LWD variable, have been provided by the Cortes de Cima vineyard (https://cortesdecima.com). The datasets retrieved in the project spanned a total of 6 years.
Once data preprocessing is completed, a validation methodology must be designed to prevent results from being affected by the intrinsic seasonal component of the input meteorological data. To achieve this, 6 different training sets have been created, each composed of 5 of the 6 years of available data. Data of the remaining year is used to validate the models, so that performance statistics are reported over these 6 leave-one-year-out partitions. Consequently, the potential effect of the intra-year meteorological variability on the performance of the models is minimized, and comparison fairness is ensured by feeding the same subsets of data to each model.
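The leave-one-year-out scheme described above can be sketched as follows. The function name and the year labels are hypothetical (the paper does not state which calendar years were recorded); the sketch only shows how each year is held out once while the other five are used for training.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx) for each distinct year."""
    for held_out in sorted(set(years)):
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical year labels: two daily records per year over six years.
sample_years = [2015, 2015, 2016, 2016, 2017, 2017,
                2018, 2018, 2019, 2019, 2020, 2020]
splits = list(leave_one_year_out(sample_years))

for held_out, train_idx, test_idx in splits:
    print(held_out, len(train_idx), len(test_idx))
```

Reusing the same index lists for every model is one simple way to guarantee that all models see identical training and validation subsets.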
When it comes to the models used for the comparison, we adopt a wide spectrum of different choices, aiming to guarantee the impartiality of the benchmark with respect to the inner mechanisms of their learning algorithms and their generalization capabilities. On one hand, ensemble learning is represented by several algorithms that rely on bootstrap averaging and boosting: RF, Gradient Boosting Regressor (GBR), XGBoost regressor (XGBR) and Adaboost regressor (ABR). While core differences exist among these ensembles, due to space constraints we refer to [20] for a comprehensive explanation of such differences and a detailed description of their learning algorithms. We also include in the benchmark ELM models and their multi-layered variant (mELM), which permit us to verify if the controversial differences of these models to their RVFL counterparts (absence of direct input-output connection and bias terms) provide any gain in our application scenario. Likewise, several linear regressors (linear regression (LR), Lasso regression, ridge regression and elastic nets) are considered to shed light on the non-linear modeling capability that has been identified as a requirement for LWD characterization [10]. Finally, other classical algorithms such as K nearest neighbors (KNN), Gaussian Process (GPR) and SVM regressors (SVR) round out our model comparison. All hyper-parameters of these models have been tuned via a nested cross-validation strategy and grid search (results not shown to comply with paper length restrictions).
Regarding the quantitative metrics in use for comparing the above learning algorithms, we resort to several definitions widely used for regression problems: 1) Root Mean Square Error (RMSE), given by the square root of the average squared prediction error, which penalizes large errors more heavily; 2) Mean Absolute Error (MAE), which is given by the average amplitude of the prediction errors in a given set of predictions, without considering the direction in which errors are made; and 3) coefficient of determination R², which is a normalized statistical measure that gauges the proportion of the variance of the target variable (in our case, LWD) that can be explained by the inputs of a regression model (i.e., the input meteorological data). Time statistics are also reported in the form of average training times in seconds per data partition, as a quantitative indication of the computational efficiency differences among the models. Finally, source code and result logs are available at https://git.code.tecnalia.com/afarcloud/dss algorithms/leaves wetness.
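For reference, the three regression scores can be computed with the standard definitions sketched below. The function names and the toy values are illustrative and not taken from the paper's code or data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: mean magnitude of the residuals, sign ignored."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, for illustration only.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 4.0, 5.0, 8.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Lower is better for RMSE and MAE, whereas R² approaches 1 as the model explains more of the variance of the target.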

Results and Discussion
We begin our analysis of the obtained experimental outcomes by commenting on Table 1, which summarizes the statistics (mean and standard deviation) of the different performance metrics measured over the benchmark of models described previously. First of all, it is important to observe that, in agreement with common knowledge in the literature, linear models perform comparatively worse than most of the other approaches in the study, except for GPR, for which the assumption of Gaussian priors might not be appropriate for the problem at hand. Except for the latter, the remaining classical ML models perform on par with each other, achieving in all cases predictions that are of lower quality than those produced by ensembles. Indeed, RF achieves the best MAE score in the benchmark, followed by ABR and GBR. However, when turning the focus of the discussion towards RMSE and R² scores, RVFL-based approaches dominate the benchmark, with score values that are slightly superior to those of ELM-based models and the aforementioned ensembles. Interestingly, training times for all RVFL flavors were at least one order of magnitude lower than those of ensembles, showing off their good trade-off between complexity and performance. For the sake of a fair comparison, we note that standard deviations are relatively high when compared to the mean of every distribution. Although it must be assessed with care given the scarcity of samples over which it is computed (6, as many as yearly data partitions under consideration), the high values of this indicator suggest that the performance gaps found between models (especially those identified among ensembles, ELM-based and RVFL-based methods) are not statistically significant. In other words, standard hypothesis tests often used to shed light on this matter fail to declare that result differences are relevant under a given level of confidence.
However, this is not the case of gaps reported in terms of training complexity, which are notably wider and do represent a significant point of improvement of RVFL-based variants. As for the comparison between ELM- and RVFL-based models, the subtle gaps noticed in terms of the three regression scores leave no room for conclusive insights on the structural differences among such modeling families. Nevertheless, we arrive at the general conclusion that these randomization-based neural networks offer very good predictive performance levels when predicting LWD from meteorological data, while requiring low computational efforts for training.
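To illustrate why six yearly partitions leave little room for declaring score gaps significant, the sketch below implements an exact paired sign-flip permutation test. This test is a stand-in chosen for illustration, since the text does not name the hypothesis tests it refers to, and the per-partition scores are hypothetical values, not results from Table 1.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided p-value of a paired sign-flip permutation test."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to carry either sign, so all 2^n sign assignments are enumerated.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            count += 1
    return count / total


# Hypothetical per-partition RMSE values for two models (not from the paper).
model_a = [1.9, 2.4, 2.1, 2.8, 2.2, 2.6]
model_b = [2.0, 2.3, 2.3, 2.7, 2.5, 2.6]
p_value = paired_sign_flip_pvalue(model_a, model_b)
print(p_value)
```

With six pairs there are only 64 sign assignments, so the smallest attainable two-sided p-value is 2/64, reached only when one model wins on every partition; small and inconsistent gaps therefore cannot reach common significance levels.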

Conclusions and Future Work
This paper has addressed the problem of predicting the leaf wetness duration of crops by minimally resorting to physically deployed sensors. For this purpose, supervised learning models for regression have been proposed as efficient means to model the correlation existing between meteorological conditions and the duration of wetness in a given crop area. This effectively builds a soft sensing tool in which an initial, temporary installation of sensors is performed to learn the models, which once trained can elicit accurate predictions of the wetness duration in plants based on meteorological information, thereby obviating any need for a continued use of sensing devices.
From the algorithmic point of view, our work has contributed to the state of the art in this topic by analyzing with empirical evidence the potential of modern randomization-based neural networks for this modeling problem. Specifically, different architectural variants of RVFL networks have been under research, which extend their original single-layered version with layer stacking and ensembles. These modifications allow for a superior modeling capacity of the RVFL model, making it possible to extract a hierarchy of fine-grained hidden features that better represents the complex interactions between meteorological predictors and the duration of the leaf wetness. Results obtained over real-world data captured under the umbrella of the AFarCloud European project buttress this statement, and support the overall idea that this family of randomization-based neural networks can attain accurate predictions without requiring high computational efforts for adjusting their parameters.
Manifold research directions are foreseen as a continuation of this work. Among them, we highlight the use of feature attribution methods to quantify and explain which input variables impact most on the duration of the leaf wetness [21]. Furthermore, model-agnostic techniques for output confidence estimation (e.g., conformal prediction or Bayesian neural frameworks) will be explored to evaluate to what extent the randomness of these models propagates to their output by increasing the amount of epistemic uncertainty observed in their predictions. Finally, other predictors will be considered without jeopardizing the soft sensing capability of the tool, such as the watering cycle of the farm or geographical/topographical aspects, which we expect may ease the transferability of the knowledge captured by the ML models among different plantations.