A novel computer vision based neutrosophic approach for leaf disease identification and classification

Natural products are inexpensive, non-toxic, and have fewer side effects; consequently, demand for them, especially herb-based medical products, health products, nutritional supplements and cosmetics, is increasing. The quality of leaves defines the degree of excellence, or a state of being free from defects, deficits, and substantial variations. Moreover, leaf diseases pose a threat to the economic and production status of the agricultural industry worldwide. Identifying leaf disease with digital image processing decreases the dependency on farmers for the protection of agricultural products, and this is the motivation of the proposed work. In this paper, a novel segmentation technique based on neutrosophic logic, an extended form of fuzzy set theory, is used to evaluate the region of interest. The segmented neutrosophic image is distinguished by three membership elements: the true, false and intermediate regions. Based on the segmented regions, a new feature subset using texture, color, histogram and disease sequence region is evaluated to identify a leaf as diseased or healthy. Nine different classifiers are used to demonstrate the discrimination power of the combined features, among which random forest dominates the other techniques. The proposed system is validated with 400 cases (200 healthy, 200 diseased). The new feature set is promising, a classification accuracy of 98.4% is achieved, and the proposed technique could serve as an effective tool for leaf disease identification. © 2018 Elsevier Ltd. All rights reserved.


Introduction
Leaves are major ingredients in traditional medicinal drugs. The World Health Organization (WHO) has estimated that approximately 80% of the world population still relies on traditional medicines, which are mostly plant-based drugs [1]. Although researchers have worked intensively to identify plant leaf diseases using techniques such as DNA/RNA analysis, polymerase chain reaction, and sensors [2], the use of computer vision to recognize disease symptoms in medicinal plant leaves remains less explored. The objective of this paper is to present a computer vision-based approach for detecting whether a basil leaf is healthy or diseased.

Basil, an ancient and popular herbal plant, is characterized by significant health-benefiting phytonutrients and has profound significance in medicine and religion. The Swiss Federal Institute of Technology observed high quantities of (E)-beta-caryophyllene (BCP) in basil, which is believed to be helpful in the treatment of arthritis and inflammatory bowel disease [3]. Basil is indigenous to Iran, India and other tropical regions of Asia [4], and contains the essential oil and oleoresin required for manufacturing perfumes, food flavours, and aromatherapy products [5]. It has been used in around 300 different herbal treatments to support a healthy response to stress, boost energy, increase stamina, promote healing and cardiovascular health, and treat ailments including cancer and heart disease.

The proposed work is divided into two parts: the preliminary phase develops a new segmentation technique based on neutrosophic logic, while the second phase describes a new feature extraction method. Based on these two phases, a classifier categorizes each leaf as healthy or diseased. The database for the proposed system contains healthy and infected basil leaf images. The remainder of this paper is organized as follows: Section 2 gives a brief review of the literature. Section 3 presents the techniques and measures, including the novel segmentation and feature extraction methods. The feature accuracy is evaluated using nine different classifiers, as defined in Section 4. Experimental results are illustrated in Section 5, and the conclusion is discussed in Section 6.

Literature review
The conventional method of identifying and determining disease in medicinal leaves is manual. However, this manual process is time-consuming, tedious and, moreover, very subjective [6]. In recent years, numerous computer vision methods have been developed to detect and classify diseases of agricultural and horticultural crops and thereby overcome the problems of manual techniques [7,8]. The basic pipeline in all of these methods includes image acquisition, feature extraction, feature selection and then classification with parametric or non-parametric statistics. For effective operation of a computer vision system, the choice of image-processing methods and classification strategies is the chief concern. In the literature, many efforts have been made to explain the different modules of disease detection techniques for various agricultural/horticultural applications. A survey of the research done on such leaf images during the last few years is summarized in Table 1; the abbreviations used are listed in its last row.

Data set collection and imaging set up
In the present work, the leaf dataset consists of four types of healthy and diseased basil leaf images: Ocimum sanctum (Kapoor basil), Ocimum tenuiflorum (Ram & Shyama basil), Ocimum basilicum (holy basil) and Ocimum gratissimum (Vana-holy basil). These were collected for this study from the herb gardens at Punjab Agricultural University Ludhiana, the National Institute of Pharmaceutical Education and Research (NIPER) Mohali, and Punjabi University Patiala, India. A pictographic assessment of the above-mentioned study sites is shown in Fig. 1.
The samples are taken to the research laboratory and cleaned of non-uniformly distributed dust to attain a similar surface condition for all leaf categories. After cleansing, the leaves are taken to an imaging station, and images are acquired indoors to minimize the noxious effects of variation in ambient lighting conditions. To simulate outdoor environments while controlling illumination and orientation, four fluorescent bulbs with natural-light filters and reflectors are used. Leaves were digitally captured in color using a Canon EOS 5D Mark III (22.3-megapixel CMOS sensor, resolution of 5760 × 3840 pixels, 14-bit A/D conversion, wide ISO range of 100–25600, up to 6 frames per second) from a constant height of 45 cm over the center of the imaging station. The camera was positioned vertically above the samples to contain all components at the best possible resolution, and was fixed on a camera stand to reduce hand movement and capture uniform images of the basil leaves. The degree of damage varied between leaf samples. To obtain uniform illumination, four 16 W cool-white fluorescent bulbs (4500 K color temperature) were placed 30 cm above the imaging station surface, with natural-light reflectors located at a 45-degree angle to ensure proper lighting.

System model
The system model comprises four essential steps: preprocessing, segmentation, feature extraction, and classification. 1. Preprocessing: the aim of preprocessing is to bring out details that are obscured, using the contrast limited adaptive histogram equalization method [42] for better contrast. These four phases are discussed in detail in the following sub-sections; the flow chart shown in Fig. 4 represents the proposed methodology.

Pre-processing
After data collection, image quality is improved by adjusting the intensities of the image in order to highlight the target areas, i.e., the diseased visual regions. The Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm is deployed for image enhancement; it works on small sections (tiles) of the image instead of the whole image. As the name suggests, CLAHE applies histogram equalization after partitioning the image into contextual regions [42]. This makes hidden features of the image clearly visible and improves the distribution of the used gray values. Bilinear interpolation is used to combine adjacent tiles and eliminate artificially induced boundaries. The contrast in homogeneous areas can be limited to avoid amplifying any noise present in the image.
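The core of CLAHE, clipping each tile's histogram before equalization, can be sketched as follows. This is a simplified, single-tile Python/NumPy version: real CLAHE repeats it per contextual region and blends tiles with bilinear interpolation, and the `clip_limit` value here is an illustrative assumption, not the paper's setting.

```python
import numpy as np

def clipped_hist_equalize(tile, clip_limit=0.01, nbins=256):
    """Simplified, single-tile core of CLAHE: clip the histogram,
    redistribute the excess uniformly, then equalize via the CDF.
    (Full CLAHE applies this per tile and blends with bilinear
    interpolation; the tiling is omitted in this sketch.)"""
    hist, _ = np.histogram(tile.ravel(), bins=nbins, range=(0, 256))
    limit = max(1, int(clip_limit * tile.size))
    excess = np.maximum(hist - limit, 0).sum()
    hist = np.minimum(hist, limit) + excess // nbins  # redistribute excess
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255.0
    return cdf[tile.astype(np.uint8)].astype(np.uint8)
```

Clipping the histogram before building the CDF is what bounds the contrast amplification in homogeneous areas, as described above.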

Segmentation technique
Image segmentation is a difficult task due to the complexity and diversity of images. Factors such as illumination [44], contrast [45], and noise [46] affect the segmentation results. The goal of segmentation is to locate the suspicious areas from which disease is diagnosed. We propose a new segmentation technique based on neutrosophic logic. A neutrosophic set is an extended form of the fuzzy set, tautological set, dialetheist set, paradoxist set, intuitionistic set and paraconsistent set [47]. An image is represented using three membership elements, (T), (I) and (F), where T defines the truth scale, F the scale of falsity and I the scale of indeterminacy; all three elements are independent of each other. A pixel in the neutrosophic domain is characterized as P{t, i, f}: it is t% true, i% indeterminate and f% false [48].
1. To acquire the unhealthy segment: let the input image be I_i(x, y). After contrast enhancement it is represented as I_c(x, y), and the diseased segment T_IS(x, y) is formulated as in Eq. (1).
2. The healthy segment of the leaf is evaluated as in Eq. (2), where F_IS(x, y) represents the healthy segment, i.e., the green section of the leaf image.
3. The intermediate segment is the stage which is neither exactly diseased nor healthy; it can be considered the onset of disease. To evaluate the intermediate portion, the original image I_i(x, y) is first transformed into the CMYK color space, I_cmyk(x, y), to extract the yellow color [50], denoted I_y(x, y), which appears in the leaf due to chemical changes, rust disease, chlorophyll breakdown, etc. Further, the green color I_green(x, y) is extracted from the original image I_i(x, y). M_g(x, y) and M_y(x, y) are masks representing the remaining portion of the leaf where the green and yellow segments, respectively, are not considered. Thus T_IS(x, y) represents the degree of being a diseased segment, F_IS(x, y) the degree of being a healthy segment and I_IS(x, y) the degree of being neither healthy nor diseased. Fig. 6 gives a pictorial representation of the extracted true, false and intermediate regions.
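The three-way split can be illustrated with simple color heuristics. The thresholds below are illustrative assumptions, not the paper's Eqs. (1)–(3): green-dominant pixels are taken as healthy (F), bright yellowish pixels as intermediate (I, onset of disease), and the remaining leaf pixels as diseased (T).

```python
import numpy as np

def neutrosophic_segments(rgb):
    """Sketch of the T/F/I mapping with illustrative color thresholds
    (assumptions, not the paper's exact formulas)."""
    c = rgb.astype(np.float64) / 255.0
    r, g, b = c[..., 0], c[..., 1], c[..., 2]
    healthy = (g > r) & (g > b)                            # F: green-dominant
    onset = ~healthy & (r > 0.5) & (g > 0.4) & (b < 0.3)   # I: yellowish
    diseased = ~healthy & ~onset                           # T: remaining pixels
    return diseased, onset, healthy
```

The three boolean masks play the role of T_IS, I_IS and F_IS; they partition the image, mirroring the independence of the three membership elements.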

Feature extraction
Feature extraction reduces the image data by measuring certain features or properties of each segmented region [51]. Features are used to define the distinct characteristics of an image [52]. After image segmentation, the next important task is to extract useful features of the image in order to diagnose the disease. We use a new feature pool, detailed in the following subsections, comprising histogram information content, damage structure index, disease sequence region and bin binary pattern features. The catalogue of features is illustrated in Table 2, with the following notation: x, y = pixel location; T_i = pixel count of the diseased area; I_i = pixel count of the onset-of-disease area; F_i = pixel count of the healthy region; G_K = centre value; G_C = neighbourhood pixel; K = number of pixels in the neighbourhood.

Histogram information content (HIC)
A histogram is easy to compute and effective in characterizing both the global and local distributions of colors in an image. Histogram information content defines the relative information content by finding the probability of occurrence of the relative information about each plane in the image; it varies for every leaf and for each plane. HIC is defined as HIC = log(1 / h), Eq. (4), where h is the histogram information content (probability) of the segmented region. HIC is evaluated for all three segments T_IS, F_IS and I_IS, for each of the red, green and blue planes.
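Under one plausible reading of Eq. (4), the information content of a plane follows from its histogram probabilities via log(1/p). The bin count and the Shannon-style averaging over occupied bins below are assumptions, not the paper's exact formulation.

```python
import numpy as np

def hic(plane, nbins=16):
    """Histogram information content of one color plane of a segmented
    region: average -log2 probability over occupied bins (in bits)."""
    hist, _ = np.histogram(plane.ravel(), bins=nbins, range=(0, 256))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]                     # log(1/p) only for occupied bins
    return float(-(p * np.log2(p)).sum())
```

A flat, single-valued plane carries no information (HIC = 0), while a plane spread evenly over many bins scores high, which is what lets HIC discriminate between leaf segments.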

Disease sequence region (DSR)
Disease Sequence Region (DSR) captures the correlation of individual neighbouring pixels via perceived pixel differences, that is, intensity deviations between neighbouring pixels; refer to Eqs. (5) and (6). We calculate DSR for every extracted region of the image in each plane (red, green and blue), both vertically and horizontally. Eq. (5) gives the vertical deviation of intensity and Eq. (6) the horizontal deviation, where x and y define the pixel location. From the horizontal and vertical deviations, we measure the deviation difference between healthy and non-healthy leaves.
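A minimal reading of Eqs. (5) and (6) as mean absolute neighbour deviations can be sketched as follows; the exact normalization is an assumption, since the equations are given only symbolically in the text.

```python
import numpy as np

def dsr(plane):
    """Disease sequence region: mean absolute intensity deviation
    between vertically and horizontally neighbouring pixels."""
    c = plane.astype(np.float64)
    vertical = float(np.abs(np.diff(c, axis=0)).mean())    # Eq. (5)
    horizontal = float(np.abs(np.diff(c, axis=1)).mean())  # Eq. (6)
    return vertical, horizontal
```

A uniform healthy region yields near-zero deviations, while lesion boundaries raise them, which is the deviation difference the feature exploits.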

Damage index (DI)
The damage index is defined as the amount of space taken by the diseased segment of the leaf, given by Eq. (7), where T_i is the pixel count of the diseased area, I_i the pixel count of the onset-of-disease area and F_i the pixel count of the healthy region. A higher DI value indicates a larger diseased region; DI thus represents the possible presence of damage (disease) in the leaf structure.

Bin binary pattern (BBP)
A new texture descriptor, BBP, is introduced to describe the local structure information of the leaf. BBP linearly interpolates the pixel values of a neighborhood to form an operator whose structure distinguishes individual patterns (healthy or non-healthy leaves). To keep it computationally simple, the three planes (red, green, blue) are considered separately and a histogram is created for each. Each histogram is split into 9 bins and mapped to a 3 × 3 matrix; the mean intensity value of each bin is evaluated and the difference between the center pixel and the neighboring pixels is calculated as defined in Eq. (9). Weights for the resulting binary pattern are assigned in a clockwise direction starting from the top-left. Here G_K represents the centre value, G_C the neighbourhood pixel, x_C and y_C the pixel position, and K the number of pixels in the neighbourhood.
The BBP algorithm is described as follows:
Input: leaf image.
Output: set of unique decimal values representing the local structure information.
Step 1: Calculate the true, false and intermediate regions using neutrosophic segmentation.
Step 2: Divide the image into 9 different bins using the histogram of each of the red, green and blue planes.
Step 3: Calculate the mean of all bins.
Step 4: Evaluate the difference of all neighbourhood bins w.r.t. the centre value using Eq. (9).
Step 5: Assign weights.
Step 6: Obtain the unique decimal values.
The whole BBP procedure is illustrated step by step in Fig. 7.
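The steps above can be sketched for a single color plane as follows. The clockwise weight order starting at the top-left and the handling of empty bins are assumptions consistent with the description, not the paper's exact implementation.

```python
import numpy as np

def bin_binary_pattern(plane):
    """Sketch of BBP for one plane: 9-bin histogram -> 3x3 matrix of
    bin mean intensities -> threshold the 8 neighbours against the
    centre (cf. Eq. (9)) -> clockwise weights from top-left -> one
    decimal code describing local structure."""
    vals = plane.ravel().astype(np.float64)
    edges = np.linspace(0, 256, 10)                  # 9 equal-width bins
    idx = np.clip(np.digitize(vals, edges) - 1, 0, 8)
    means = np.array([vals[idx == k].mean() if (idx == k).any() else 0.0
                      for k in range(9)])
    m = means.reshape(3, 3)
    centre = m[1, 1]
    # clockwise neighbour order starting at the top-left corner
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    return sum(int(m[i, j] >= centre) >= 1 and int(m[i, j] >= centre) << w
               for w, (i, j) in enumerate(order))
```

Repeating this per plane and per segmented region yields the set of decimal codes that the classifier consumes alongside the other features.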

Classification
In this paper, we evaluate the accuracy and effectiveness of nine classifiers and select the best one. Brief summaries of the classification models are listed below:
Decision tree: supervised learning algorithm that estimates the significance of a target variable using numerous input variables [53].
Random forest: ensemble learning method that uses bagging to construct a group of decision trees from random subsets of the data [54].
Support vector machine: discriminative approach described by a separating hyperplane that maximizes the margin between the two classes [55].
AdaBoost: boosting approach in which multiple weak classifiers are combined into a single strong classifier [56].
Linear models: linear models for analysis of covariance and single-stratum analysis of variance [57].
Naive Bayes: supervised learning algorithm based on Bayes' theorem with the naive assumption of independence between every pair of features [58].
K-NN: instance-based learning, where data are classified against stored, labeled instances according to some distance/similarity function [59].
Artificial neural networks: mathematical models that process data based on the structure and function of biological neural networks [60,61].

Discriminant analysis: builds a predictive model using regression and variance analysis to define the relationship between one dependent variable and one or more independent variables [62].
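As one self-contained illustration of the instance-based scheme in the list above, here is a minimal k-NN classifier under Euclidean distance. This is a generic sketch, not the tuned configuration reported in Table 3.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Label a sample by majority vote among its k nearest training
    instances under Euclidean distance."""
    dists = np.linalg.norm(np.asarray(X_train, dtype=float)
                           - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [y_train[i] for i in nearest]
    return Counter(votes).most_common(1)[0][0]
```

In the present pipeline, each row of `X_train` would be the feature vector (HIC, DSR, DI, BBP codes) of one leaf and the labels "healthy"/"diseased".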
The tuning parameters of the machine learning methods are tabulated in Table 3, which also indicates the models, method packages and platform used for calculating and selecting parameters. On the basis of these parameters, the classifiers categorize an image as a healthy or diseased leaf.

Leaf images dataset
The database consists of 400 images, which include 200 healthy and 200 diseased leaves of the four leaf categories, i.e., Ocimum sanctum (Kapoor basil), Ocimum tenuiflorum (Ram & Shyama basil), Ocimum basilicum (holy basil) and Ocimum gratissimum (Vana-holy basil).

Classification model evaluation metrics
Different evaluation parameters were used to measure the performance of the classification process [63], including precision (positive predicted value, PPV), negative predicted value (NPV), sensitivity, specificity and accuracy. In this section, we analyze the prediction results of the nine machine learning methods on the training-testing dataset described in Table 4. The data distribution in the training-testing experiment is set to 70% and 30%, respectively, for all models.
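The metrics named above follow directly from a binary confusion matrix; these are the standard definitions implied by the text (the original equation block did not survive extraction):

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary evaluation metrics: precision (PPV), NPV,
    sensitivity (recall), specificity and accuracy."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # positive predicted value
    npv = tn / (tn + fn) if tn + fn else 0.0         # negative predicted value
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "npv": npv, "sensitivity": sensitivity,
            "specificity": specificity, "accuracy": accuracy}
```

Here "positive" means the diseased class, so precision answers "of the leaves flagged diseased, how many really were?" and NPV the mirror question for the healthy class.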
Fig. 7. Bin binary pattern.
possible threshold values. It provides the ability to assess the performance of a classifier, where AUC measures the whole two-dimensional area under the entire ROC curve. The AUC portrays the probability that a randomly selected positive example is rated with greater suspicion than a randomly chosen negative example [65]. AUC ranges in value from 0 to 1; a high value typically reflects good discrimination competence of a classifier. An area of 1 represents a perfect test, while an area of 0.5 represents a worthless test. A model performs better when its ROC curve is lifted up and away from the diagonal. Fig. 11 shows the ROC curve of random forest.
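The probabilistic reading of AUC above translates directly into code as a pairwise comparison of classifier scores, with ties counted as half:

```python
def auc_score(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive example
    is scored above a randomly chosen negative one (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

This pairwise form is equivalent to integrating the ROC curve, and makes the 0.5 value of a worthless (random) scorer immediate.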

Performance of features
Furthermore, the accuracy is evaluated on 50-50, 60-40, 70-30 and 80-20 training-testing partitions to verify its consistency, as illustrated in Table 5. The results show that random forest performs well on all partitions.
In the next experiment, we compare the classification accuracy of our proposed features with traditional feature extraction methods [11,66,67]. As shown in Table 6, the proposed feature set gives better performance than the other methods.

Conclusion
The main contribution of this paper is the successful design of a new segmentation technique together with a new set of features. The whole procedure was described in order, from gathering images through segmentation to classification. New features extracted from the segmented regions combine the discrimination power of leaf intensity and texture. Nine classifiers were used to measure the accuracy of the proposed features, which give promising results and were compared with existing feature extraction methods. The developed model was able to distinguish healthy and diseased leaves; based on the graphical analysis, RF performs better than the other machine learning models, with 98.4% accuracy.
Future studies could extend the proposed work to classify each disease category individually and estimate the severity of detected diseases. Novel combinations of feature extraction, feature selection and learning methods can also be explored to enhance the efficacy of disease detection and classification models.
Fig. 2 represents the experimental set-up of the proposed system. The database consists of 400 images, comprising 200 healthy and 200 diseased leaves of different leaf categories, i.e., Ocimum sanctum (Kapoor basil), Ocimum tenuiflorum (Ram & Shyama basil), Ocimum basilicum (holy basil) and Ocimum gratissimum (Vana-holy basil). A hundred samples of each of the four leaf classes are collected. The leaf diseases investigated are downy mildew, aphids, gray mold, bacterial leaf spot and fusarium wilt. Fig. 3 shows healthy and diseased basil leaves.

2. Segmentation: after preprocessing, transform the image into the neutrosophic domain, which segments it into three different regions: the true, false and intermediate segments.
3. Feature extraction: design a new feature pool based on the three segmented regions to distinguish healthy and diseased leaves.
4. Classification: nine different classifiers are used for the final classification decision.

Fig. 1. Study sites of basil plants, from where different basil leaves were collected.

3.2.2.1. Mapping T, F & I (region-of-interest evaluation).
In the proposed method, the diseased area of the leaf is employed as the true part (T), the healthy element is represented as the false part (F), and the intermediate element (I) is defined as neither healthy (F) nor diseased (T). The neutrosophic domain provides the extra element 'I', which offers a more efficient way to handle the degree of uncertainty. To evaluate the diseased segment, the original image pixels are transformed from RGB to the CIELab color space, which gives better color perception than the standard RGB space [49]. CIELab consists of three channels, L, a* and b*: the L channel represents lightness, from 0 (black) to 100 (white); positive values of the a* channel indicate the amount of red, while negative values indicate the amount of the green opponent color; and in the b* channel, positive values indicate yellow and negative values the amount of blue. After enhancement and color transformation, T, I and F are mapped as follows:
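The RGB-to-CIELab transform used above can be sketched as a standard sRGB/D65 conversion; the paper does not give its implementation, so this generic version is an assumption.

```python
import numpy as np

def rgb_to_lab(rgb):
    """Standard sRGB (0-255) to CIELAB (D65 white point) conversion.
    L is lightness (0..100); a* is the green(-)/red(+) axis, so large
    a* values flag the reddish diseased tissue targeted by T."""
    c = rgb.astype(np.float64) / 255.0
    c = np.where(c > 0.04045, ((c + 0.055) / 1.055) ** 2.4, c / 12.92)
    m = np.array([[0.4124, 0.3576, 0.1805],     # sRGB -> XYZ matrix
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = c @ m.T
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])          # D65 white point
    f = np.where(xyz > 0.008856, np.cbrt(xyz), 7.787 * xyz + 16.0 / 116.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)
```

Thresholding the a* channel is then a natural way to separate reddish-brown lesions (a* > 0) from green healthy tissue (a* < 0), consistent with the channel semantics described above.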

Fig. 2. a) Pictorial view of experimental set-up; b) & c) image acquisition system with different leaf structures.

Fig. 4. Flow chart of system model for proposed method.

Fig. 5. Flow chart of system model for proposed segmentation method.

Fig. 6. Region detection results: a) original captured image; b) pre-processing using the CLAHE algorithm; c) true region, which represents the diseased region; d) false region, where the healthy region of the leaf is presented; e) intermediate region of the leaf.

Figs. 8-10 illustrate the classification accuracy of all classifiers with respect to the various evaluation parameters defined in Section 5.2. Compared to the other machine learning models, random forest maintains a high accuracy. Further performance measures are the Receiver Operating Characteristic (ROC) and the Area Under the Curve (AUC). ROC is an efficient method for evaluating the discrimination power of a statistical model [64]. It plots sensitivity versus specificity across the different

Fig. 9. Comparison of various classifiers in terms of accuracy and error rate.

Table 1
Survey of leaf disease detection and classification systems.

Table 2
Catalogue of features.

Table 3
Tuning parameters of classifiers.

Table 4
Testing and training data distribution of healthy and diseased leafs.

Table 5
Performance comparison on different training-testing partitions.

Table 6
Performance comparison of various classifiers.