Concept Drift Identification using Classifier Ensemble Approach

ABSTRACT


INTRODUCTION
Research over last few decades have developed many data mining algorithms for discovering knowledge underlying the data [1], [2]. These algorithms, however, are often used for static datasets, while recently developed new applications face the problem of processing large volumes of data generated in the form of data streams. Applications like sensor networks, web log analysis, and telecommunication systems require to process data generated at very high rates. Data stream imposes challenges like limited memory, less processing time and one scan of instances while processing. Traditional data mining algorithms cannot efficiently handle these problems, thereby leading to the development of stream data mining techniques. One of the challenges while learning from data streams is handling concept drifts, i.e., changes in data streams which deteriorate the accuracy of classifiers. This happens since classifiers learnt on past data instances are used for labelling recent data instances that reflect current concept which may be different from the old ones. Thus, for handling drifts in data streams, classifiers must use some technique for adjusting with changing environment [3][4][5][6]. Also, classifiers must be able to detect different types of drifts, sudden and gradual drift, that are characterized by the rate of changes observed [7]. Data stream is the sequence of data instances {x t ,y t } for time t=1,2,3,...,T, where x is set of attributes and y is class label. We assume that as data instance x t arrives, classifier C predicts its class label. After some time, actual class label y t is available and is used by classifier for evaluation and as additional information for training purpose. This technique called supervised learning is used by most of data mining algorithms. However constraints applied by stream data are not well addressed by this technique.As time elopes, the concept about which data is collected changes over time. This phenomenon, also called, concept drift is divided into two main categories as: sudden drift and gradual drift. The first type of drift occurs when source distribution S of data stream is suddenly replaced by another distribution S ' . The later type of drift is associated with slower rate of changes in data streams. Typically, data instances from different source distributions start mixing, where probability of observing data instances from new source distribution increases and that of old distribution decreases over time Multiple algorithms have been proposed for dealing with concept drifts in data streams. Here we describe works related to our study briefly.
Drift detector is mechanism used for analyzing data instances and triggering alarm as soon as drift is observed. The trigger indicates need of rebuilding classifier. The most popular drift detection is Drift Detection Method in which predicted labels are compared with actual labels for determining classification errors. Classification error is monitored to check if it falls beyond certain threshold. When an error falls beyond threshold, alarm is signalled to store incoming data instances into a buffer. When alarm level is reached, new classifier is build on data instances in buffer and old classifier is removed.Concept drift is adapted into system when system is updated over current concept. The popular technique for accommodating current concept is windowing technique. This technique helps in keeping selected data instances in the system. Windowing technique is most widely used, since it keeps most recent data instances while eliminating data instances belonging to old concepts. Window size is common trade-off due to the fact that larger window size helps in keeping track of slower changes, but fail in case of sudden changes, whereas smaller window size can adapt sudden drifts efficiently as compared to gradual changes.The best way for dealing with concept drifts in data streams is ensemble technique which is a set of component classifier, votes of which are combined to predict class labels. Ensemble classifiers are best for dealing with changes in data streams due to their modular nature, which allows ensemble to be structured either by retraining component classifiers or by replacing weakest classifier by recently trained classifier or by updating weights assigned to component classifiers depending on their respective performances.

RESEARCH METHOD
Many algorithms have been developed with variations in basic processing technique of ensemble [8], [9] Minkuet. al. [10] proposed new approach that keeps different ensembles for dealing with diversity of concepts. It maintains different ensembles of data steams before concept drift and after concept drift in order to keep both old and new concepts in the system Senaratneet.al. [12] proposed a framework for determining hotspot of twitter activities and detecting drifts using kernel density estimation in streams of tweets. But it fails in determining the type of drift detected and taking measures over it. We overcome this problem by integrating online classifier and block-based classifier within single ensemble for reacting to different types of drifts efficiently.. To maintain minimum number of ensembles, we propose to use sliding window technique which helps in keeping most recent data instances. These data instances are used for retraining component classifiers, so as to keep ensemble updated over recent concept. In [13], [14] authors proposed a system that maintains ensemble of per-feature classifiers. One classifier is maintained for a single feature of a particular class. Such all per-feature classifiers of a class are combined for all classes making it hierarchy of weighted classifiers. The system spans over large memory space as the number of classes increases and thereby increasing time overhead. Our system analyzes features of class for checking out features responsible for drift, if any and for updating ensemble with necessary measures.
To overcome the problem of availability of actual class labels, learning technique is categorised into two types: online learning and block-based learning. In first approach, classifier predicts and evaluates as soon as datainstance is available. Whereas in later approach, blocks of data instances are used for evaluating classifier performance.Littlestone et. al. [15] put forward one of the algorithms for online learning, Weighted Majority Algorithm which aggregates predictions of component classifiers and updates weights of classifiers when predictions go wrong. Another ensemble proposed by Kotleret. al. [16] maintains set of classifiers, the weights of which are updates incrementally after each data instance. On each misclassification of data instance, the weights of the classifiers making false predictions are decremented.
Memory constraintis one among many challenges while handling data steams with concept drift. Hayat et. al. [17] proposed compact clustering technique to overcome this problem. Traditionally, every data instance of belonging to cluster was used for evaluating the cluster. The proposed compact clustering algorithm uses only neighbourhood instances for classification and clusters formed from unclassified instances are compared with clusters of classified instances for checking abnormality and detecting drifts.Gao et. al. [18] proposed framework for detecting drift as well as new emerging classes in data streams using time constraints. The proposed system maintains buffer of instances unclassified by ensemble for certain time period. Instances remaining unclassified after time limit expiry are considered to be forming drift and are further analyzed for novel class evolution. Evolution of novel classes is another challenge in learning from data streams in which new classes emerge over time generating the need of restructuring ensemble by building new classifier for new class and eliminating the weakest component classifier. Novel class and feature evolution have been studied and many algorithms have been proposed for dealing with these issues [19][20][21].
The proposed system for handling concept drifts in data streams uses ensemble classifier technique for building base classifiers and using them for classification of testing data. The system builds online classifier as soon as new data instance is available and when block of fixed number of data instances is formed, block based c classifier is developed. The classification of in coming data instance is done using weighted majority of base classifiers using weighting functions as: Where, is weight of base classifier at time and ) is the accuracy of classifier at time . The accuracy and error rates are monitored for each type of classifier continuously over blocks of data instances using Error Rate function as: Where, is the error rate of classifier on recent block of data instances and is the probability given by the classifier that is an instance of class .
As the value of error rate monitoring crosses certain threshold, drift is detected. These drifts are analyzed and ensemble is updated accordingly.

RESULTS AND ANALYSIS
In our experiments, we evaluate our proposed ensemble that combines online classifier and blockbased classifier. We implemented our ensemble system in Java. The experiments were performed on computer system with Intel Core i5 480M @2.67 GHz processor and 4.00 GB of RAM.
We tested the performance of ensemble with single component classifiers. Our ensemble used k=5 component classifiers; NB Tree, J48, Logistic, Random Forest and Bagging. The size of block used for all component classifiers and ensemble was equal d=100 as this size was best suitable for more accurate results. We evaluated ensemble performance for different sizes of block: 50, 100, 200, 500 and 1000. We observed that the statistical comparisons of performances of ensemble for each of above block size gives better results in terms of accuracy when block size was 100. However Block size does not alter ensemble accuracy significantly, but block size matters in case of drift detection. If the block size is large it ignores drifts that lasted for small time, while smaller block size detects drifts even if there are blips or noise in data streams. Real world data contain no precise information about occurrence or type of drifts in it. So it is practically impossible to test the desire accuracy in terms of drift however a manual drift is to be inserted in the data to achieve the target.So we decided to use publically available machine learning benchmark datasets gathered in the UCI repository [22] that signified presence of gradual drifts.
We evaluate our ensemble of online classifier and block-based classifier against single classifiers as well/ We chose J48, NB Tree, Logistic, Random Forest and Bagging as component classifiers of basic ensemble. The ensemble is further modified for learning incrementally as well as in blocks of fixed size. From performance comparison of proposed ensemble with component classifiers as shown in Table 1, we can see that ensemble improves the accuracy of classification on all datasets and ensemble takes equal processing time. Traditional approach of classifying testing dataset against given training dataset becomes inefficient while dealing with data streams, as the data streams keep arriving unboundedly and testing dataset is not completely available. The modification to ensemble by adding online and block-based learning component to classifiers shows significant role for keeping track of classification accuracies. Thus the two approaches: online classification and block-based classification are applied to solve this issue. These approaches help in monitoring classifier performances as data instances arrive and keeps track of changes in data stram.
In block-based technique, size of the block can be instance based or time based. In instance based technique, blocks of fixed number of instances are used. While in time based technique, blocks of data instances arriving over specific time period are used. As data streams arrive at any rate, it is not feasible to keep track of performance in time based technique, since some blocks will be densely populated while others being rarely populated. We observed this pattern in electricity data where demand and supply changes drastically with different time stamps We evaluated our ensemble using different sizes of block and compared our results with experiments conducted using MOA framework. The results are shown in Table 2.
From the experiments we concluded, although performance of ensemble remains constant for any block size, the block size of 100 instances gives results for drift detection similar to those obtained using standard MOA (Massive Online Analysis) framework. Thus for further experiments we used 100 instances as block size.  Table 3 shows drifts detection using online and block-based method to ensemble. We analysed that for any dataset our ensemble works well in comparison with the component classifiers, online and blockbased classification while detecting gradual and sudden drifts efficiently. Further we analysed the detected drifts to extract the hidden patterns in drifts that could potentially help us to find out the technique that can respond when next drift encounters. The analysis of drifts shows that values of few attributes are significantly missing and this leads to one of the root cause behind drift. The analysis of block before and after drift concluded that few blocks before drift, missing values of attributes started to observe in ascending way towards drift and in descending way afterwards the drift. The peak frequencies of missing values were noticed during drift. The results obtained as shown in Table 4 summarize that as the frequency of missing values started to increase, the performance of ensemble started degrading and visa versa. The values of attributes like workclass, occupation and native country were found missing during drifts.
The same pattern as shown in Table 4 was observed for all the further drifts in Census income dataset. This is recurring type of drift. Also we computed the information gain over the attributes of Census income dataset and we observed that the attributes that were found missing during drift are associated with highest values of information gain, thereby leading to be the main contributor to classification task. Thus handling missing values of attributes is also one of the key task in our proposed system.
We performed many experiments for handling missing values of attributes by either removing the instance that carries missing values or by replacing them using mean-mode imputation. We replaced the  Another problem that causes performance loss of classifier is the training dataset. Concept drift is the scenario where old concept occurs with less probability and new concept is mostly seen. In this situation, classifiers show degraded performances. This is because classifiers are trained on dataset that carries old concept and data instances that are being classified belong to new concept. Thusincorporating a new concept in learning process is an important task. This can be done in two ways: by using just classified recent block of data instances for retraining classifier ensemble or by adding recent block of data instances to training dataset iteratively and using it for retraining ensemble. We conducted different experiments in which for each case, we changed training dataset in three different approaches. In the first approach, we did not retrain ensemble once it is trained at starting with provided standard training dataset. In second approach, we first trained ensemble using standard training dataset and tested the first block of data instances. We used this recently tested block for retraining the ensemble and testing next block of data instances. The process is repeated iteratively for next blocks of data streams. This case of keeping recently classified block as training dataset maintains the current concept, but fails to keep track of old concepts as time proceeds. Thus in third approach, while proceeding forward we simply added recently classified block to the training dataset and then retrained the ensemble. Figure 2 shows the performance of ensemble using the above three approaches of retraining ensemble. The results concludes that adding recent block of data instances to provided standard training dataset is a wise solution and improves the performance of ensemble very well.

CONCLUSION
Most of recent applications generate large volumes of data at rapid rate, thus establishing the need of special data mining techniques for critical sensitive applications. Data stream mining is solution to the problem. Data stream mining poses challenges like limited memory storage, performance and change in concepts underlying data. Identification of two types of drifts, sudden and gradual drift may degrade the classifiers performance in terms of accuracy. To monitor such causes and parameters of drift identification, our proposed ensemble combines the characteristics of online classification technique and block-based classification technique and detects both types of drifts efficiently. Further our system analyzes theattributes which are responsible behind the change in accuracy.Noisy, unbalance data and missing values of attributes were found to be the root cause behind the drift [23]. Our proposed system shows improved performance while detecting both kinds of drifts efficiently.However our work focuses on offline streaming data.
The contribution opens several directions for research studies. Current works in data stream classification focus detection of concept drift. Adaption of drifts to the system leads to new line of research. An interesting future work would be to identify evolution of new concepts for online streaming data. Recent techniques can be further extended for solving novelty detection problem.