Class Outliers Mining: Distance-Based Approach

Nabil M. Hewahi; Motaz K. Saad

doi:10.5281/zenodo.1078088

Published September 24, 2007 | Version 11868

Journal article Open

In large datasets, identifying exceptional or rare cases with respect to a group of similar cases is considered a very significant problem. The traditional problem (Outlier Mining) is to find exceptional or rare cases in a dataset irrespective of their class labels; such cases are considered rare events with respect to the whole dataset. In this research, we pose the problem of Class Outliers Mining and present a method for finding such outliers. The general definition of this problem is "given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels". We introduce a novel definition of outlier, the Class Outlier, and propose the Class Outlier Factor (COF), which measures the degree to which a data object is a Class Outlier. Our work includes a new algorithm for mining Class Outliers, experimental results on real-world datasets from various domains, and a comparison study with other related methods.
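The abstract does not state how COF is computed, so the following is only an illustrative sketch of the general idea of a distance-based, label-aware outlier score, not the paper's definition. It assumes Euclidean distance and scores each object by the fraction of its k nearest neighbours that carry a different class label; the function name `neighbour_disagreement`, the parameter `k`, and the toy data are all hypothetical.

```python
import math


def neighbour_disagreement(points, labels, k):
    """Illustrative score (an assumption, NOT the paper's COF).

    For each point, return the fraction of its k nearest neighbours
    (Euclidean distance, the point itself excluded) whose class label
    differs from the point's own label.
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Indices of all other points, ordered by distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        nearest = others[:k]
        differing = sum(1 for j in nearest if labels[j] != labels[i])
        scores.append(differing / k)
    return scores


# Toy data: two well-separated groups; the point at index 3 lies in the
# first group but carries the label of the second.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]

scores = neighbour_disagreement(points, labels, k=3)
# Index of the object whose neighbourhood disagrees most with its label.
suspect = max(range(len(points)), key=lambda i: scores[i])
```

In this toy example the object at index 3 is not unusual with respect to the whole dataset, since it sits inside a dense group, yet every one of its nearest neighbours has a different label. That is the kind of case the quoted problem definition targets and that a label-blind outlier score would miss.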

Files

11868.pdf (311.6 kB), md5:3c6f11eb9b6e8ca7c2bc72a74c6fafa8

