Published September 24, 2007
| Version 11868
Journal article
Open
Class Outliers Mining: Distance-Based Approach
Authors/Creators
Description
In large datasets, identifying exceptional or rare cases
with respect to a group of similar cases is considered a very significant
problem. The traditional problem (Outlier Mining) is to find
exceptional or rare cases in a dataset irrespective of the class labels of
these cases; they are considered rare events with respect to the whole
dataset. In this research, we pose the problem of Class Outliers
Mining and propose a method to find those outliers. The general
definition of this problem is "given a set of observations with class
labels, find those that arouse suspicions, taking into account the
class labels". We introduce a novel definition of outlier, the Class
Outlier, and propose the Class Outlier Factor (COF), which measures
the degree to which a data object is a Class Outlier. Our work
includes a proposal of a new algorithm for mining Class
Outliers, experimental results on real-world datasets from various
domains, and a comparison study with other related methods.
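To make the problem statement concrete, the following is a minimal Python sketch of one simple distance-based way to flag observations that look suspicious given their class labels. It is not the COF defined in the paper, whose formula is not given in this description. The function name `class_outlier_scores`, the parameter `k`, and the toy data are assumptions made for illustration only. The score for each observation is the fraction of its `k` nearest neighbours (Euclidean distance) that carry a different class label.

```python
import math


def class_outlier_scores(points, labels, k=3):
    """Illustrative score only (not the paper's COF): for each point, the
    fraction of its k nearest neighbours whose label differs from its own."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with that point's index.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        differing = sum(1 for j in neighbours if labels[j] != labels[i])
        scores.append(differing / k)
    return scores


# Hypothetical toy data: two tight clusters, plus one point labelled "b"
# that sits in the middle of the "a" cluster.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),   # class "a"
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),   # class "b"
    (0.05, 0.05),                                     # labelled "b", lies among "a"
]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

scores = class_outlier_scores(points, labels, k=3)
most_suspicious = max(range(len(scores)), key=scores.__getitem__)
print(most_suspicious, scores[most_suspicious])
```

In this toy data the last point would not stand out as a conventional outlier, since it lies inside a dense cluster, but it receives the highest score here because all of its nearest neighbours belong to the other class. That is the distinction the description draws between Outlier Mining and Class Outliers Mining.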
Files
11868.pdf (311.6 kB)
md5:3c6f11eb9b6e8ca7c2bc72a74c6fafa8