Published June 12, 2019 | Version v1
Journal article | Open access

Ensemble with Estimation: Seeking for Optimization in Class Noisy Data

  • 1. Harbin Institute of Technology S
  • 2. Department of Computer Science, University of Warwick
  • 3. Department of Computing, The Hong Kong Polytechnic University, Hong Kong
  • 4. University of International Relations
  • 5. Big Data Institute, Shenzhen University

Description

Class noise, also known as mislabeled data in the training set, can lead to poor classification accuracy regardless of which machine learning method is used. A reasonable estimate of the class noise has a significant impact on the performance of learning methods. However, estimation error is theoretically unavoidable in existing approaches, and it affects the performance of the optimal classifier trained on noisy data. Instead of seeking a single optimal classifier on noisy data, in this work we use a set of weak classifiers, which result from the negative impact of the noisy data, to learn a strong ensemble classifier based on the training error and the estimated class noise. With this strategy, the proposed ensemble with estimation method bridges the gap between the estimated and the true distribution of class noise. The proposed method does not require any prior knowledge about the class noise. We prove that the optimal ensemble classifier on the noisy distribution can approximate the optimal classifier on the clean distribution as the training set grows. Comparisons with existing algorithms show that our method outperforms state-of-the-art approaches on a large number of benchmark datasets from different domains. Both the theoretical analysis and the experimental results show that our method improves performance, works well on clean data, and is robust to the choice of algorithm parameters.
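The abstract only outlines the idea of combining weak classifiers using their training error and an estimate of the class noise; the actual algorithm is in the PDF. As a hedged illustration, and not the authors' method, the Python sketch below assumes binary labels in {-1, +1}, symmetric label noise with a known rate rho < 0.5, and AdaBoost-style log-odds weights as a swapped-in weighting rule. Under that noise model, a classifier with clean error eps has noisy error rho + (1 - 2*rho) * eps, so the sketch inverts this relation before weighting each classifier. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type: a weak classifier maps a feature vector to a label in {-1, +1}.
Classifier = Callable[[Sequence[float]], int]


def noisy_training_error(h: Classifier, xs: List[Sequence[float]], ys_noisy: List[int]) -> float:
    """Fraction of training points where h disagrees with the (possibly noisy) label."""
    mistakes = sum(1 for x, y in zip(xs, ys_noisy) if h(x) != y)
    return mistakes / len(xs)


def corrected_error(noisy_error: float, noise_rate: float) -> float:
    """Estimate clean-distribution error from error measured against noisy labels.

    Assumption: symmetric noise, each label flipped independently with
    probability noise_rate, so noisy_error = rho + (1 - 2*rho) * clean_error.
    """
    if not 0.0 <= noise_rate < 0.5:
        raise ValueError("noise_rate must be in [0, 0.5)")
    clean = (noisy_error - noise_rate) / (1.0 - 2.0 * noise_rate)
    # Clip so the estimate stays strictly inside (0, 1) and the log below is defined.
    return min(max(clean, 1e-6), 1.0 - 1e-6)


def ensemble_weights(noisy_errors: List[float], noise_rate: float) -> List[float]:
    """AdaBoost-style weight 0.5 * ln((1 - eps) / eps) on the noise-corrected error."""
    weights = []
    for e in noisy_errors:
        eps = corrected_error(e, noise_rate)
        weights.append(0.5 * math.log((1.0 - eps) / eps))
    return weights


def predict(classifiers: List[Classifier], weights: List[float], x: Sequence[float]) -> int:
    """Weighted majority vote; ties are broken toward +1."""
    score = sum(w * h(x) for h, w in zip(classifiers, weights))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    xs = [[-2.0], [-1.0], [1.0], [2.0]]
    ys_noisy = [-1, 1, 1, 1]  # illustrative labels; the second one is assumed flipped
    stumps: List[Classifier] = [
        lambda x: 1 if x[0] > 0 else -1,
        lambda x: 1,
    ]
    errs = [noisy_training_error(h, xs, ys_noisy) for h in stumps]
    ws = ensemble_weights(errs, noise_rate=0.2)
    print(errs, ws, predict(stumps, ws, [3.0]))
```

One consequence of the correction is that a classifier at chance on the noisy labels (noisy error 0.5) is also at chance on the clean distribution under this noise model, so it receives weight 0 for any rho < 0.5. The clipping step is a simplification: a classifier that fits the noisy labels perfectly gets a very large weight, which a real method would need to guard against.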

Files

Ensemble_with_Estimation__Seeking_for_Optimization_in_Class_Noisy_Data.pdf

Additional details

Funding

DeepPatient – Deep Understanding of Patient Experience of Healthcare from Social Media (794196)
European Commission