Analysis of Relation between Unlabeled and Labeled Data to Self-Taught Learning Performance

Ekachai Phaisangittisagul; Rapeepol Chongprachawat

doi:10.5281/zenodo.1333895

Published April 27, 2013 | Version 7486

Journal article Open

Obtaining labeled data for supervised learning is often difficult and expensive, so a learning algorithm trained on only a small number of examples tends to overfit. As a result, some researchers have focused on using unlabeled data, which need not follow the same generative distribution as the labeled data, to construct high-level features that improve performance on supervised learning tasks. In this paper, we investigate how the relationship between unlabeled and labeled data affects classification performance. Specifically, we apply different sets of unlabeled data, each with a different degree of relation to the labeled data, to a handwritten digit classification task based on the MNIST dataset. Our experimental results show that the higher the degree of relation between unlabeled and labeled data, the better the classification performance. Unlabeled data drawn from a completely different generative distribution than the labeled data gives the lowest classification performance, yet the performance achieved is still high. This expands the applicability of supervised learning algorithms that use unsupervised learning.

Files

7486.pdf (548.7 kB) md5:f41e53bd50bd8983a577ff96147f54fb
