Accepted Paper: Investigating the effect of novel classes in semi-supervised learning

Authors

Yuxuan (Alex) Peng (University of Auckland); Yun Sing Koh (The University of Auckland, New Zealand); Patricia Riddle (University of Auckland, New Zealand); Bernhard Pfahringer (University of Waikato)

Abstract

Semi-supervised learning usually assumes that the unlabelled data follow the same distribution as the labelled data, an assumption that does not always hold in practice. We show empirically that unlabelled data containing novel examples and classes from outside the distribution of the labelled data can degrade the performance of semi-supervised learning algorithms. We propose a 1-nearest-neighbour-based method that assigns a weight to each unlabelled example in order to reduce the negative effect of novel classes in the unlabelled data. Experimental results on the MNIST, Fashion-MNIST and CIFAR-10 datasets suggest that the negative effect of novel classes becomes statistically insignificant when the proposed method is applied. With the proposed technique, models trained on unlabelled data containing novel classes can achieve performance similar to that of models trained on clean unlabelled data.
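
The abstract does not specify how the 1-nearest-neighbour weights are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes Euclidean distance in the input (or feature) space and a hypothetical weighting function exp(-d / scale), where d is the distance from an unlabelled example to its nearest labelled example. The function name `one_nn_weights` and the `scale` parameter are invented for this illustration.

```python
import numpy as np


def one_nn_weights(labelled, unlabelled, scale=None):
    """Weight each unlabelled example by closeness to its nearest labelled example.

    Assumption (not from the paper): weight = exp(-d / scale), where d is the
    Euclidean distance to the 1-nearest labelled neighbour.
    """
    labelled = np.asarray(labelled, dtype=float)
    unlabelled = np.asarray(unlabelled, dtype=float)

    # Pairwise Euclidean distances, shape (n_unlabelled, n_labelled).
    diffs = unlabelled[:, None, :] - labelled[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # Distance from each unlabelled example to its single nearest labelled example.
    nearest = dists.min(axis=1)

    if scale is None:
        # Assumed default: the median nearest-neighbour distance.
        scale = np.median(nearest)
    scale = max(float(scale), 1e-12)  # guard against division by zero

    # Examples far from every labelled example (possible novel classes)
    # receive weights close to 0; nearby examples receive weights close to 1.
    return np.exp(-nearest / scale)
```

Under these assumptions, the returned weights could multiply each unlabelled example's contribution to the unsupervised loss term, so that examples likely to come from novel classes influence training less. For high-dimensional images such as CIFAR-10, distances would more plausibly be computed on learned features than on raw pixels, which is again an assumption rather than a detail given in the abstract.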