Improving Deep Label Noise Learning with Dual Active Label Correction
Shao-Yuan Li (Nanjing University of Aeronautics and Astronautics)*; Ye Shi (Nanjing University of Aeronautics and Astronautics); Sheng-Jun Huang (Nanjing University of Aeronautics and Astronautics); Songcan Chen (Nanjing University of Aeronautics and Astronautics)
Abstract
Label noise is a common problem in many applications and can significantly
degrade learning performance. To deal with label noise, Active Label Correction
(ALC) was proposed to query the true labels of a small subset of instances.
Because obtaining true labels can be costly, the goal of ALC is to improve
learning performance as much as possible with minimal query cost.
Existing ALC methods mainly either query the instances most likely to be
mislabeled or use criteria derived from standard active learning. In this paper,
we focus on deep neural network (DNN) models and show that, owing to their
intrinsic memorization effect (DNNs tend to fit clean patterns before memorizing
noisy labels), early-stopped training can correctly predict the true labels of a
large proportion of mislabeled instances, even under severe noise.
Inspired by this, we propose to train deep label noise learning models robustly
with dual ALC (DALC). On one hand, we select the instances most useful for
improving the classifier and query their true labels from external experts. On
the other hand, because actively sampled data are biased, the label noise model
estimated from the queried instances can be highly biased, which may in turn
hurt classifier learning. To alleviate this issue, we also identify the
instances for which the classifier's predictions are most likely to be their
true labels, and take these predictions as their true labels. We integrate these
two sources of true labels, and experiments on multiple benchmark datasets with
various label noise rates demonstrate the effectiveness of DALC in terms of both
classification accuracy and label noise model estimation.
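The abstract does not specify DALC's selection criteria or the form of its label noise model, so the following is only a minimal sketch of the "two sources of true labels" idea under stated assumptions. It assumes class probabilities come from an early-stopped classifier, uses low maximum probability as a placeholder "most useful to query" criterion and high maximum probability as a placeholder "most likely predicted correctly" criterion, and models label noise as a class-conditional transition matrix T[i][j] ~ P(noisy label = j | true label = i) estimated by counting. The names dual_select, collect_true_labels, estimate_transition and oracle are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Each row holds class probabilities from a (hypothetically early-stopped) classifier.
Probs = Sequence[Sequence[float]]


def argmax(p: Sequence[float]) -> int:
    """Index of the largest probability (first index on ties)."""
    return max(range(len(p)), key=lambda c: p[c])


def dual_select(probs: Probs, query_budget: int, n_self: int) -> Tuple[List[int], List[int]]:
    """Split instances into an expert-query set and a disjoint self-labeled set.

    Placeholder criteria (assumptions): the least confident instances (lowest
    max probability) are queried; among the rest, the most confident are self-labeled.
    """
    # Sort by confidence ascending; the index breaks ties deterministically.
    order = sorted(range(len(probs)), key=lambda i: (max(probs[i]), i))
    to_query = order[:query_budget]
    rest = order[query_budget:]
    to_self = rest[max(0, len(rest) - n_self):]  # the n_self most confident of the rest
    return to_query, to_self


def collect_true_labels(probs: Probs, to_query: List[int], to_self: List[int],
                        oracle: Callable[[int], int]) -> Dict[int, int]:
    """Merge the two sources of true labels: expert answers and confident predictions."""
    labels = {i: oracle(i) for i in to_query}  # labels returned by the (hypothetical) expert
    for i in to_self:
        labels[i] = argmax(probs[i])  # classifier prediction taken as the true label
    return labels


def estimate_transition(labels: Dict[int, int], noisy: Sequence[int],
                        num_classes: int) -> List[List[float]]:
    """Estimate T[i][j] ~ P(noisy label = j | true label = i) by counting.

    Rows with no supporting instances fall back to the identity row (an assumption).
    """
    counts = [[0.0] * num_classes for _ in range(num_classes)]
    for idx, y_true in labels.items():
        counts[y_true][noisy[idx]] += 1.0
    T = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total > 0:
            T.append([c / total for c in row])
        else:
            T.append([1.0 if j == i else 0.0 for j in range(num_classes)])
    return T


if __name__ == "__main__":
    probs = [[0.55, 0.45], [0.95, 0.05], [0.10, 0.90],
             [0.50, 0.50], [0.20, 0.80], [0.97, 0.03]]
    noisy = [1, 0, 1, 0, 1, 0]  # observed (possibly wrong) labels
    truth = [0, 0, 1, 1, 1, 0]  # hidden ground truth, visible only via the oracle
    q, s = dual_select(probs, query_budget=2, n_self=3)
    labels = collect_true_labels(probs, q, s, oracle=lambda i: truth[i])
    print(q, s, estimate_transition(labels, noisy, num_classes=2))
```

In this toy run the expert is asked about instances 3 and 0 (the least confident), while instances 2, 1 and 5 are self-labeled from the classifier's predictions. Keeping the two sets disjoint is one simple way to reflect the motivation in the abstract: the expert budget is not spent on instances the classifier already labels confidently, and the self-labeled instances add counts that the actively sampled (and therefore biased) query set alone would not provide.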