- Session 3: Supervised and General Machine Learning -- Day 3 (Nov.19), talks: 10:50-11:30 (5th floor Hall 1), poster session: 11:30-14:00
- Poster number: Tue09
Amina Mollaysa (University of Geneva); Alexandros Kalousis (AU Geneva); Eric Bruno (Expedia); Maurits Diephuis (University of Geneva)
Neural networks typically need huge amounts of training data to achieve reasonable generalization. A common remedy is to artificially generate additional samples by leveraging prior knowledge of the data's properties or other relevant domain knowledge. Data augmentation, however, critically hinges on the assumption that the new samples come from the same, albeit unknown, distribution; if they do not, augmentation will degrade model performance. For images, augmentation amounts to introducing small distortions to existing training samples; for other data modalities, it remains an open problem. This work introduces a critical data augmentation method using so-called feature side-information: meta-information on features' intrinsic properties that is otherwise excluded from training. The main contribution is an instance-wise quality-checking procedure for augmented data, which filters out harmful or irrelevant augmentations before they enter the model. We validated our approach on both synthetic and real-world datasets, where the augmentation is based on a task-independent, unreliable source of information. The experiments show that the proposed critical data augmentation scheme helps avoid the performance degradation that results from incorporating incorrect augmented samples.
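The instance-wise quality check described above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' actual procedure: here an augmented sample is kept only if a reference loss on it stays close to the loss on its source sample, with `loss_fn`, `tol`, and the acceptance rule all being assumed stand-ins for whatever criterion the paper uses.

```python
# Hypothetical sketch of an instance-wise quality check for augmented data.
# Each augmented sample is compared against its source sample; augmentations
# whose loss drifts too far are treated as harmful and filtered out before
# they reach the model. The actual criterion in the paper may differ.

def filter_augmented(loss_fn, originals, augmented, tol=0.5):
    """Keep augmented samples whose loss stays close to the original's loss."""
    kept = []
    for x, x_aug in zip(originals, augmented):
        # Per-instance check: reject augmentations that look out-of-distribution
        # relative to the sample they were generated from.
        if loss_fn(x_aug) <= loss_fn(x) + tol:
            kept.append(x_aug)
    return kept

# Toy usage with a quadratic "loss" around 0: the second augmentation drifts
# far from its source and is rejected.
loss = lambda x: float(x ** 2)
originals = [0.1, 0.2]
augmented = [0.15, 2.0]
print(filter_augmented(loss, originals, augmented))  # → [0.15]
```

The key design point the abstract emphasizes is that the check runs per instance, so a single unreliable side-information source can still contribute the subset of augmentations it gets right.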