ExNN-SMOTE: Extended Natural Neighbors Based SMOTE to Deal with Imbalanced Data

Hongjiao Guan (Qilu University of Technology (Shandong Academy of Sciences))*; Yingtao Zhang (Harbin Institute of Technology); Xianglong Tang (Harbin Institute of Technology); Bin Ma (Qilu University of Technology)
PMLR Page

Abstract

Many practical applications suffer from imbalanced classification. The minority class typically has poor classification performance, even though its misclassification cost is high. One reason for this difficulty is the complicated distribution characteristics (CDCs) intrinsic to imbalanced data. The classical oversampling method SMOTE generates synthetic minority class examples by interpolating between neighbors, so its behavior depends on the choice of the number of neighbors k. Furthermore, because neighbors are selected blindly, SMOTE suffers from overgeneralization in the minority class. To solve these problems, we propose an oversampling method called extended natural neighbors based SMOTE (ExNN-SMOTE). In ExNN-SMOTE, neighbors are determined adaptively by capturing the data distribution characteristics. Extensive experiments on synthetic and real datasets demonstrate the effectiveness of ExNN-SMOTE in dealing with CDCs and its superiority over other SMOTE-related methods.
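To illustrate the parameter dependence and blind neighbor selection attributed to SMOTE, here is a minimal sketch of classic SMOTE interpolation in plain Python. It is not ExNN-SMOTE: the abstract does not specify the adaptive extended-natural-neighbor rule, so the sketch uses a fixed k. The helper names `k_nearest` and `smote`, the default `k=5`, and the `seed` argument are illustrative assumptions.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k nearest neighbors of points[i] (Euclidean), excluding i.
    Ties in distance are broken by the smaller index."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE sketch: create n_new synthetic samples, each on the segment
    between a random minority sample and one of its k nearest minority neighbors.
    k is fixed by the caller; this is the parameter dependence SMOTE is criticized for."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other samples
    rng = random.Random(seed)
    neighbors = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # base sample
        j = rng.choice(neighbors[i])      # neighbor picked blindly among the k
        gap = rng.random()                # interpolation factor in [0, 1)
        x, y = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, y)))
    return synthetic


if __name__ == "__main__":
    # Three clustered minority points plus one isolated point at (5, 5).
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    for point in smote(minority, n_new=5, k=2, seed=1):
        print(point)
```

Because the isolated point at (5, 5) must still take k neighbors from the distant cluster, some synthetic samples land in the empty region between them. With real data, that region could belong to the majority class. This is one way the overgeneralization described above can arise, and it is what an adaptive neighbor choice is meant to avoid.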