The Flowing Nature Matters: Feature Learning from the Control Flow Graph of Source Code for Bug Localization

Yi-Fan Ma (Nanjing University)*; Ming Li (Nanjing University)

Abstract

Bug localization plays an important role in software maintenance. Traditional approaches treat source code from a lexical perspective, while recent studies indicate that exploiting the program structure is beneficial for improving bug localization. The control flow graph (CFG) is a widely used graph representation that captures the program structure. Although feature learning with graph neural networks is a straightforward choice and has proven effective in various software mining problems, it is inappropriate here because its underlying assumption, that adjacent nodes share similar semantics, no longer holds in the CFG. Instead, earlier statements may affect the semantics of subsequent statements along the execution path represented by the CFG, which we call the \textit{flowing} nature of the control flow graph. In this paper, we argue that the flowing nature should be explicitly considered, and we propose a novel model named cFlow for bug localization, which employs a specially designed flow-based GRU for feature learning from the CFG. The flow-based GRU exploits the program structure represented by the CFG to transmit the semantics of statements along the execution path, which reflects the \textit{flowing} nature. Experimental results on widely used real-world software projects show that cFlow significantly outperforms state-of-the-art bug localization methods, indicating that exploiting the program structure from the CFG with respect to the flowing nature is beneficial for improving bug localization.
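
To make the idea of transmitting statement semantics along the execution path concrete, the following is a minimal illustrative sketch, not the cFlow implementation. It assumes an acyclic CFG (loop back edges removed), averages the hidden states of a node's predecessors, and feeds that average with the node's own feature vector into a standard GRU update. The names `TinyGRUCell` and `propagate`, the mean aggregation, and the random weights are all assumptions made for illustration.

```python
import math
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def _matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyGRUCell:
    """A standard GRU update with random, untrained weights (illustration only)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def mat():
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

        self.Wz, self.Uz = mat(), mat()  # update gate
        self.Wr, self.Ur = mat(), mat()  # reset gate
        self.Wh, self.Uh = mat(), mat()  # candidate state

    def step(self, x, h):
        z = [_sigmoid(a + b) for a, b in zip(_matvec(self.Wz, x), _matvec(self.Uz, h))]
        r = [_sigmoid(a + b) for a, b in zip(_matvec(self.Wr, x), _matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(_matvec(self.Wh, x), _matvec(self.Uh, rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def propagate(features, preds, cell):
    """Propagate hidden states along CFG edges in topological order.

    features: node -> statement feature vector
    preds:    node -> list of predecessor nodes (assumed acyclic)
    """
    dim = len(next(iter(features.values())))
    graph = {node: preds.get(node, []) for node in features}
    hidden = {}
    # static_order() yields every predecessor before the nodes that depend on it.
    for node in TopologicalSorter(graph).static_order():
        incoming = [hidden[p] for p in graph[node]]
        if incoming:
            # Assumption: merge several incoming paths by averaging their states.
            h_prev = [sum(col) / len(incoming) for col in zip(*incoming)]
        else:
            h_prev = [0.0] * dim  # entry node starts from a zero state
        hidden[node] = cell.step(features[node], h_prev)
    return hidden


# A diamond-shaped CFG: entry -> cond -> {then, else} -> exit
preds = {"cond": ["entry"], "then": ["cond"], "else": ["cond"], "exit": ["then", "else"]}
names = ["entry", "cond", "then", "else", "exit"]
features = {n: [0.1 * i, 0.2, -0.1 * i] for i, n in enumerate(names)}
hidden = propagate(features, preds, TinyGRUCell(dim=3, seed=0))
```

In this sketch, changing the feature vector of `then` alters the hidden state of `exit` but leaves `entry`, `cond`, and `else` untouched, so information moves only in the direction of execution rather than being smoothed across neighbors in both directions.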