Bridging Code-Text Representation Gap using Explanation

Hojae Han (Seoul National University)*; Youngwon Lee (Seoul National University); Minsoo Kim (Yonsei University); Seung-won Hwang (Seoul National University)


This paper studies Code-Text Representation (CTR) learning, which aims to learn general-purpose representations that support downstream code/text applications such as code search, that is, finding code that matches a textual query. However, state-of-the-art methods do not focus on bridging the gap between the code and text modalities. In this paper, we bridge this gap by providing an intermediate representation, which we view as an “explanation”. Drawing on two existing types of explanation, we adopt each in CTR and compare their effectiveness. Our contribution is threefold: First, we propose four types of explanation utilization methods for CTR and analyze their effectiveness. Second, we show that using explanation as the model input is desirable. Third, we confirm that even automatically generated explanations can lead to a drastic performance gain. To the best of our knowledge, this is the first work to define and categorize code explanation for enhancing code understanding and representation.