Accepted Paper: A New Multi-choice Reading Comprehension Dataset for Curriculum Learning

Session 2: Multi-task Learning, NLP, Computer Vision, Applications -- Day 2 (Nov.18), talks: 09:00-11:00 (5th floor Hall 2), poster session: 11:00-13:30
Poster number: Mon26

Authors

Yichan Liang (Sun Yat-sen University, China); Jianheng Li (Sun Yat-sen University, China); Jian Yin (Sun Yat-sen University)

Abstract

The past few years have witnessed the rapid development of machine reading comprehension (MRC), especially its challenging sub-task, multiple-choice reading comprehension (MCRC). The release of large-scale datasets has promoted research in this field. Yet previous methods have already achieved high accuracy on MCRC datasets such as RACE. It is therefore necessary to propose a more difficult dataset, one that requires more reasoning and inference, for evaluating the understanding capability of new methods. To respond to this demand, we present RACE-C, a new multi-choice reading comprehension dataset collected from college English examinations in China. We further integrate it with RACE-M and RACE-H, collected by Lai et al. (2017) from middle and high school exams respectively, to extend RACE into RACE++. Based on RACE++, we propose a three-stage curriculum learning framework that makes the most of the fact that the difficulty level of these three sub-datasets is in ascending order. As expected, statistics show that our collected dataset, RACE-C, has a higher difficulty level than RACE’s two sub-datasets, RACE-M and RACE-H. Experimental results demonstrate that the proposed three-stage curriculum learning approach improves the performance of the machine reading comprehension model to some extent.
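
The staged ordering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that the three sub-datasets are used in ascending order of difficulty, so the function names (`curriculum_schedule`, `train_with_curriculum`), the `cumulative` option, and the dictionary-based data layout are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Assumed stage order, easiest to hardest, following the abstract's claim that
# difficulty ascends from RACE-M to RACE-H to RACE-C.
STAGE_ORDER = ("RACE-M", "RACE-H", "RACE-C")


def curriculum_schedule(
    subsets: Dict[str, Sequence[dict]],
    order: Sequence[str] = STAGE_ORDER,
    cumulative: bool = False,
) -> List[List[dict]]:
    """Return one list of training examples per curriculum stage.

    With cumulative=False each stage sees only its own subset. With
    cumulative=True each stage also keeps every easier subset. The abstract
    does not say which variant the paper uses, so both are offered here.
    """
    stages: List[List[dict]] = []
    seen: List[dict] = []
    for name in order:
        current = list(subsets[name])
        if cumulative:
            seen = seen + current
            stages.append(list(seen))
        else:
            stages.append(current)
    return stages


def train_with_curriculum(
    stages: Sequence[Sequence[dict]],
    train_one_stage: Callable[[int, Sequence[dict]], None],
) -> None:
    """Call the (hypothetical) training callback on each stage, in order."""
    for index, examples in enumerate(stages):
        train_one_stage(index, examples)
```

Here `train_one_stage` stands in for whatever fine-tuning routine a given MCRC model uses; the sketch fixes only the order in which the sub-datasets are presented.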