Accepted Paper: A New Multi-choice Reading Comprehension Dataset for Curriculum Learning


Yichan Liang (Sun Yat-sen University, China); Jianheng Li (Sun Yat-sen University, China); Jian Yin (Sun Yat-sen University, China)


The past few years have witnessed rapid progress in machine reading comprehension (MRC), especially in the challenging sub-task of multiple-choice reading comprehension (MCRC), and the release of large-scale datasets has promoted research in this field. Yet existing methods already achieve high accuracy on MCRC datasets such as RACE. A more difficult dataset, one that demands more reasoning and inference, is therefore needed to evaluate the understanding capability of new methods. To meet this demand, we present RACE-C, a new multi-choice reading comprehension dataset collected from college English examinations in China. We further integrate it with RACE-M and RACE-H, collected by Lai et al. (2017) from middle and high school exams respectively, to extend RACE into RACE++. Based on RACE++, we propose a three-stage curriculum learning framework that exploits the ascending difficulty of these three sub-datasets. As expected, statistics show that RACE-C is more difficult than RACE’s two sub-datasets, RACE-M and RACE-H, and experimental results demonstrate that the proposed three-stage curriculum learning approach improves the performance of the machine reading comprehension model to some extent.
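
The abstract does not spell out how the three stages are scheduled, so the following is only a minimal sketch of the general idea of training on sub-datasets in ascending difficulty. It assumes one stage per sub-dataset, in the order RACE-M, RACE-H, RACE-C, with each stage continuing from the previous stage's weights. The names `curriculum_train`, `train_one_stage`, and `STAGE_ORDER` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Example = TypeVar("Example")
Model = TypeVar("Model")

# Assumed stage order, easiest first (middle school, high school, college).
STAGE_ORDER = ("RACE-M", "RACE-H", "RACE-C")


def curriculum_train(
    model: Model,
    subsets: Dict[str, Sequence[Example]],
    train_one_stage: Callable[[Model, Sequence[Example]], Model],
    stage_order: Sequence[str] = STAGE_ORDER,
) -> Model:
    """Train `model` on each sub-dataset in turn, easiest first."""
    for name in stage_order:
        # Each stage starts from the model produced by the previous stage.
        model = train_one_stage(model, subsets[name])
    return model


# Toy usage: the "model" is just a log of the examples it has seen so far.
def toy_train(seen: List[str], examples: Sequence[str]) -> List[str]:
    return seen + list(examples)


toy_subsets = {
    "RACE-C": ["c1"],
    "RACE-M": ["m1", "m2"],
    "RACE-H": ["h1"],
}
result = curriculum_train([], toy_subsets, toy_train)
```

In a real setting `train_one_stage` would wrap a fine-tuning loop for an MCRC model. Whether each stage uses only its own sub-dataset or also replays the easier ones is a design choice this sketch leaves open.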