AARM: Action Attention Recalibration Module for Action Recognition
By Li Zhonghong, Yi Yang, She Ying, Song Jialun, and Wu Yukun
Most of Action recognition methods deploy networks pretrained on image datasets, and a common limitation is that these networks hardly capture salient features of the video clip due to their training strategies. To address this issue, we propose Action Attention Recalibration Module (AARM), a lightweight but effective module which introduces the attention mechanism to process feature maps of the network. The proposed module is composed of two novel components: 1) convolutional attention submodule that obtains inter-channel attention maps and spatial-temporal attention maps during the convolutional stage, and 2) activation attention submodule that highlights the significant activations in the fully connected process. Based on ablation studies and extensive experiments, we demonstrate that AARM enables networks to be sensitive on informative parts and gain accuracy increasements, achieving the state-of-the-art performance on UCF101 and HMDB51.