CMD: Controllable Matrix Decomposition with Global Optimization for Deep Neural Network Compression
Haonan Zhang (Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University); Longjun Liu (Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University)*; Hengyi Zhou (Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University); Hongbin Sun (Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University); Nanning Zheng (Xi'an Jiaotong University)
The compression and acceleration of deep neural networks (DNNs) are necessary steps for deploying sophisticated networks on resource-constrained hardware systems. Because weight matrices tend to be low-rank and sparse, several low-rank and sparse compression schemes have been leveraged to reduce the overwhelming number of weight parameters in DNNs. In these previous schemes, two questions need further investigation: how to make the most of the low-rank and sparse components of weight matrices, and how to decompose the weight matrices of different layers globally for efficient compression. In this paper, to effectively utilize the low-rank and sparse characteristics of the weight matrix, we first introduce a sparse coefficient that dynamically controls the allocation between the low-rank and sparse components, and we design an efficient reconstructed network to reduce inference time. Second, since the results of low-rank decomposition affect both the compression ratio and the accuracy of DNNs, we formulate an optimization problem that automatically selects the optimal hyperparameters of the compressed network and compresses all layers of the network globally and synchronously. Finally, to solve the optimization problem, we present a decomposition-searching algorithm that searches for the optimal solution while dynamically balancing the compression ratio and accuracy. Extensive experiments with AlexNet, VGG-16, and ResNet-18 on CIFAR-10 and ImageNet evaluate the effectiveness of the proposed approach. After slight fine-tuning, the compressed networks achieve 1.2X to 11.3X speedup, and our method reduces the size of different networks by 1.4X to 14.6X.
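The core idea of splitting a weight matrix into a low-rank component plus a sparse component, with a coefficient governing how much goes into each, can be sketched as follows. This is a minimal illustration of the general low-rank-plus-sparse decomposition, not the paper's CMD algorithm; the names `rank` and `alpha` (the sparse coefficient) are illustrative assumptions.

```python
import numpy as np

def lowrank_sparse_decompose(W, rank, alpha):
    """Approximate W as L + S, where L has rank `rank` and S is sparse.

    `alpha` plays the role of a sparse coefficient: it sets the fraction
    of residual entries retained in S. Both names are hypothetical.
    """
    # Low-rank component via truncated SVD.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

    # Sparse component: keep only the alpha-fraction of residual
    # entries with the largest magnitude, zero out the rest.
    R = W - L
    k = int(alpha * R.size)
    S = np.zeros_like(R)
    if k > 0:
        idx = np.unravel_index(np.argsort(np.abs(R), axis=None)[-k:], R.shape)
        S[idx] = R[idx]
    return L, S

# Example: decompose a random 64x64 "weight matrix".
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
L, S = lowrank_sparse_decompose(W, rank=8, alpha=0.05)
rel_err = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
```

Raising `alpha` shifts more of the representation budget into the sparse component and lowers the approximation error, at the cost of storing more nonzero entries; choosing `rank` and `alpha` per layer is exactly the kind of hyperparameter selection the paper's global optimization targets.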