Ensembling With a Fixed Parameter Budget: When Does It Help and Why?

Didan Deng (Hong Kong University of Science and Technology)*; Bertram E Shi (ECE Department, Hong Kong University of Science and Technology)

Abstract

Given a fixed parameter budget, one can build a single large neural network or create a memory-split ensemble: a pool of several smaller networks with the same total parameter count as the single network. A memory-split ensemble can outperform its single-model counterpart (Lobacheva et al., 2020), a phenomenon known as the memory-split advantage (MSA). The reasons for the MSA are not yet fully understood. In particular, it is difficult in practice to predict when it will exist. This paper sheds light on the reasons underlying the MSA using random feature theory. We study the dependence of the MSA on several factors: the parameter budget, the training set size, L2 regularization, and the SGD hyper-parameters, as well as how these factors interact. Using the bias-variance decomposition, we show that the MSA exists when the reduction in variance due to the ensemble (i.e., the ensemble gain) exceeds the increase in squared bias due to the smaller size of the individual networks (i.e., the shrinkage cost). Taken together, our results demonstrate that the MSA exists mainly for parameter budgets that are small relative to the training set size, and that memory-splitting can be understood as a type of regularization. Adding other forms of regularization, e.g., L2 regularization, reduces the MSA, so that there is no net gain. Thus, the potential benefit of memory-splitting lies primarily in the possibility of speed-up via parallel computation.
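As a rough sketch of the condition stated in the abstract (the notation below is an illustrative paraphrase, not taken from the paper): write the expected test risk of a predictor f as squared bias plus variance, and compare a single network f_big that uses the full budget of N parameters with the ensemble average of K networks of N/K parameters each. Under that decomposition, the MSA condition can be expressed as follows.

```latex
% Illustrative notation (assumed, not the paper's):
%   R(f) = Bias^2(f) + Var(f)            expected test risk of predictor f
%   f_big                                 single network using all N parameters
%   \bar{f} = (1/K) \sum_{k=1}^{K} f_k    ensemble of K networks, N/K parameters each
\begin{align*}
  \text{MSA holds}
  \;\Longleftrightarrow\;
  R(\bar{f}) < R(f_{\mathrm{big}})
  \;\Longleftrightarrow\;
  \underbrace{\mathrm{Var}(f_{\mathrm{big}}) - \mathrm{Var}(\bar{f})}_{\text{ensemble gain}}
  \;>\;
  \underbrace{\mathrm{Bias}^2(\bar{f}) - \mathrm{Bias}^2(f_{\mathrm{big}})}_{\text{shrinkage cost}}
\end{align*}
% Averaging K predictors with imperfectly correlated errors shrinks the variance
% term, while shrinking each individual network typically inflates the bias term;
% the MSA appears when the first effect outweighs the second.
```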