Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick

Keegan Kang (SUTD)*; Sergey Kushnarev (JHU); Weipin Wong (SUTD); Rameshwar Pratap Yadav (IIT Mandi); Haikal Yeo (Ministry of Education); Yijia Chen (Singapore University of Technology and Design)


Hashing algorithms are continually used for large-scale learning and similarity search, with computationally cheap and more accurate algorithms being proposed every year. In this paper we focus on hashing algorithms which involve estimating a distance measure d(\vec{x}_i, \vec{x}_j) between two vectors \vec{x}_i, \vec{x}_j. Such hashing algorithms require generation of random variables, and we propose two approaches to reduce the variance of our hashed estimates: control variates and maximum likelihood estimates. We explain how these approaches can be immediately applied to a wide subset of hashing algorithms. Further, we evaluate the impact of these methods on various datasets. We finally run empirical simulations to verify our results.