Accepted Papers (Journal Track)

  • Efficient Preconditioning for Noisy Separable NMFs by Successive Projection Based Low-Rank Approximations - Tomohiko Mizutani*, Tokyo Institute of Technology; Mirai Tanaka, The Institute of Statistical Mathematics [abstract]
    The successive projection algorithm (SPA) can quickly solve a nonnegative matrix factorization problem under a separability assumption. Even if noise is added to the problem, SPA is robust as long as the perturbations caused by the noise are small. High robustness against noise is particularly important when handling problems arising from real applications. The preconditioner proposed by Gillis and Vavasis (2015) enhances the noise robustness of SPA, but it comes with an additional computational cost: its construction contains a step that computes the top-$k$ truncated singular value decomposition of an input matrix. It is known that this decomposition provides the best rank-$k$ approximation to the input matrix, i.e., the matrix with the smallest approximation error among all matrices of rank at most $k$. This step is an obstacle to an efficient implementation of the preconditioned SPA. To address the cost issue, we propose a modification of the algorithm for constructing the preconditioner. Whereas the original algorithm uses the best rank-$k$ approximation, our modification uses an alternative, which should ideally have high approximation accuracy and low computational cost. To ensure this, our modification employs a rank-$k$ approximation produced by an SPA-based algorithm. We analyze the accuracy of the approximation and evaluate the computational cost of the algorithm. We then present an empirical study revealing the actual performance of the preconditioned SPA.
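    As a reference point, the plain (unpreconditioned) SPA that the paper builds on can be sketched in a few lines; the matrix names below are illustrative, not taken from the paper.

```python
import numpy as np

def spa(M, k):
    """Plain successive projection algorithm: greedily pick the column
    with the largest residual norm, then project every column onto the
    orthogonal complement of the chosen one.  Under separability (and
    small noise) the returned indices locate the basis columns."""
    R = np.array(M, dtype=float)
    indices = []
    for _ in range(k):
        j = int(np.argmax((R ** 2).sum(axis=0)))  # column of largest norm
        indices.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)  # deflate: remove the chosen direction
    return indices
```

    In the noiseless separable case $M = W[I \; H']$, this loop recovers the $k$ columns of $W$ exactly; the preconditioner discussed in the abstract is applied to $M$ before the loop to tighten its noise tolerance.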
  • Distributed Multi-task Classification: A Decentralized Online Learning Approach - Chi Zhang*, Nanyang Technological University; Peilin Zhao, Ant Financial; Shuji Hao, A*STAR; Yeng Chai Soh, Nanyang Technological University; Bu Sung Lee, Nanyang Technological University; Steven C.H. Hoi, Singapore Management University [abstract]
    Although distributing a single learning task (single-task learning, STL) across distributed learning nodes has been intensively studied in previous research, multi-task learning (MTL) on distributed networks remains an area that has not been fully explored, especially under decentralized settings. The challenge lies in the fact that different tasks may have different optimal learning weights, while communication through the distributed network forces all tasks to converge to a unique classifier. In this paper, we present a novel algorithm to overcome this challenge and enable learning multiple tasks simultaneously on a decentralized distributed network. Specifically, the learning framework can be separated into two phases: (i) multi-task information is shared within each node in the first phase; (ii) communication between nodes then leads the whole network to converge to a common minimizer. Theoretical analysis indicates that our algorithm achieves an $O(\sqrt{T})$ regret bound compared with the best classifier in hindsight, which is further validated by experiments on both synthetic and real-world datasets.
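    The two-phase scheme can be illustrated with a generic decentralized online gradient step (a simplified stand-in, not the paper's exact update): each node mixes its neighbours' weights through a doubly stochastic matrix, then applies its own local gradient.

```python
import numpy as np

def decentralized_step(W, local_grads, A, eta):
    """One round of decentralized online gradient descent.

    W           -- (n_nodes, dim) current weight vector of every node
    local_grads -- (n_nodes, dim) gradient each node computed locally
    A           -- (n_nodes, n_nodes) doubly stochastic mixing matrix;
                   A[i, j] > 0 only if nodes i and j are neighbours
    eta         -- step size
    """
    return A @ W - eta * local_grads  # consensus mixing, then local step
```

    Repeated mixing drives all rows of `W` toward a common vector, which matches the consensus behaviour the abstract describes; standard analyses of such schemes yield $O(\sqrt{T})$-type regret.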
  • Learning Safe Multi-Label Prediction for Weakly Labeled Data - Tong Wei, Nanjing University; Lan-Zhe Guo, Nanjing University; Yu-Feng Li*, Nanjing University; Wei Gao, Nanjing University [abstract]
    In this paper we study multi-label learning with weakly labeled data, i.e., training examples whose labels are incomplete, which commonly occurs in real applications, e.g., image classification and document categorization. This setting includes, e.g., (i) \emph{semi-supervised multi-label learning}, where completely labeled examples are partially known; (ii) \emph{weak label learning}, where relevant labels of examples are partially known; (iii) \emph{extended weak label learning}, where relevant and irrelevant labels of examples are partially known. Previous studies often expect that a learning method using weakly labeled data will improve performance, since more data are employed. This, however, is not always the case in reality, i.e., weakly labeled data may sometimes degrade the learning performance. It is therefore desirable to learn a \emph{safe} multi-label prediction that will not hurt performance when weakly labeled data are involved in the learning procedure. In this work we optimize multi-label evaluation metrics (F$_1$ score and Top-$k$ precision) given that the ground-truth label assignment is realized by a convex combination of base multi-label learners. To cope with the infinite number of possible ground-truth label assignments, a cutting-plane strategy is adopted to iteratively generate the most helpful label assignments. The whole optimization is efficiently cast as a series of simple linear programs. Extensive experiments on three weakly labeled learning tasks, namely (i) semi-supervised multi-label learning, (ii) weak label learning, and (iii) extended weak label learning, clearly show that our proposal improves the safeness of using weakly labeled data compared with many state-of-the-art methods.
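    To make the objective concrete, here is a minimal illustration (with made-up scores, not the paper's algorithm) of a convex combination of base multi-label learners and the Top-$k$ precision metric it is evaluated with:

```python
import numpy as np

def combine(base_scores, alpha):
    """Convex combination of base learners' label-score vectors.
    base_scores: (n_learners, n_labels); alpha: nonnegative, sums to 1."""
    alpha = np.asarray(alpha, dtype=float)
    assert alpha.min() >= 0 and np.isclose(alpha.sum(), 1.0)
    return alpha @ base_scores

def top_k_precision(scores, relevant, k):
    """Fraction of the k highest-scoring labels that are truly relevant."""
    top = np.argsort(scores)[::-1][:k]
    return float(np.asarray(relevant)[top].mean())
```

    The safe learner searches over the weights `alpha` so that the worst case over candidate ground-truth assignments is optimized, which is where the cutting-plane linear programs come in.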
  • Crowdsourcing with Unsure Option - Yao-Xiang Ding, Nanjing University; Zhi-Hua Zhou*, Nanjing University [abstract]
    One of the fundamental problems in crowdsourcing is the trade-off between the number of workers needed for high-accuracy aggregation and the budget to be paid. To save budget, it is important to ensure high quality of the crowdsourced labels, so that the total cost of label collection is reduced. Since the self-confidence of workers is often closely related to their abilities, a possible way to control quality is to request that workers return labels only when they feel confident, by providing them with an unsure option. On the other hand, allowing workers to choose the unsure option also carries the potential danger of budget waste. In this work, we propose an analysis towards understanding when providing the unsure option indeed leads to significant cost reduction, as well as how the confidence threshold should be set. We also propose an online mechanism, which offers an alternative for threshold selection when estimating the crowd ability distribution is difficult.
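    A toy calculation (our own illustration, not the paper's model) shows why filtering by confidence can save budget: a few confident workers can outvote many mediocre ones.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent workers, each correct
    with probability p, returns the right label (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))
```

    For instance, three workers at accuracy 0.9 give majority accuracy 0.972, beating five workers at 0.6 (about 0.683), so paying for fewer, more confident labels can be cheaper at the same target accuracy; the abstract's analysis makes precise when this trade-off wins despite labels lost to the unsure option.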
  • Semi-Supervised AUC Optimization based on Positive-Unlabeled Learning - Tomoya Sakai*, The University of Tokyo / RIKEN; Gang Niu, The University of Tokyo / RIKEN; Masashi Sugiyama, RIKEN / The University of Tokyo [abstract]
    Maximizing the area under the receiver operating characteristic curve (AUC) is a standard approach to imbalanced classification. So far, various supervised AUC optimization methods have been developed, and they have also been extended to semi-supervised scenarios to cope with small sample problems. However, existing semi-supervised AUC optimization methods rely on strong distributional assumptions, which are rarely satisfied in real-world problems. In this paper, we propose a novel semi-supervised AUC optimization method that does not require such restrictive assumptions. We first develop an AUC optimization method based only on positive and unlabeled data (PU-AUC) and then extend it to semi-supervised learning by combining it with a supervised AUC optimization method. We theoretically prove that, without the restrictive distributional assumptions, unlabeled data contribute to improving the generalization performance in PU and semi-supervised AUC optimization methods. Finally, we demonstrate the practical usefulness of the proposed methods through experiments.
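    The identity behind positive-unlabeled AUC estimation can be illustrated directly: since unlabeled data are a mixture of a fraction $\pi$ of positives and $1-\pi$ of negatives, and the AUC of positives scored against positives is $1/2$, the clean positive-vs-negative AUC is recoverable from positive-vs-unlabeled comparisons. The function names below are illustrative.

```python
import numpy as np

def auc(pos_scores, other_scores):
    """Empirical AUC: P(pos score > other score), ties counted as 1/2."""
    d = np.subtract.outer(pos_scores, other_scores)
    return float((d > 0).mean() + 0.5 * (d == 0).mean())

def pn_auc_from_pu(pos_scores, unl_scores, pi):
    """Recover the positive-vs-negative AUC from positive-vs-unlabeled
    comparisons via AUC_pu = (1 - pi) * AUC_pn + pi * 0.5."""
    return (auc(pos_scores, unl_scores) - 0.5 * pi) / (1.0 - pi)
```

    When the unlabeled sample is exactly the concatenation of the positive and negative samples the recovery is exact; with real data it holds in expectation, and the paper's estimator plugs such comparisons into a surrogate-loss risk rather than the raw AUC.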
  • Robust Plackett-Luce Model for k-ary Crowdsourced Preferences - Bo Han*, University of Technology Sydney; Yuangang Pan, University of Technology Sydney; Ivor W. Tsang, University of Technology Sydney [abstract]
    The aggregation of $k$-ary preferences is an emerging ranking problem, which plays an important role in several aspects of our daily life, such as ordinal peer grading and online product recommendation. At the same time, crowdsourcing has become a trendy way to provide a plethora of $k$-ary preferences for this ranking problem, thanks to convenient platforms and low costs. However, $k$-ary preferences from crowdsourced workers are often noisy, which inevitably degrades the performance of traditional aggregation models. To address this challenge, in this paper we present a RObust PlAckett-Luce (ROPAL) model. Specifically, to ensure robustness, ROPAL integrates the Plackett-Luce model with a denoising vector that, based on the Kendall-tau distance, corrects $k$-ary crowdsourced preferences with a certain probability. In addition, we propose an online Bayesian inference method to make ROPAL scalable to large-scale preferences. We conduct comprehensive experiments on simulated and real-world datasets. Empirical results on ``massive synthetic'' and ``real-world'' datasets show that ROPAL with online Bayesian inference achieves substantial improvements in robustness and noisy-worker detection over current approaches.
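    For reference, the vanilla Plackett-Luce likelihood and the Kendall-tau distance that ROPAL builds on are easy to write down (this is the standard model only, not ROPAL's denoising extension):

```python
def plackett_luce_prob(ranking, worth):
    """Probability of a full ranking (best item first) under Plackett-Luce:
    at each stage item i is chosen from the items still unranked with
    probability worth[i] / (sum of worths of the unranked items)."""
    p, remaining = 1.0, list(ranking)
    while remaining:
        i = remaining.pop(0)
        p *= worth[i] / (worth[i] + sum(worth[j] for j in remaining))
    return p

def kendall_tau(r1, r2):
    """Kendall-tau distance: number of item pairs ranked in opposite order."""
    pos = {item: i for i, item in enumerate(r2)}
    n = len(r1)
    return sum(pos[r1[a]] > pos[r1[b]]
               for a in range(n) for b in range(a + 1, n))
```

    ROPAL's denoising vector reweights observed preferences by how far (in Kendall-tau distance) they plausibly drifted from the underlying ranking.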