Article
Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints
International Conference on Information and Knowledge Management, Proceedings
  • Zhengqing Gao, Mohamed bin Zayed University of Artificial Intelligence
  • Huimin Wu, Nanjing University of Information Science & Technology
  • Martin Takáč, Mohamed bin Zayed University of Artificial Intelligence
  • Bin Gu, Mohamed bin Zayed University of Artificial Intelligence
Document Type
Conference Proceeding
Abstract

Semi-Supervised Support Vector Machine (S3VM) is one of the most popular methods for semi-supervised learning, as it can make full use of plentiful, easily accessible unlabeled data. A balancing constraint is normally enforced in S3VM (denoted as BCS3VM) to avoid the harmful solution that assigns all or most of the unlabeled examples to the same label. Traditionally, non-linear BCS3VM is solved by the sequential minimal optimization (SMO) algorithm. Recently, a novel incremental learning algorithm (IL-BCS3VM) was proposed to scale up BCS3VM further. However, IL-BCS3VM needs to compute the inverse of a linear system related to the support matrix, which limits its scalability. To make BCS3VM more practical on large-scale problems, in this paper we propose a new scalable BCS3VM with accelerated triply stochastic gradients (denoted as TSG-BCS3VM). Specifically, to let the balancing constraint handle different proportions of positive and negative samples among labeled and unlabeled data, we propose a soft balancing constraint for S3VM. To make the algorithm scalable, we generate triply stochastic gradients by sampling labeled and unlabeled samples as well as random features to update the solution, where Quasi-Monte Carlo (QMC) sampling is applied to the random features to accelerate TSG-BCS3VM further. Our theoretical analysis shows that the convergence rate is O(1/√T) for both diminishing and constant learning rates, where T is the number of iterations, which is much better than previous results thanks to the QMC method. Empirical results on a variety of benchmark datasets show that our algorithm not only achieves good generalization performance but also enjoys better scalability than existing BCS3VM algorithms.
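
The abstract names the main ingredients of TSG-BCS3VM: a kernel approximated by random features drawn with Quasi-Monte Carlo sampling, stochastic sampling of labeled and unlabeled examples, and a soft balancing constraint. The Python sketch below is not the authors' code; it only illustrates those ingredients under stated assumptions: the QMC feature map is fixed up front rather than re-sampled each iteration (so the update shown is doubly rather than triply stochastic), the soft balancing constraint is approximated by a per-sample quadratic penalty (f(x_u) − r)², and all hyperparameter names and values are placeholders.

```python
# Minimal sketch (not the authors' implementation) of a non-linear S3VM trained by
# stochastic gradients with QMC random Fourier features and a soft balancing penalty.
# The loss forms, the balancing penalty, and all hyperparameters are assumptions.
import numpy as np
from scipy.stats import norm, qmc


def qmc_fourier_features(X, n_features, sigma, seed=0):
    """Approximate an RBF kernel with random Fourier features whose frequencies
    come from a scrambled Halton (Quasi-Monte Carlo) sequence."""
    d = X.shape[1]
    halton = qmc.Halton(d=d, scramble=True, seed=seed)
    u = halton.random(n_features)            # low-discrepancy points in [0, 1)^d
    omega = norm.ppf(u) / sigma              # map to N(0, sigma^-2 I) frequencies
    b = np.random.default_rng(seed).uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ omega.T + b)


def tsg_bcs3vm_sketch(X_lab, y_lab, X_unl, n_features=200, sigma=1.0,
                      lam=1e-3, lam_u=0.1, lam_bal=1.0, r=0.0,
                      T=20000, eta0=1.0, seed=0):
    """At each step, sample one labeled and one unlabeled point and take an SGD
    step in the fixed QMC feature space; (f(x_u) - r)^2 stands in for the soft
    balancing constraint, with r the assumed target mean prediction."""
    rng = np.random.default_rng(seed)
    Phi_l = qmc_fourier_features(X_lab, n_features, sigma, seed)
    Phi_u = qmc_fourier_features(X_unl, n_features, sigma, seed)
    w = np.zeros(n_features)
    for t in range(1, T + 1):
        eta = eta0 / np.sqrt(t)                       # diminishing learning rate
        phi_l = Phi_l[rng.integers(len(y_lab))]
        y_i = y_lab[rng.integers(len(y_lab))] if False else None  # unused guard
        i = rng.integers(len(y_lab))
        j = rng.integers(len(X_unl))
        phi_l, phi_u = Phi_l[i], Phi_u[j]
        grad = lam * w                                # L2 regularization
        if y_lab[i] * (w @ phi_l) < 1.0:              # hinge loss on the labeled sample
            grad -= y_lab[i] * phi_l
        fu = w @ phi_u
        if abs(fu) < 1.0:                             # hat loss on the unlabeled sample
            grad -= lam_u * np.sign(fu) * phi_u
        grad += lam_bal * 2.0 * (fu - r) * phi_u      # soft balancing penalty (assumed form)
        w -= eta * grad
    predict = lambda X: qmc_fourier_features(X, n_features, sigma, seed) @ w
    return w, predict
```

The diminishing step size eta0/√t mirrors the kind of schedule the O(1/√T) statement in the abstract refers to, but the paper's guarantees apply to its own algorithm and analysis, not to this simplified sketch.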

DOI
10.1145/3511808.3557150
Publication Date
10-17-2022
Keywords
  • balancing constraint
  • semi-supervised support vector machine

Citation Information
Z. Gao, H. Wu, M. Takáč, and B. Gu, "Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints," in Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM '22), Association for Computing Machinery, New York, NY, USA, Oct. 2022, pp. 3072–3081. https://doi.org/10.1145/3511808.3557150