Article
Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints
International Conference on Information and Knowledge Management, Proceedings
  • Zhengqing Gao, Mohamed bin Zayed University of Artificial Intelligence
  • Huimin Wu, Nanjing University of Information Science & Technology
  • Martin Takáč, Mohamed bin Zayed University of Artificial Intelligence
  • Bin Gu, Mohamed bin Zayed University of Artificial Intelligence
Document Type
Conference Proceeding
Abstract

Semi-Supervised Support Vector Machine (S3VM) is one of the most popular methods for semi-supervised learning, as it can make full use of plentiful, easily accessible unlabeled data. A balancing constraint is normally enforced in S3VM (denoted as BCS3VM) to avoid the harmful solution that assigns all or most of the unlabeled examples to the same label. Traditionally, non-linear BCS3VM is solved by the sequential minimal optimization (SMO) algorithm. Recently, a novel incremental learning algorithm (IL-BCS3VM) was proposed to scale up BCS3VM further. However, IL-BCS3VM needs to compute the inverse of a linear system related to the support matrix, which limits its scalability. To make BCS3VM more practical on large-scale problems, in this paper we propose a new scalable BCS3VM with accelerated triply stochastic gradients (denoted as TSG-BCS3VM). Specifically, to let the balancing constraint handle different proportions of positive and negative samples among labeled and unlabeled data, we propose a soft balancing constraint for S3VM. To make the algorithm scalable, we generate triply stochastic gradients by sampling labeled and unlabeled samples as well as random features to update the solution, where Quasi-Monte Carlo (QMC) sampling is applied to the random features to accelerate TSG-BCS3VM further. Our theoretical analysis shows that the convergence rate is O(1/√T) for both diminishing and constant learning rates, where T is the number of iterations, which is much better than previous results thanks to the QMC method. Empirical results on a variety of benchmark datasets show that our algorithm not only achieves good generalization performance but also enjoys better scalability than existing BCS3VM algorithms.
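
The abstract names the main ingredients of TSG-BCS3VM: a kernel approximated by random features drawn with Quasi-Monte Carlo sampling, stochastic sampling of labeled and unlabeled examples, and a soft balancing constraint. The Python sketch below is not the authors' code; it only illustrates those ingredients under stated assumptions: the QMC feature map is fixed up front rather than re-sampled each iteration (so the update shown is doubly rather than triply stochastic), the soft balancing constraint is approximated by a per-sample quadratic penalty (f(x_u) − r)², and all hyperparameter names and values are placeholders.

```python
# Minimal sketch (not the authors' implementation) of a non-linear S3VM trained by
# stochastic gradients with QMC random Fourier features and a soft balancing penalty.
# The loss forms, the balancing penalty, and all hyperparameters are assumptions.
import numpy as np
from scipy.stats import norm, qmc


def qmc_fourier_features(X, n_features, sigma, seed=0):
    """Approximate an RBF kernel with random Fourier features whose frequencies
    come from a scrambled Halton (Quasi-Monte Carlo) sequence."""
    d = X.shape[1]
    halton = qmc.Halton(d=d, scramble=True, seed=seed)
    u = halton.random(n_features)            # low-discrepancy points in [0, 1)^d
    omega = norm.ppf(u) / sigma              # map to N(0, sigma^-2 I) frequencies
    b = np.random.default_rng(seed).uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ omega.T + b)


def tsg_bcs3vm_sketch(X_lab, y_lab, X_unl, n_features=200, sigma=1.0,
                      lam=1e-3, lam_u=0.1, lam_bal=1.0, r=0.0,
                      T=20000, eta0=1.0, seed=0):
    """At each step, sample one labeled and one unlabeled point and take an SGD
    step in the fixed QMC feature space; (f(x_u) - r)^2 stands in for the soft
    balancing constraint, with r the assumed target mean prediction."""
    rng = np.random.default_rng(seed)
    Phi_l = qmc_fourier_features(X_lab, n_features, sigma, seed)
    Phi_u = qmc_fourier_features(X_unl, n_features, sigma, seed)
    w = np.zeros(n_features)
    for t in range(1, T + 1):
        eta = eta0 / np.sqrt(t)                       # diminishing learning rate
        phi_l = Phi_l[rng.integers(len(y_lab))]
        y_i = y_lab[rng.integers(len(y_lab))] if False else None  # unused guard
        i = rng.integers(len(y_lab))
        j = rng.integers(len(X_unl))
        phi_l, phi_u = Phi_l[i], Phi_u[j]
        grad = lam * w                                # L2 regularization
        if y_lab[i] * (w @ phi_l) < 1.0:              # hinge loss on the labeled sample
            grad -= y_lab[i] * phi_l
        fu = w @ phi_u
        if abs(fu) < 1.0:                             # hat loss on the unlabeled sample
            grad -= lam_u * np.sign(fu) * phi_u
        grad += lam_bal * 2.0 * (fu - r) * phi_u      # soft balancing penalty (assumed form)
        w -= eta * grad
    predict = lambda X: qmc_fourier_features(X, n_features, sigma, seed) @ w
    return w, predict
```

The diminishing step size eta0/√t mirrors the kind of schedule the O(1/√T) statement in the abstract refers to, but the paper's guarantees apply to its own algorithm and analysis, not to this simplified sketch.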

DOI
10.1145/3511808.3557150
Publication Date
10-17-2022
Keywords
  • balancing constraint
  • semi-supervised support vector machine

Citation Information
Z. Gao, H. Wu, M. Takáč, and B. Gu, "Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints," in Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM '22), Association for Computing Machinery, New York, NY, USA, Oct. 2022, pp. 3072–3081. https://doi.org/10.1145/3511808.3557150