Kernel methods have achieved tremendous success in the past two decades. In the current big data era, data collection has grown tremendously. However, existing kernel methods are not scalable enough both at the training and predicting steps. To address this challenge, in this paper, we first introduce a general sparse kernel learning formulation based on the random feature approximation, where the loss functions are possibly non-convex. In order to reduce the scale of random features required in experiment, we also use that formulation based on the orthogonal random feature approximation. Then we propose a new asynchronous parallel doubly stochastic algorithm for large scale sparse kernel learning (AsyDSSKL). To the best our knowledge, AsyDSSKL is the first algorithm with the techniques of asynchronous parallel computation and doubly stochastic optimization. We also provide a comprehensive convergence guarantee to AsyDSSKL. Importantly, the experimental results on various large-scale real-world datasets show that, our AsyDSSKL method has the significant superiority on the computational efficiency at the training and predicting steps over the existing kernel methods.
- asynchronous parallel computation,
- coordinate descent,
- Kernel method,
- random feature,
- stochastic gradient descent