Mining large data requires intensive computing resources and data mining expertise, which might not be available for many users. With the development of cloud computing and services computing, data mining tasks can now be moved to the cloud or outsourced to third parties to save costs. In this new paradigm, data and model confidentiality becomes the major concern to the data owner. Meanwhile, users are also concerned about the potential tradeoff among costs, model quality, and confidentiality. In this paper, we propose the PerturBoost framework to address the problems in confidential cloud or outsourced learning. PerturBoost combined with the random space perturbation (RASP) method that was also developed by us can effectively protect data confidentiality, model confidentiality, and model quality with low client-side costs. Based on the boosting framework, we develop a number of base learner algorithms that can learn linear classifiers from the RASP-perturbed data. This approach has been evaluated with public datasets. The result shows that the RASP-based PerturBoost can provide model accuracy very close to the classifiers trained with the original data and the AdaBoost method, with high confidentiality guarantee and acceptable costs.
Available at: http://works.bepress.com/keke_chen/45/