Article
A Bayesian beta kernel model for binary classification and online learning problems
Statistical Analysis and Data Mining (2014)
  • Cameron A. MacKenzie
  • Theodore B. Trafalis, University of Oklahoma Norman Campus
  • Kash Barker, University of Oklahoma Norman Campus
Abstract
Recent advances in data mining have integrated kernel functions with Bayesian probabilistic analysis of Gaussian distributions. These machine-learning approaches can incorporate prior information with new data to calculate probabilistic rather than deterministic values for unknown parameters. This article extensively analyzes a specific Bayesian kernel model that uses a kernel function to calculate a posterior beta distribution that is conjugate to the prior beta distribution. Numerical testing of the beta kernel model on several benchmark datasets reveals that this model's accuracy is comparable with those of the support vector machine (SVM), relevance vector machine, naive Bayes, and logistic regression, and the model runs more quickly than all the other algorithms except for logistic regression. When one class occurs much more frequently than the other class, the beta kernel model often outperforms other strategies to handle imbalanced datasets, including under-sampling, over-sampling, and the Synthetic Minority Over-Sampling Technique. If data arrive sequentially over time, the beta kernel model easily and quickly updates the probability distribution, and this model is more accurate than an incremental SVM algorithm for online learning.
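The abstract does not give the model's formulas, so the following is only a minimal sketch of one plausible reading of "a kernel function to calculate a posterior beta distribution that is conjugate to the prior beta distribution". It assumes that each training point adds a kernel-weighted pseudo-count to the beta parameter of its own class. The Gaussian kernel, the uniform Beta(1, 1) prior, and the names `rbf_kernel`, `BetaKernelSketch`, `width`, `update`, `posterior`, and `predict_proba` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(x, X, width=1.0):
    """Gaussian kernel between one point x and every row of X (assumed kernel)."""
    diff = np.asarray(X, dtype=float) - np.asarray(x, dtype=float)
    sq_dist = np.sum(diff ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * width ** 2))


class BetaKernelSketch:
    """Hypothetical beta kernel classifier for labels in {0, 1}."""

    def __init__(self, alpha=1.0, beta=1.0, width=1.0):
        self.alpha = alpha   # prior beta parameter for class 1 (assumed)
        self.beta = beta     # prior beta parameter for class 0 (assumed)
        self.width = width   # kernel width (assumed hyperparameter)
        self.X = None
        self.y = None

    def update(self, X_new, y_new):
        """Online step: store newly arrived points; nothing is refit."""
        X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
        y_new = np.atleast_1d(np.asarray(y_new, dtype=int))
        if self.X is None:
            self.X, self.y = X_new, y_new
        else:
            self.X = np.vstack([self.X, X_new])
            self.y = np.concatenate([self.y, y_new])

    def posterior(self, x):
        """Return the (alpha, beta) parameters of the posterior at query x."""
        if self.X is None:
            return self.alpha, self.beta  # no data yet: posterior equals prior
        k = rbf_kernel(x, self.X, self.width)
        a = self.alpha + k[self.y == 1].sum()  # kernel mass from class 1
        b = self.beta + k[self.y == 0].sum()   # kernel mass from class 0
        return a, b

    def predict_proba(self, x):
        """Posterior mean of the probability that x belongs to class 1."""
        a, b = self.posterior(x)
        return a / (a + b)


model = BetaKernelSketch()
model.update([[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0]], [1, 1, 0])
p_before = model.predict_proba([-1.5, -2.5])
model.update([[-1.5, -2.5]], [0])  # a new class-0 observation arrives
p_after = model.predict_proba([-1.5, -2.5])
```

Under these assumptions, an online update only appends the new observation, which is consistent with the abstract's claim that the distribution can be updated easily and quickly as data arrive. Here `p_after` is lower than `p_before` because the new class-0 point adds kernel mass to the class-0 parameter at that location.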
Keywords
  • data mining
  • kernel
  • Bayesian
  • beta distribution
  • online learning
Publication Date
December, 2014
Publisher Statement
This is the peer reviewed version of the following article: MacKenzie, C. A., Trafalis, T. B. and Barker, K. (2014), A Bayesian beta kernel model for binary classification and online learning problems. Statistical Analy Data Mining, 7: 434–449, which has been published in final form at doi: 10.1002/sam.11241. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for self-archiving.
Citation Information
Cameron A. MacKenzie, Theodore B. Trafalis and Kash Barker. "A Bayesian beta kernel model for binary classification and online learning problems" Statistical Analysis and Data Mining Vol. 7 Iss. 6 (2014)
Available at: http://works.bepress.com/cameron_mackenzie/7/