Article
Data sampling selection issues for bankruptcy prediction
Risk, Hazards & Crisis in Public Policy (2015)
  • Yan Yu
  • Shaonan Tian
  • Ming Zhou, San José State University
Abstract
Bankruptcy prediction is of paramount interest to both academics and practitioners. This paper devotes special care to an important aspect of bankruptcy prediction modeling: the data sample selection issue. To investigate the effect of different data selection methods, three models are adopted: the logistic regression model, Neural Networks (NNET), and Support Vector Machines (SVM), which have recently gained some popularity in applications. A Monte Carlo simulation study and an empirical analysis on an updated bankruptcy database are conducted to explore the effect of different data sample selection methods. By comparing out-of-sample predictive performance, we conclude that if forecasting the probability of bankruptcy is of interest, the complete data sampling technique provides more accurate results. However, if a binary bankruptcy decision or classification is desired, the choice-based sampling technique may still be suitable. In particular, choice-based data samples validated by NNET and SVM can capture more correct predictions of bankruptcy observations and provide a lower asymmetric misclassification rate. In addition, for different choice-based data samples, it is essential to adjust the cut-off probability. An appropriate choice of cut-off probability depends on the specification of the cost ratio between Type I and Type II errors. The optimal cut-off probability proposed in this work is a function of the data sample selection method and the cost ratio.
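The abstract does not state the authors' cut-off formula. As a rough illustration of why a cut-off should depend on both the cost ratio and the sampling scheme, the sketch below combines a standard Bayes decision rule with the usual odds correction for choice-based (outcome-stratified) samples. It is not the paper's proposed function. Assumptions: a Type I error means classifying a bankrupt firm as healthy, `cost_ratio` is cost(Type I) / cost(Type II), the model's probabilities are calibrated to the sample it was fitted on (as with a logistic regression fitted to that sample), and the population bankruptcy rate is known. The function names are hypothetical.

```python
import math


def bayes_cutoff(cost_ratio: float) -> float:
    """Population-scale cut-off under an assumed cost ratio.

    cost_ratio = cost(Type I) / cost(Type II), where Type I is assumed to be
    classifying a bankrupt firm as healthy.
    """
    if not (math.isfinite(cost_ratio) and cost_ratio > 0):
        raise ValueError("cost_ratio must be a positive finite number")
    # Predict bankrupt when p * cost_I > (1 - p) * cost_II, i.e. p > 1 / (1 + R).
    return 1.0 / (1.0 + cost_ratio)


def choice_based_cutoff(cost_ratio: float, pop_rate: float, sample_rate: float) -> float:
    """Map the population cut-off onto the probability scale of a model
    fitted on a choice-based sample whose bankrupt share is sample_rate.

    pop_rate is the (assumed known) bankruptcy rate in the population.
    """
    for rate in (pop_rate, sample_rate):
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    t = bayes_cutoff(cost_ratio)
    # Factor by which oversampling bankrupt firms inflates the odds of bankruptcy.
    k = (sample_rate / (1.0 - sample_rate)) / (pop_rate / (1.0 - pop_rate))
    # Population odds threshold t / (1 - t), expressed as sample odds.
    sample_odds = k * t / (1.0 - t)
    return sample_odds / (1.0 + sample_odds)
```

For example, with equal costs, a 1% population bankruptcy rate and a 50/50 matched sample, `choice_based_cutoff(1.0, 0.01, 0.5)` gives about 0.99 rather than the naive 0.5; if Type I errors are assumed to be 99 times as costly, `choice_based_cutoff(99.0, 0.01, 0.5)` falls back to about 0.5. When the sample share equals the population rate, no adjustment is made.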
Publication Date
August 20, 2015
DOI
10.1002/rhc3.12071
Citation Information
Yan Yu, Shaonan Tian and Ming Zhou. "Data sampling selection issues for bankruptcy prediction" Risk, Hazards & Crisis in Public Policy Vol. 6 Iss. 1 (2015)
Available at: http://works.bepress.com/ming_zhou/32/