Article
A Comprehensive Cluster and Classification Mining Procedure for Daily Stock Market Return Forecasting
Neurocomputing
  • Xiao Zhong
  • David Lee Enke, Missouri University of Science and Technology
Abstract

Data mining and big data analytic techniques are playing an important role in many application fields, including the financial markets. However, only a few studies have focused on predicting daily stock market returns, and among these studies, the data mining procedures utilized are either incomplete or inefficient. This paper presents a comprehensive data mining process to forecast the daily direction of the S&P 500 Index ETF (SPY) return based on 60 financial and economic features. The fuzzy c-means method (FCM) is initially used to cluster the preprocessed data. A principal component analysis (PCA) is applied next to the entire data set and to each of seven clusters. The dimension of the entire cleaned data set is then reduced according to the combined results from the entire data set and each cluster. Corresponding to different levels of dimensionality reduction, twelve new data sets are generated from the entire cleaned data. Artificial neural networks (ANNs) and logistic regression models are then used with the twelve transformed data sets for classification in order to forecast the daily direction of future market returns and to indicate the efficiency of dimensionality reduction with PCA. A group of hypothesis tests is performed over the classification and simulation results to show that the ANNs give significantly higher classification accuracy than logistic regression, and that the trading strategies guided by the comprehensive cluster and classification mining procedure based on PCA and ANNs gain higher risk-adjusted profits than both the comparison benchmarks and the strategies guided by the forecasts based on PCA and logistic regression models.
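
As an illustration of the clustering step named in the abstract, the following is a minimal sketch of the standard fuzzy c-means update rules in plain Python. It is not the authors' implementation: the function name `fuzzy_c_means`, its parameters, the fuzzifier value `m=2.0`, and the synthetic two-dimensional data are all assumptions chosen for the example, whereas the paper clusters 60 preprocessed features into seven clusters.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Illustrative fuzzy c-means; returns (centers, memberships).

    points: list of equal-length numeric tuples; m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / wsum for d in range(dim)]
            )
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], c), 1e-12) for c in centers]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** exponent for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # stop once memberships no longer change noticeably
            break
    return centers, u


# Synthetic demo data (an assumption for illustration, not market data):
# two well-separated Gaussian blobs around (0, 0) and (5, 5).
data_rng = random.Random(1)
blob_a = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(30)]
blob_b = [(data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)) for _ in range(30)]
centers, memberships = fuzzy_c_means(blob_a + blob_b, n_clusters=2)

# Hard labels can be read off as the cluster with the largest membership.
labels = [max(range(2), key=row.__getitem__) for row in memberships]
```

Unlike hard k-means, each observation keeps a degree of membership in every cluster, and taking the largest membership per row is one simple way to turn that soft assignment into the per-cluster subsets on which a later step such as PCA could be run.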

Department(s)
Engineering Management and Systems Engineering
Research Center/Lab(s)
Intelligent Systems Center
Keywords and Phrases
  • Classification (of information)
  • Commerce
  • Data mining
  • Economic analysis
  • Electronic trading
  • Finance
  • Financial data processing
  • Financial markets
  • Forecasting
  • Fuzzy neural networks
  • Fuzzy systems
  • Investments
  • Neural networks
  • Principal component analysis
  • Reduction
  • Regression analysis
  • Classification accuracy
  • Classification mining
  • Dimensionality reduction
  • Fuzzy C mean
  • Fuzzy c-means methods
  • Logistic regression models
  • Logistic regressions
  • Stock return forecasting
  • Big data
  • Accuracy
  • Article
  • Artificial neural networks (ANNs)
  • Benchmarking
  • Cluster analysis
  • Financial information system
  • Financial management
  • Fuzzy c means method
  • Logistic regression analysis
  • Principal component analysis (PCA)
  • Priority journal
  • Process optimization
  • Simulation
  • Stock market return
  • Daily stock return forecasting
  • Fuzzy c-means (FCM)
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2017 Elsevier, All rights reserved.
Publication Date
01 Dec 2017
Citation Information
Xiao Zhong and David Lee Enke. "A Comprehensive Cluster and Classification Mining Procedure for Daily Stock Market Return Forecasting" Neurocomputing Vol. 267 (2017) pp. 152-168 ISSN: 0925-2312
Available at: http://works.bepress.com/david-enke/41/