Presentation
Case-Specific Random Forests for Big Data Prediction
JSM Proceedings
  • Joshua Zimmerman, Iowa State University
  • Dan Nettleton, Iowa State University
Document Type
Conference Proceeding
Conference
2015 Joint Statistical Meetings
Publication Version
Published Version
Publication Date
1-1-2015
Conference Title
2015 Joint Statistical Meetings
Conference Date
August 8-13, 2015
Geolocation
(47.6062095, -122.3320708)
Abstract

Some training datasets may be too large to store on a single computer. Such datasets may be partitioned and stored on separate computers connected in a parallel computing environment. To predict the response of a specific target case when training data are partitioned, we propose a method for finding, within each partition, the training cases most relevant for predicting the response of that target case. The most relevant training cases from all partitions can be combined into a single dataset: a subset of the entire training dataset that is small enough to store and analyze in memory on a single computer. To generate a prediction from this selected subset, we use Case-Specific Random Forests, a variation of random forests that replaces the uniform bootstrap sampling used to build each tree with unequally weighted bootstrap sampling, in which training cases more similar to the target case receive greater weight. We demonstrate our method with an example dataset on concrete. Our results show that predictions generated from a small selected subset of a partitioned training dataset can be as accurate as predictions generated in the traditional manner from the entire training dataset.
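The two data-handling steps described in the abstract (selecting relevant cases from each partition, then drawing an unequally weighted bootstrap sample for a tree) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not specify how relevance or similarity is measured, so the Euclidean-distance selection rule (`select_relevant`), the `1/(1 + distance)` weighting (`case_weights`), and every other function name here are hypothetical stand-ins. Tree fitting itself is omitted; in a case-specific forest, one tree would be grown on each weighted bootstrap sample.

```python
import math
import random
from typing import List, Sequence, Tuple

# A training case is (feature vector, response). This representation is an
# assumption made for illustration only.
Case = Tuple[Tuple[float, ...], float]


def distance(x: Sequence[float], z: Sequence[float]) -> float:
    """Euclidean distance: a stand-in similarity measure (assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))


def select_relevant(partition: List[Case], target: Sequence[float], k: int) -> List[Case]:
    """Return the k cases in one partition closest to the target (hypothetical rule)."""
    return sorted(partition, key=lambda case: distance(case[0], target))[:k]


def combine_partitions(partitions: List[List[Case]], target: Sequence[float], k: int) -> List[Case]:
    """Pool the cases selected from every partition into one in-memory subset."""
    subset: List[Case] = []
    for part in partitions:
        subset.extend(select_relevant(part, target, k))
    return subset


def case_weights(subset: List[Case], target: Sequence[float]) -> List[float]:
    """Normalized weights that favor cases near the target (hypothetical 1/(1+d) scheme)."""
    raw = [1.0 / (1.0 + distance(x, target)) for x, _ in subset]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_bootstrap(subset: List[Case], weights: List[float], rng: random.Random) -> List[Case]:
    """Draw len(subset) cases with replacement using unequal probabilities.

    An ordinary random forest samples uniformly here; a case-specific
    forest would grow one tree on each sample drawn this way.
    """
    return rng.choices(subset, weights=weights, k=len(subset))


# Usage on toy data: two partitions, target case at the origin.
rng = random.Random(0)
partitions = [
    [((0.0, 0.0), 1.0), ((5.0, 5.0), 9.0), ((1.0, 0.0), 2.0)],
    [((0.5, 0.5), 1.5), ((8.0, 8.0), 12.0)],
]
target = (0.0, 0.0)
subset = combine_partitions(partitions, target, k=2)  # 2 cases per partition
weights = case_weights(subset, target)
sample = weighted_bootstrap(subset, weights, rng)
```

Only the small pooled subset, rather than the full partitioned dataset, needs to reside in memory when the weighted bootstrap samples are drawn, which is the point the abstract makes about analysis on a single computer.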

Comments

This proceeding is published as Zimmerman, J., Nettleton, D. (2015). Case-specific random forests for big data prediction. In JSM Proceedings, General Methodology. Alexandria, VA: American Statistical Association, pp. 2537–2543. Posted with permission.

Copyright Owner
American Statistical Association
Language
en
File Format
application/pdf
Citation Information
Joshua Zimmerman and Dan Nettleton. "Case-Specific Random Forests for Big Data Prediction." JSM Proceedings, Seattle, Washington (2015), pp. 2537–2543.
Available at: http://works.bepress.com/dan-nettleton/128/