"Case-Specific Random Forests for Big Data Prediction" by Joshua Zimmerman

Selected Works of Dan Nettleton

Presentation

Case-Specific Random Forests for Big Data Prediction

JSM Proceedings

Joshua Zimmerman, Iowa State University
Dan Nettleton, Iowa State University

Document Type

Conference Proceeding

Conference

2015 Joint Statistical Meetings

Publication Version

Published Version

Publication Date

1-1-2015

Conference Date

August 8-13, 2015

Geolocation

(47.6062095, -122.3320708)

Abstract

Some training datasets may be too large for storage on a single computer. Such datasets may be partitioned and stored on separate computers connected in a parallel computing environment. To predict the response associated with a specific target case when training data are partitioned, we propose a method for finding the training cases within each partition that are most relevant for predicting the response of a target case of interest. These most relevant training cases from each partition can be combined into a single dataset, which can be a subset of the entire training dataset that is small enough for storage and analysis in memory on a single computer. To generate a prediction from this selected subset, we use Case-Specific Random Forests, a variation of random forests that replaces the uniform bootstrap sampling used to build a tree in a random forest with unequal weighted bootstrap sampling, where training cases more similar to the target case are given greater weight. We demonstrate our method with an example concrete dataset. Our results show that predictions generated from a small selected subset of a partitioned training dataset can be as accurate as predictions generated in a traditional manner from the entire training dataset.
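
The selection and resampling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the choice of k cases per partition, the weight formula 1 / (1 + distance), and the names select_relevant and weighted_bootstrap are all assumptions introduced for illustration, since the abstract does not specify how relevance or weights are computed. The sketch stops at drawing one weighted bootstrap sample; growing a tree on each such sample would be the next step.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between feature vectors (assumed stand-in for relevance)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def select_relevant(partition, target, k):
    """Return the k cases in one partition that lie closest to the target case."""
    return sorted(partition, key=lambda case: distance(case[0], target))[:k]

def weighted_bootstrap(cases, target, rng):
    """Draw a bootstrap sample in which cases nearer the target are drawn more often."""
    # Hypothetical weighting: the abstract only says more similar cases get greater weight.
    weights = [1.0 / (1.0 + distance(x, target)) for x, _ in cases]
    return rng.choices(cases, weights=weights, k=len(cases))

# Toy data: two partitions of (features, response) pairs, as if stored on two machines.
partitions = [
    [((0.0, 0.0), 1.0), ((5.0, 5.0), 9.0), ((0.5, 0.2), 1.2)],
    [((0.1, 0.1), 1.1), ((9.0, 9.0), 20.0), ((4.0, 4.0), 8.0)],
]
target = (0.0, 0.0)

# Each partition contributes its k most relevant cases; the union fits on one machine.
subset = [case for part in partitions for case in select_relevant(part, target, k=2)]

# One unequal-weight bootstrap sample, which would be used to grow a single tree.
sample = weighted_bootstrap(subset, target, random.Random(0))
```

Here subset holds four cases, two from each partition, and sample is a same-sized resample of them that favors the cases nearest the target.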

Comments

This proceeding is published as Zimmerman, J., Nettleton, D. (2015). Case-specific random forests for big data prediction. In JSM Proceedings, General Methodology. Alexandria, VA: American Statistical Association, pp. 2537–2543. Posted with permission.

American Statistical Association

2015

File Format

application/pdf

Citation Information

Joshua Zimmerman and Dan Nettleton. "Case-Specific Random Forests for Big Data Prediction." JSM Proceedings, Seattle, Washington (2015), pp. 2537–2543.
Available at: http://works.bepress.com/dan-nettleton/128/