"Accelerating Bayesian Network Parameter Learning Using Hadoop and MapReduce" by Aniruddha Basak

Article

Accelerating Bayesian Network Parameter Learning Using Hadoop and MapReduce

Proc. of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining (BigMine’12) (2012)

Aniruddha Basak, Carnegie Mellon University
Irina Brinster, Carnegie Mellon University
Xianheng Ma, Carnegie Mellon University
Ole J Mengshoel, Carnegie Mellon University

Abstract

Learning the conditional probability tables of large Bayesian networks (BNs) with hidden nodes using the Expectation Maximization (EM) algorithm is computationally intensive. There are at least two bottlenecks: the potentially huge size of the data set and the demand for computation and memory resources. This work applies the distributed computing framework MapReduce to Bayesian network parameter learning from complete and incomplete data. We formulate both traditional parameter learning (complete data) and the classical EM algorithm (incomplete data) within the MapReduce framework, and we analyze, both analytically and experimentally, the speed-up that MapReduce can provide. We present the details of our Hadoop implementation, report speed-ups relative to the sequential case, and compare various Hadoop configurations in experiments with Bayesian networks of different sizes and structures. For Bayesian networks with large junction trees, we find, surprisingly, that MapReduce can give a speed-up over the sequential EM algorithm when learning from 20 cases or fewer. The benefit of MapReduce for learning various Bayesian networks is investigated on data sets with up to 1,000,000 records.
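The complete-data case described in the abstract can be illustrated with a small sketch. It is not the authors' Hadoop implementation: it runs in a single Python process, and the two-node network (`Rain`, `WetGrass`), the records, and all function names are hypothetical, chosen only to show the shape of the idea. The map step emits one count per (node, parent configuration, value) seen in a record, the reduce step sums those counts, and normalizing the sums gives maximum-likelihood estimates of the conditional probability tables.

```python
from collections import defaultdict

# Hypothetical toy structure: each node maps to its tuple of parents.
PARENTS = {"Rain": (), "WetGrass": ("Rain",)}


def map_record(record):
    """Map step: emit ((node, parent values, node value), 1) for one complete record."""
    for node, parents in PARENTS.items():
        parent_values = tuple(record[p] for p in parents)
        yield (node, parent_values, record[node]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


def estimate_cpts(counts):
    """Normalize counts into estimates of P(node = value | parent values)."""
    totals = defaultdict(int)
    for (node, parent_values, _value), n in counts.items():
        totals[(node, parent_values)] += n
    return {key: n / totals[(key[0], key[1])] for key, n in counts.items()}


def learn(records):
    """Chain map, reduce, and normalization over an iterable of records."""
    pairs = (pair for record in records for pair in map_record(record))
    return estimate_cpts(reduce_counts(pairs))


# Made-up data, for illustration only.
DATA = [
    {"Rain": 1, "WetGrass": 1},
    {"Rain": 1, "WetGrass": 1},
    {"Rain": 1, "WetGrass": 0},
    {"Rain": 0, "WetGrass": 0},
]
CPTS = learn(DATA)
```

In a distributed setting the records would be split across mappers and the sums computed by reducers. For incomplete data, an EM-style variant would presumably emit expected counts obtained from inference in place of the constant 1, and repeat the map and reduce steps until convergence.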

Keywords

Bayesian networks
MapReduce
Hadoop
Parameter Learning
Semi-Supervised Learning

Publication Date

August 12, 2012

Publisher Statement
@inproceedings{basak12accelerating,
 author = {Basak, A. and Brinster, I. and Ma, X. and Mengshoel, O. J.},
 title = {Accelerating {Bayesian} Network Parameter Learning Using {Hadoop} and {MapReduce}},
 booktitle = {Proc. of BigMine-12},
 year = {2012},
 month  = {August},
 address = {Beijing, China}
}

Citation Information

Aniruddha Basak, Irina Brinster, Xianheng Ma and Ole J Mengshoel. "Accelerating Bayesian Network Parameter Learning Using Hadoop and MapReduce" Proc. of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining (BigMine’12) (2012)
Available at: http://works.bepress.com/ole_mengshoel/33/