Article
Multiple Imputation and Random Forests (MIRF) for Unobservable, High-Dimensional Data
The International Journal of Biostatistics (2012)
  • Bareng A. S. Nonyane, University of Massachusetts, Amherst
  • Andrea S. Foulkes, University of Massachusetts, Amherst
Abstract

Understanding the genetic underpinnings of complex diseases requires sophisticated analytical methods designed to uncover intricate associations across multiple predictor variables. At the same time, knowledge of whether single nucleotide polymorphisms within a gene are on the same (in cis) or on different (in trans) chromosomal copies may provide crucial information about measures of disease progression. In association studies of unrelated individuals, allelic phase is generally unobservable, which poses an additional analytical challenge. In this manuscript, we describe a novel approach that combines multiple imputation and random forests for this high-dimensional, unobservable data setting. An application to a cohort of HIV-1 infected individuals receiving anti-retroviral therapies is presented. A simulation study is also presented to characterize method performance.

Keywords
  • recursive partitioning
  • random forests
  • haplotype
  • genotype
  • phase
  • HIV-1
  • lipids
Publication Date
January 6, 2012
Citation Information
Bareng A. S. Nonyane and Andrea S. Foulkes. "Multiple Imputation and Random Forests (MIRF) for Unobservable, High-Dimensional Data" The International Journal of Biostatistics Vol. 3 Iss. 1 (2012)
Available at: http://works.bepress.com/andrea_foulkes/2/