"Application of two machine learning algorithms to genetic association studies in the presence of covariates" by Bareng AS Nonyane

Selected Works of Andrea S Foulkes

Article

Application of two machine learning algorithms to genetic association studies in the presence of covariates

BMC Genetics (2008)

Bareng AS Nonyane, University of Massachusetts - Amherst
Andrea S Foulkes, University of Massachusetts - Amherst

Abstract

Background - Population-based investigations aimed at uncovering genotype-trait associations often involve high-dimensional genetic polymorphism data as well as information on multiple environmental and clinical parameters. Machine learning (ML) algorithms offer a straightforward analytic approach for selecting the subsets of these inputs that are most predictive of a pre-defined trait. The performance of these algorithms in the presence of covariates, however, is not well characterized.

Methods and Results - In this manuscript, we investigate two approaches: Random Forests (RFs) and Multivariate Adaptive Regression Splines (MARS). Through multiple simulation studies, we evaluate their performance under several underlying models. We also provide an application to a cohort of HIV-1 infected individuals receiving anti-retroviral therapies.

Conclusion - Consistent with more traditional regression modeling theory, our findings highlight the importance of considering the nature of the underlying gene-covariate-trait relationships before applying ML algorithms, particularly when there is potential confounding or effect mediation.
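
As an illustration of the confounding concern raised in the conclusion, the following minimal sketch simulates a covariate that influences both genotype and trait while the genotype has no direct effect on the trait. It is not the authors' simulation design, and it uses ordinary least squares rather than RFs or MARS. The variable names, effect sizes, and allele frequencies are assumptions chosen only for illustration.

```python
import random


def simulate(n=5000, seed=0):
    """Simulate (genotype, covariate, trait) rows under a confounded design.

    Assumed, illustrative model: covariate z shifts the allele frequency
    and also drives the trait y; genotype g has NO direct effect on y.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)                  # covariate (hypothetical)
        p = 0.7 if z > 0 else 0.3            # allele frequency depends on z
        g = int(rng.random() < p) + int(rng.random() < p)  # genotype 0/1/2
        y = 2.0 * z + rng.gauss(0, 1)        # trait depends on z only
        rows.append((g, z, y))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def slope(x, y):
    """Least-squares slope from regressing y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


rows = simulate()
g = [r[0] for r in rows]
z = [r[1] for r in rows]
y = [r[2] for r in rows]

# Unadjusted analysis: regress the trait on genotype alone.
unadjusted = slope(g, y)

# Adjusted analysis: remove the covariate from both genotype and trait,
# then regress residual on residual. This equals the genotype coefficient
# in a regression of y on g and z (Frisch-Waugh-Lovell theorem).
adjusted = slope(residuals(g, z), residuals(y, z))

print(f"unadjusted genotype slope: {unadjusted:.3f}")
print(f"covariate-adjusted genotype slope: {adjusted:.3f}")
```

Under these assumed settings, the unadjusted slope is clearly positive even though the genotype has no direct effect on the trait, while the covariate-adjusted slope is close to zero. An input-selection procedure that ignores the covariate could therefore rank such a genotype as predictive.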

Disciplines

Public Health

Publication Date

November 14, 2008

Publisher Statement

This document was harvested from BioMed Central. doi:10.1186/1471-2156-9-71

Citation Information

Bareng AS Nonyane and Andrea S Foulkes. "Application of two machine learning algorithms to genetic association studies in the presence of covariates" BMC Genetics Vol. 9 (2008)
Available at: http://works.bepress.com/andrea_foulkes/16/