Identifying important explanatory variables for time-varying outcomes
Abstract
This chapter describes a systematic and targeted approach for estimating the impact of each of a large number of baseline covariates on an outcome that is measured repeatedly over time. These variable importance estimates can be adjusted for a user-specified set of confounders and lend themselves in a straightforward way to obtaining confidence intervals and p-values. Hence, they can in particular be used to identify a subset of baseline covariates that are the most important explanatory variables for the time-varying outcome of interest. We illustrate the methodology in a data analysis aimed at finding mutations of the human immunodeficiency virus that predict how well a patient responds to a drug regimen containing the two antiretroviral drugs lamivudine and stavudine. The most significant mutation we identify, 184IV, has previously been characterized as conferring high-level resistance to lamivudine. Our analysis furthermore points to a second mutation, 75AIMTS, that has been linked to moderate resistance to both lamivudine and stavudine.
Suggested Citation
Oliver Bembom, Maya L. Petersen, and Mark J. van der Laan. "Identifying important explanatory variables for time-varying outcomes." Fundamentals of Data Mining in Genomics and Proteomics. Ed. W. Dubitzky, M. Granzow, and D.P. Berrar. Springer, 2006. 227-250.
Available at: http://works.bepress.com/mark_van_der_laan/192