High-throughput cancer studies have been extensively conducted to search for genetic risk factors that are independently associated with prognosis beyond clinical and environmental risk factors. Many studies have shown that gene-environment (G×E) interactions may have important implications. Some existing methods, such as the commonly adopted single-marker analysis, may be limited in that they cannot accommodate the joint effects of a large number of genetic markers or rely on ineffective marker identification techniques. In this study, we analyze cancer prognosis data and adopt the accelerated failure time (AFT) model to describe survival. A weighted least squares approach, which has the lowest computational cost, is adopted for estimation. To identify G×E interactions and main effects, we adopt a group sparse penalization approach, which has an intuitive formulation, can accommodate the joint effects of a large number of markers, and is computationally affordable. A simulation study shows satisfactory performance of the penalization approach. Analysis of a non-Hodgkin lymphoma (NHL) prognosis study with SNP measurements shows that the proposed approach may identify markers with important implications, with satisfactory prediction performance and reproducibility. Analysis of a follicular lymphoma gene expression study is also conducted.
Keywords: Gene-environment interaction; Cancer prognosis; Marker selection; Penalization.
Available at: http://works.bepress.com/shuangge/42/
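
As a rough illustration of the estimation pipeline summarized in the abstract, the following sketch fits a weighted least squares AFT model with a group penalty. It rests on several assumptions that the abstract does not state: the weights are taken to be Kaplan-Meier (Stute) weights, each group is assumed to hold one marker's main effect together with its interactions, and a plain group lasso penalty solved by proximal gradient descent stands in for the group sparse penalty. All function names (km_weights, fit_group_penalized_aft, and so on) are hypothetical.

```python
import numpy as np


def km_weights(time, event):
    """Kaplan-Meier (Stute) weights, returned in the original sample order."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    n = len(time)
    # Sort by observed time; at ties, events come before censored observations.
    order = np.lexsort((-event, time))
    d = event[order]
    i = np.arange(1, n + 1)
    # Product over j < i of ((n - j) / (n - j + 1)) ** d_j.
    factors = ((n - i) / (n - i + 1.0)) ** d
    prev_prod = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w_sorted = d / (n - i + 1.0) * prev_prod
    w = np.empty(n)
    w[order] = w_sorted
    return w


def weighted_center(X, y, w):
    """Center X and y at their weighted means, then scale rows by sqrt(w)."""
    x_bar = (w[:, None] * X).sum(axis=0) / w.sum()
    y_bar = (w * y).sum() / w.sum()
    sw = np.sqrt(w)
    return sw[:, None] * (X - x_bar), sw * (y - y_bar)


def group_soft_threshold(b, t):
    """Proximal operator of t * ||b||_2: shrink the whole group toward zero."""
    norm = np.linalg.norm(b)
    if norm <= t:
        return np.zeros_like(b)
    return (1.0 - t / norm) * b


def fit_group_penalized_aft(X, time, event, groups, lam, n_iter=500):
    """Minimize 0.5 * sum_i w_i (log t_i - a - x_i'b)^2
    + lam * sum_g sqrt(p_g) * ||b_g||_2 by proximal gradient descent.

    groups: list of column-index lists (assumed: one marker's main effect
    plus its interaction columns per group).
    """
    X = np.asarray(X, dtype=float)
    y = np.log(np.asarray(time, dtype=float))
    w = km_weights(time, event)
    Xw, yw = weighted_center(X, y, w)
    b = np.zeros(X.shape[1])
    # Lipschitz constant of the gradient of the smooth part.
    L = np.linalg.norm(Xw, 2) ** 2
    if L == 0.0:
        return b
    for _ in range(n_iter):
        z = b - Xw.T @ (Xw @ b - yw) / L
        for g in groups:
            g = np.asarray(g)
            b[g] = group_soft_threshold(z[g], lam * np.sqrt(len(g)) / L)
    return b
```

With lam set to 0 the fit reduces to ordinary weighted least squares on the log survival times, and increasing lam removes whole groups at once, which is the behavior that makes a group penalty a natural fit for selecting a marker together with its interactions. Note that group lasso selects at the group level only; a penalty that is also sparse within groups would need an additional within-group term.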