Regression Approaches for Microarray Data Analysis
Biology Faculty Works
  • Mark R. Segal, University of California - San Francisco
  • Kam D Dahlquist, Loyola Marymount University
  • Bruce R Conklin, University of California - San Francisco
Document Type
Article
Publication Date
1-1-2003
Abstract
A variety of new procedures have been devised to handle the two-sample comparison (e.g., tumor versus normal tissue) of gene expression values as measured with microarrays. Such new methods are required in part because of some defining characteristics of microarray-based studies: (i) the very large number of genes contributing expression measures, which far exceeds the number of samples (observations) available, and (ii) the fact that, by virtue of pathway/network relationships, the gene expression measures tend to be highly correlated. These concerns are exacerbated in the regression setting, where the objective is to relate gene expression, simultaneously for multiple genes, to some external outcome or phenotype. Correspondingly, several methods have recently been proposed for addressing these issues. We briefly critique some of these methods prior to a detailed evaluation of gene harvesting. This reveals that gene harvesting, without additional constraints, can yield artifactual solutions. Results obtained employing such constraints motivate the use of regularized regression procedures such as the lasso, least angle regression, and support vector machines. Model selection and solution multiplicity issues are also discussed. The methods are evaluated using a microarray-based study of cardiomyopathy in transgenic mice.
Citation Information

Segal, M.R., Dahlquist, K.D., & Conklin, B.R. (2003) Regression Approaches for Microarray Data Analysis. Journal of Computational Biology 10: 961-980.