Article
Statistical Issues in the Clustering of Gene Expression Data
Statistica Sinica (2002)
  • Darlene R. Goldstein
  • Debashis Ghosh
  • Erin M. Conlon, University of Massachusetts - Amherst
Abstract

This paper illustrates some of the problems that can arise in any data set when clustering samples of gene expression profiles. These include a potentially strong dependence of the results on the choice of clustering algorithm, further dependence on the choice of genes and samples included in the clustering (for example, whether or not to include control samples), and difficulty in assessing the validity of the resulting groups. We also demonstrate the use of Cox regression as a tool for identifying genes that influence survival.
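
The first problem named in the abstract, dependence on the clustering algorithm, can be illustrated with a small sketch. It is not taken from the paper: the six sample values are invented, each sample is reduced to a single hypothetical expression value so the distances can be checked by hand, and the function name `agglomerate` is an assumption made for this example. It compares single and complete linkage in agglomerative hierarchical clustering, stopping at two groups.

```python
from itertools import combinations


def agglomerate(values, n_clusters, linkage):
    """Merge the closest pair of clusters until n_clusters remain.

    `linkage` is `min` (single linkage) or `max` (complete linkage),
    applied to the pairwise distances between members of two clusters.
    Clusters are returned as sorted lists of sample indices.
    """
    clusters = [[i] for i in range(len(values))]
    while len(clusters) > n_clusters:
        def cluster_distance(pair):
            a, b = pair
            return linkage(abs(values[i] - values[j]) for i in a for j in b)

        # Find the two clusters that are closest under the chosen linkage.
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Hypothetical one-dimensional expression summaries for six samples,
# chosen so that the gap between neighbours grows slowly.
samples = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0]

single = agglomerate(samples, 2, min)    # single linkage
complete = agglomerate(samples, 2, max)  # complete linkage

print("single linkage:  ", single)
print("complete linkage:", complete)
```

On this toy input, single linkage chains the first five samples together and isolates the last one, while complete linkage splits the samples into a group of four and a group of two. The data are identical in both runs; only the linkage rule differs.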

Keywords
  • Cluster analysis
  • Cox regression
  • Microarray experiment
  • Survival analysis
  • Unsupervised learning
Publication Date
2002
Citation Information
Darlene R. Goldstein, Debashis Ghosh and Erin M. Conlon. "Statistical Issues in the Clustering of Gene Expression Data" Statistica Sinica Vol. 12 (2002)
Available at: http://works.bepress.com/erin_conlon/7/