
Test statistics null distributions in multiple testing: Simulation studies and applications to genomics

Katherine S. Pollard
Merrill D. Birkner
Mark J. van der Laan
Sandrine Dudoit, Division of Biostatistics, School of Public Health, University of California, Berkeley

Abstract

Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed and co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected null hypotheses. A key feature of the methodology is the general characterization and explicit construction of a test statistics null distribution (rather than data generating null distribution), which provides Type I error control in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses defined in terms of submodels, and test statistics.
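As a rough illustration of what constructing a test statistics null distribution can look like in practice, the following Python sketch bootstraps one-sample t-statistics, centers and scales them, and applies a single-step maxT cutoff. It is a simplified sketch under stated assumptions, not the authors' implementation: the function name `bootstrap_null_maxT`, the choice of one-sample t-statistics, the null value 0, and the variance bound 1 are all assumptions made for this example, and only family-wise error control is shown out of the broader class of Type I error rates.

```python
import numpy as np


def bootstrap_null_maxT(X, B=1000, alpha=0.05, seed=0):
    """Hypothetical helper: two-sided tests of H0_j: E[X_j] = 0 for each column j.

    X is an (n, m) array: n observations of m variables.
    Returns the observed t-statistics, the common cutoff, and a rejection mask.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape

    def tstat(data):
        # One-sample t-statistic for each column.
        return np.sqrt(data.shape[0]) * data.mean(axis=0) / data.std(axis=0, ddof=1)

    t_obs = tstat(X)

    # Non-parametric bootstrap: resample whole rows so that the dependence
    # structure among the m variables is preserved.
    T_boot = np.empty((B, m))
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        T_boot[b] = tstat(X[idx])

    # Center each bootstrap statistic at the assumed null value 0 and shrink
    # its spread so the variance does not exceed the assumed bound of 1.
    centered = T_boot - T_boot.mean(axis=0)
    sd = T_boot.std(axis=0, ddof=1)
    Z = centered * np.minimum(1.0, 1.0 / sd)

    # Single-step maxT: one common cutoff taken as the (1 - alpha) quantile
    # of the maximum absolute null statistic across hypotheses.
    cutoff = np.quantile(np.abs(Z).max(axis=1), 1 - alpha)
    reject = np.abs(t_obs) > cutoff
    return t_obs, cutoff, reject
```

Because rows are resampled jointly, the estimated null distribution inherits the correlation among variables from the data rather than from a parametric model of the data generating distribution.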

This article presents simulation studies comparing test statistics null distributions in two testing scenarios of great relevance to biomedical and genomic data analysis: tests for regression coefficients in linear models where covariates and error terms are allowed to be dependent and tests for correlation coefficients. The simulation studies demonstrate that the choice of null distribution can have a substantial impact on the Type I error properties of a given multiple testing procedure. Procedures based on our proposed non-parametric bootstrap test statistics null distribution typically control the Type I error rate "on target" at the nominal level, while comparable procedures, based on parameter-specific bootstrap data generating null distributions, can be severely anti-conservative or conservative. The analysis of microRNA expression data from cancerous and non-cancerous tissues (Lu et al., 2005), using tests for logistic regression coefficients and correlation coefficients, illustrates the flexibility and power of our proposed methodology.

Suggested Citation

Katherine S. Pollard, Merrill D. Birkner, Mark J. van der Laan, and Sandrine Dudoit. "Test statistics null distributions in multiple testing: Simulation studies and applications to genomics" Journal de la Société Française de Statistique 146.1-2 (2005): 77-115.