Test Statistics Null Distributions in Multiple Testing: Simulation Studies and Applications to Genomics
Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTPs) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der Laan (2004), a key feature of the methodology is the general characterization and explicit construction of a test statistics null distribution (rather than a data generating null distribution), which provides Type I error control in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics. In particular, the proposed null distribution provides Type I error control without requirements such as subset pivotality (Westfall and Young, 1993) and, therefore, allows one to test hypotheses about a much broader class of parameters than are covered by currently available methods (e.g., correlation coefficients, regression parameters in linear and non-linear models with dependent covariates and error terms).
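As a brief illustration of the class of Type I error rates referred to above, the following writes out some familiar members. The symbols are notation assumed here for illustration: V_n denotes the number of false positives, R_n the number of rejected hypotheses, and k and q are user-chosen bounds.

```latex
\begin{align*}
\text{gFWER}(k) &= \Pr(V_n > k)
  && \text{(tail probability for the number of false positives; } k = 0 \text{ gives the FWER)} \\
\text{TPPFP}(q) &= \Pr(V_n / R_n > q)
  && \text{(tail probability for the proportion of false positives)} \\
\text{PFER} &= E[V_n]
  && \text{(expected number of false positives)} \\
\text{FDR} &= E[V_n / R_n]
  && \text{(expected proportion of false positives)}
\end{align*}
% Convention: V_n / R_n is taken to be 0 when R_n = 0.
```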
This paper presents simulation studies comparing test statistics null distributions in two testing scenarios of great relevance to biomedical and genomic data analysis: tests for regression parameters in linear models where covariates and error terms are allowed to be dependent, and tests for correlation coefficients. The simulation studies demonstrate that the choice of null distribution can have a substantial impact on the Type I error and power properties of a given multiple testing procedure. Procedures based on a general non-parametric bootstrap estimator of the proposed test statistics null distribution typically control the Type I error rate "on target" at the nominal level. In contrast, comparable procedures based on parameter-specific bootstrap null distributions can be severely anti-conservative (bootstrapping residuals for the test of regression parameters) or conservative (independent bootstrap for the test of correlation coefficients). Applications to a novel genomic dataset from a study of microRNA expression in cancer (Lu et al., 2005) illustrate the flexibility and power of our proposed methodology.
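To make the idea of a non-parametric bootstrap estimator of a test statistics null distribution concrete, here is a minimal sketch for testing correlation coefficients, combined with single-step maxT adjusted p-values. It is an illustration under stated assumptions, not the authors' implementation: the function names (`corr_stats`, `bootstrap_null_maxt`), the Fisher-z test statistic, and the choice to center the bootstrap statistics at the null value 0 and scale them to variance at most 1 are all assumptions made for this example.

```python
import numpy as np


def corr_stats(x, y):
    """Fisher-z statistics for the correlation of each column of x with y.

    The Fisher-z form sqrt(n - 3) * atanh(r) is an illustrative choice.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    r = (xc * yc[:, None]).sum(axis=0) / np.sqrt(
        (xc ** 2).sum(axis=0) * (yc ** 2).sum()
    )
    r = np.clip(r, -0.999999, 0.999999)  # keep atanh finite
    return np.sqrt(n - 3) * np.arctanh(r)


def bootstrap_null_maxt(x, y, n_boot=1000, seed=0):
    """Single-step maxT adjusted p-values from a centered, scaled bootstrap.

    Rows (x_i, y_i) are resampled jointly, so the dependence among the
    variables is preserved in the estimated joint null distribution.
    """
    rng = np.random.default_rng(seed)
    n, m = x.shape
    t_obs = corr_stats(x, y)

    # Non-parametric bootstrap distribution of the test statistics.
    t_boot = np.empty((n_boot, m))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        t_boot[b] = corr_stats(x[idx], y[idx])

    # Assumed null values: mean 0 and variance at most 1 for each statistic.
    z = t_boot - t_boot.mean(axis=0)
    var = np.maximum(t_boot.var(axis=0), 1e-12)
    z = z * np.sqrt(np.minimum(1.0, 1.0 / var))

    # Two-sided single-step maxT: compare each observed |t| with the
    # null distribution of the maximum absolute statistic.
    max_abs = np.abs(z).max(axis=1)
    adj_p = (max_abs[:, None] >= np.abs(t_obs)[None, :]).mean(axis=0)
    return t_obs, adj_p
```

Because whole observations are resampled and the statistics are only shifted and scaled, the joint dependence among the test statistics carries over to the estimated null distribution. An independent bootstrap, which resamples each variable separately, would discard that dependence.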
Katherine S. Pollard, Merrill D. Birkner, Mark J. van der Laan, and Sandrine Dudoit. "Test Statistics Null Distributions in Multiple Testing: Simulation Studies and Applications to Genomics." 2005.
Available at: http://works.bepress.com/mark_van_der_laan/170