Testing for Heterogeneous Treatment Effects in Experimental Data: False Discovery Risks and Correction Procedures
Abstract
Randomization has emerged as the preferred empirical strategy for researchers in a variety of fields in recent years. While the advantages of randomized controlled trials (RCTs) for identification are obvious, the statistical analysis of experimental data is not without challenges. In this paper we focus on one statistical issue commonly encountered in economic research: multiple hypothesis testing. In many cases, researchers are interested not only in the main treatment effect but also in the degree to which the impact of a given treatment varies across specific geographic or socio-demographic groups of interest. To test for such heterogeneous treatment effects, researchers generally use either subsample analysis or interaction terms. While both approaches have been widely applied in the empirical literature, they are generally not statistically valid without adjustment for multiple testing and, as we demonstrate in this paper, lead to an almost linear increase in the likelihood of false discoveries as the number of tested groups grows. We show that the likelihood of finding at least one out of ten interaction terms statistically significant in standard OLS regressions is 42%, and that two thirds of the statistically significant interaction terms in PROGRESA data can be presumed to represent false discoveries. We demonstrate that applying correction procedures developed in the statistics literature can fully address this issue, and we discuss the implications of multiple testing adjustments for power calculations and experimental design. While multiple testing corrections do require larger sample sizes ex ante, the adjustments necessary to preserve power when corrections are applied appear relatively small.
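To make the multiple-testing problem concrete, the sketch below assumes m independent tests whose null hypotheses are all true, each run at level alpha. In that case the chance of at least one false rejection is 1 - (1 - alpha)^m, which is about 40% for ten tests at the 5% level, in the same range as the 42% figure reported above. The abstract does not name the specific correction procedures used, so two standard ones are shown purely as examples: the Holm step-down procedure (controls the family-wise error rate) and the Benjamini-Hochberg step-up procedure (controls the false discovery rate, valid under independence and certain forms of positive dependence). The function names and the p-values in the demo are hypothetical and chosen only for illustration.

```python
from typing import List


def fwer_independent(m: int, alpha: float = 0.05) -> float:
    """P(at least one false rejection) among m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def holm_reject(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure: controls the family-wise error rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared with alpha / (m - k + 1).
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; all larger p-values are kept
    return reject


def bh_reject(pvals: List[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure: controls the false discovery rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    # Reject the hypotheses with the k_max smallest p-values.
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for ten interaction terms (illustrative only).
    pvals = [0.003, 0.04, 0.2, 0.5, 0.009, 0.8, 0.03, 0.6, 0.07, 0.9]
    print(round(fwer_independent(10), 3))                       # 0.401
    print(sum(p <= 0.05 for p in pvals))                        # 4 unadjusted rejections
    print([i for i, r in enumerate(holm_reject(pvals)) if r])   # [0]
    print([i for i, r in enumerate(bh_reject(pvals)) if r])     # [0, 4]
```

In this made-up example, four of ten terms are significant at the unadjusted 5% level, while Holm keeps one and Benjamini-Hochberg keeps two. Holm guards against any false rejection, whereas Benjamini-Hochberg bounds the expected share of false discoveries among the rejections, which is the kind of quantity the "two thirds" statement above refers to.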
Suggested Citation
Günther Fink, Margaret McConnell, and Sebastian Vollmer. 2011. "Testing for Heterogeneous Treatment Effects in Experimental Data: False Discovery Risks and Correction Procedures" Leibniz Universität Hannover, Discussion Paper No. 477.