Our research involves developing statistical methods and theory for the analysis of data that commonly arise in randomized controlled trials and observational studies. In particular, we are concerned with methods that properly handle informative censoring, confounding, the curse of dimensionality, multiple testing, and data-adaptive model selection. Our philosophy is targeted learning, formalized by our recent work on targeted maximum likelihood learning and unified loss-based learning. This statistical approach aims to let the data speak for the purpose of answering a particular scientific question of interest, and to provide robust tests of null hypotheses of interest. We are continuously concerned with bringing these methods into practice and benchmarking them by their practical performance on simulated and real data.
Biological Annotation Metadata Analysis
Multiple Tests of Association with Biological Annotation Metadata (with Sandrine Dudoit and Sunduz Keles), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
We propose a general and formal statistical framework for the multiple tests of associations between...
Biological Sequence Analysis
Supervised Detection of Regulatory Motifs in DNA Sequences (with Sunduz Keles, Sandrine Dudoit, Biao Xing, and Michael B. Eisen), Statistical Applications in Genetics and Molecular Biology (2006)
Supervised Detection of Regulatory Motifs in DNA Sequences (with Sunduz Keles, Sandrine Dudoit, Biao Xing, and Michael B. Eisen), U.C. Berkeley Division of Biostatistics Working Paper Series (2003)
Categorical Data Analysis
Empirical Bayes and Resampling Based Multiple Testing Procedure Controlling Tail Probability of the Proportion of False Positives (with Merrill D. Birkner and Alan E. Hubbard), Statistical Applications in Genetics and Molecular Biology (2006)
Choice of Monitoring Mechanism for Optimal Nonparametric Functional Estimation for Binary Data (with Nicholas P. Jewell and Stephen Shiboski), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Estimation of Treatment Effects in Randomized Trials with Noncompliance and a Dichotomous Outcome (with Alan E. Hubbard and Nicholas P. Jewell), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Clinical Epidemiology
History-Adjusted Marginal Structural Models and Statically-Optimal Dynamic Treatment Regimens (with Maya L. Petersen and Marshall M. Joffe), The International Journal of Biostatistics (2006)
Clinical Trials
Estimation of Treatment Effects in Randomized Trials with Noncompliance and a Dichotomous Outcome (with Alan E. Hubbard and Nicholas P. Jewell), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Estimation of Direct and Indirect Causal Effects in Longitudinal Studies (with Maya L. Petersen), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Comparison of the Inverse Probability of Treatment Weighted (IPTW) Estimator With a Naïve Estimator in the Analysis of Longitudinal Data With Time-Dependent Confounding: A Simulation Study (with Thaddeus Haight, Romain Neugebauer, and Ira B. Tager), U.C. Berkeley Division of Biostatistics Working Paper Series (2003)
Measuring Treatment Effects Using Semiparametric Models (with Zhuo Yu), U.C. Berkeley Division of Biostatistics Working Paper Series (2003)
Computation
Empirical Bayes and Resampling Based Multiple Testing Procedure Controlling Tail Probability of the Proportion of False Positives (with Merrill D. Birkner and Alan E. Hubbard), Statistical Applications in Genetics and Molecular Biology (2006)
Cluster Analysis of Genomic Data with Applications in R (with Katherine S. Pollard), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Multiple Testing Procedures and Applications to Genomics (with Merrill D. Birkner, Katherine S. Pollard, and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Data Adaptive Estimation of the Treatment Specific Mean (with Yue Wang and Oliver Bembom), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
History-Adjusted Marginal Structural Models and Statically-Optimal Dynamic Treatment Regimes (with Maya L. Petersen), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Computational Biology/Bioinformatics
Supervised Detection of Regulatory Motifs in DNA Sequences (with Sunduz Keles, Sandrine Dudoit, Biao Xing, and Michael B. Eisen), Statistical Applications in Genetics and Molecular Biology (2006)
Multiple Testing and Data Adaptive Regression: An Application to HIV-1 Sequence Data (with Merrill D. Birkner and Sandra E. Sinisi), Statistical Applications in Genetics and Molecular Biology (2006)
Cluster Analysis of Genomic Data with Applications in R (with Katherine S. Pollard), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Statistical Inference for Simultaneous Clustering of Gene Expression Data (with Katherine S. Pollard), U.C. Berkeley Division of Biostatistics Working Paper Series (2001)
Paired and Unpaired Comparisons and Clustering with Gene Expression Data (with Jennifer F. Bryan and Katherine S. Pollard), U.C. Berkeley Division of Biostatistics Working Paper Series (2001)
Design of Experiments and Sample Surveys
Gene Expression Analysis with the Parametric Bootstrap (with Jennifer F. Bryan), U.C. Berkeley Division of Biostatistics Working Paper Series (2000)
Disease Modeling
Multiple Testing and Data Adaptive Regression: An Application to HIV-1 Sequence Data (with Merrill D. Birkner and Sandra E. Sinisi), Statistical Applications in Genetics and Molecular Biology (2006)
Colon Cancer Prognosis Prediction by Gene Expression Profiling (with Alain Barrier and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Prognosis of Stage II Colon Cancer by Non-Neoplastic Mucosa Gene Expression Profiling (with Alain Barrier and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Multiple Testing and Data Adaptive Regression: An Application to HIV-1 Sequence Data (with Merrill D. Birkner and Sandra E. Sinisi), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Comparative Genomic Hybridization Array Analysis (with Annette M. Molinaro and Dan H. Moore), U.C. Berkeley Division of Biostatistics Working Paper Series (2002)
At the present time, there is increasing evidence that cancer may be regulated by the...
Epidemiology
History-Adjusted Marginal Structural Models and Statically-Optimal Dynamic Treatment Regimens (with Maya L. Petersen and Marshall M. Joffe), The International Journal of Biostatistics (2006)
Extending Marginal Structural Models through Local, Penalized, and Additive Learning (with Daniel Rubin), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
Marginal structural models (MSMs) allow one to form causal inferences from data, by specifying a...
History-Adjusted Marginal Structural Models to Estimate Time-Varying Effect Modification (with Maya L. Petersen, Steven G. Deeks, and Jeffrey N. Martin), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Population Intervention Models in Causal Inference (with Alan E. Hubbard), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Estimation of Direct Causal Effects (with Maya L. Petersen), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
General Biostatistics
Statistical Inference for Variable Importance, The International Journal of Biostatistics (2006)
Issues of Processing and Multiple Testing of SELDI-TOF MS Proteomic Data (with Merrill D. Birkner, Alan E. Hubbard, Christine F. Skibola, Christine M. Hegedus, and Martyn T. Smith), Statistical Applications in Genetics and Molecular Biology (2006)
History-Adjusted Marginal Structural Models and Statically-Optimal Dynamic Treatment Regimens (with Maya L. Petersen and Marshall M. Joffe), The International Journal of Biostatistics (2006)
Augmentation Procedures for Control of the Generalized Family-Wise Error Rate and Tail Probabilities for the Proportion of False Positives (with Sandrine Dudoit and Katherine S. Pollard), Statistical Applications in Genetics and Molecular Biology (2006)
Multiple Tests of Association with Biological Annotation Metadata (with Sandrine Dudoit and Sunduz Keles), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
We propose a general and formal statistical framework for the multiple tests of associations between...
Genetics
Multiple Tests of Association with Biological Annotation Metadata (with Sandrine Dudoit and Sunduz Keles), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
We propose a general and formal statistical framework for the multiple tests of associations between...
Human Genetics
Application of a Multiple Testing Procedure Controlling the Proportion of False Positives to Protein and Bacterial Data (with Merrill D. Birkner and Alan E. Hubbard), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Prognosis of Stage II Colon Cancer by Non-Neoplastic Mucosa Gene Expression Profiling (with Alain Barrier and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Cluster Analysis of Genomic Data with Applications in R (with Katherine S. Pollard), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding (with Sandrine Dudoit, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, and Siew Leng Teng), U.C. Berkeley Division of Biostatistics Working Paper Series (2003)
Tree-based Multivariate Regression and Density Estimation with Right-Censored Data (with Annette M. Molinaro and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2003)
We propose a unified strategy for estimator construction, selection, and performance assessment in the presence...
Laboratory and Basic Science Research
Supervised Detection of Conserved Motifs in DNA Sequences with cosmo (with Oliver Bembom and Sunduz Keles), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
Identification of transcription factor binding sites is a major interest in contemporary biological research. A...
Multiple Tests of Association with Biological Annotation Metadata (with Sandrine Dudoit and Sunduz Keles), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
We propose a general and formal statistical framework for the multiple tests of associations between...
Application of a Multiple Testing Procedure Controlling the Proportion of False Positives to Protein and Bacterial Data (with Merrill D. Birkner and Alan E. Hubbard), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Test Statistics Null Distributions in Multiple Testing: Simulation Studies and Applications to Genomics (with Katherine S. Pollard, Merrill D. Birkner, and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying...
Multiple Testing Procedures: R multtest Package and Applications to Genomics (with Katherine S. Pollard and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Longitudinal Data Analysis and Time Series
History-Adjusted Marginal Structural Models and Statically-Optimal Dynamic Treatment Regimens (with Maya L. Petersen and Marshall M. Joffe), The International Journal of Biostatistics (2006)
Individualized Treatment Rules: Generating Candidate Clinical Trials (with Maya L. Petersen and Steven G. Deeks), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
Direct Effect Models (with Maya L. Petersen), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
G-computation Estimation of Nonparametric Causal Effects on Time-Dependent Mean Outcomes in Longitudinal Studies (with Romain Neugebauer), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Causal Inference in Longitudinal Studies with History-Restricted Marginal Structural Models (with Romain Neugebauer and Ira B. Tager), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Loss-Based Estimation with Cross-Validation
Asymptotic Optimality of Likelihood-Based Cross-Validation (with Sandrine Dudoit and Sunduz Keles), Statistical Applications in Genetics and Molecular Biology (2006)
Survival Ensembles (with Torsten Hothorn, Peter Bühlmann, Sandrine Dudoit, and Annette M. Molinaro), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Optimization of the Architecture of Neural Networks Using a Deletion/Substitution/Addition Algorithm (with Blythe Durbin and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
The Cross-Validated Adaptive Epsilon-Net Estimator (with Sandrine Dudoit and Aad W. van der Vaart), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding (with Sandrine Dudoit, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, and Siew Leng Teng), U.C. Berkeley Division of Biostatistics Working Paper Series (2003)
Medical Specialties
Colon Cancer Prognosis Prediction by Gene Expression Profiling (with Alain Barrier and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Prognosis of Stage II Colon Cancer by Non-Neoplastic Mucosa Gene Expression Profiling (with Alain Barrier and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data (with Sunduz Keles, Sandrine Dudoit, and Simon E. Cawley), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Microarray Data Analysis
Colon Cancer Prognosis Prediction by Gene Expression Profiling (with Alain Barrier and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Prognosis of Stage II Colon Cancer by Non-Neoplastic Mucosa Gene Expression Profiling (with Alain Barrier and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data (with Sunduz Keles, Sandrine Dudoit, and Simon E. Cawley), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Microarrays
Cluster Analysis of Genomic Data with Applications in R (with Katherine S. Pollard), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Regulatory Motif Finding by Logic Regression (with Sunduz Keles and Chris Vulpe), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Multiple transcription factors coordinately control transcriptional regulation of genes in eukaryotes. Although multiple computational methods...
A Statistical Method for Constructing Transcriptional Regulatory Networks Using Gene Expression and Sequence Data (with Biao Xing), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding (with Sandrine Dudoit, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, and Siew Leng Teng), U.C. Berkeley Division of Biostatistics Working Paper Series (2003)
A Method to Identify Significant Clusters in Gene Expression Data (with Katherine S. Pollard), U.C. Berkeley Division of Biostatistics Working Paper Series (2002)
Multiple Hypothesis Testing
Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate (with Sandrine Dudoit and Katherine S. Pollard), Statistical Applications in Genetics and Molecular Biology (2006)
Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates (with Sandrine Dudoit and Katherine S. Pollard), Statistical Applications in Genetics and Molecular Biology (2006)
Augmentation Procedures for Control of the Generalized Family-Wise Error Rate and Tail Probabilities for the Proportion of False Positives (with Sandrine Dudoit and Katherine S. Pollard), Statistical Applications in Genetics and Molecular Biology (2006)
A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting (with Daniel Rubin and Sandrine Dudoit), Statistical Applications in Genetics and Molecular Biology (2006)
A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting (with Daniel Rubin and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
Multivariate Analysis
Multiple Testing and Data Adaptive Regression: An Application to HIV-1 Sequence Data (with Merrill D. Birkner and Sandra E. Sinisi), Statistical Applications in Genetics and Molecular Biology (2006)
Application of a Variable Importance Measure Method (with Merrill D. Birkner), The International Journal of Biostatistics (2006)
Multiple Tests of Association with Biological Annotation Metadata (with Sandrine Dudoit and Sunduz Keles), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
We propose a general and formal statistical framework for the multiple tests of associations between...
Data Adaptive Pathway Testing (with Merrill D. Birkner and Alan E. Hubbard), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Application of a Variable Importance Measure Method to HIV-1 Sequence Data (with Merrill D. Birkner), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Statistical Computing
Multiple Testing Procedures: R multtest Package and Applications to Genomics (with Katherine S. Pollard and Sandrine Dudoit), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Statistical Models
Multiple Testing and Data Adaptive Regression: An Application to HIV-1 Sequence Data (with Merrill D. Birkner and Sandra E. Sinisi), Statistical Applications in Genetics and Molecular Biology (2006)
Deletion/Substitution/Addition Algorithm in Learning with Applications in Genomics (with Sandra E. Sinisi), Statistical Applications in Genetics and Molecular Biology (2006)
Cross-Validated Bagged Prediction of Survival (with Sandra E. Sinisi and Romain Neugebauer), Statistical Applications in Genetics and Molecular Biology (2006)
Super Learning: an Application to Prediction of HIV-1 Drug Susceptibility (with Sandra E. Sinisi and Maya L. Petersen), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
Causal Effect Models for Intention to Treat and Realistic Individualized Treatment Rules, U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
An important class of models in causal inference are the so-called marginal structural models which...
Statistical Theory and Methods
Quantile-Function Based Null Distribution in Resampling Based Multiple Testing (with Alan E. Hubbard), Statistical Applications in Genetics and Molecular Biology (2006)
Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate (with Sandrine Dudoit and Katherine S. Pollard), Statistical Applications in Genetics and Molecular Biology (2006)
Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates (with Sandrine Dudoit and Katherine S. Pollard), Statistical Applications in Genetics and Molecular Biology (2006)
Estimating a Survival Distribution with Current Status Data and High-dimensional Covariates (with Aad van der Vaart), The International Journal of Biostatistics (2006)
Empirical Bayes and Resampling Based Multiple Testing Procedure Controlling Tail Probability of the Proportion of False Positives (with Merrill D. Birkner and Alan E. Hubbard), Statistical Applications in Genetics and Molecular Biology (2006)
Survival Analysis
Cross-Validated Bagged Prediction of Survival (with Sandra E. Sinisi and Romain Neugebauer), Statistical Applications in Genetics and Molecular Biology (2006)
Choice of Monitoring Mechanism for Optimal Nonparametric Functional Estimation for Binary Data (with Nicholas P. Jewell and Stephen Shiboski), The International Journal of Biostatistics (2006)
Doubly Robust Censoring Unbiased Transformations (with Daniel Rubin), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
Cross-validated Bagged Prediction of Survival (with Sandra E. Sinisi), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Survival Point Estimate Prediction in Matched and Non-Matched Case-Control Subsample Designed Studies (with Annette M. Molinaro, Dan H. Moore, and Karla Kerlikowske), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Providing information about the risk of disease and clinical factors that may increase or...
No subject area
Locally Efficient Estimation with Bivariate Right Censored Data (with Christopher M. Quale and James M. Robins), U.C. Berkeley Division of Biostatistics Working Paper Series (2001)
Smooth Estimation of a Monotone Density (with Aad W. van der Vaart), U.C. Berkeley Division of Biostatistics Working Paper Series (2001)
Fitting of Mixtures with Unspecified Number of Components Using Cross Validation Distance Estimate (with Maja Miloslavsky), U.C. Berkeley Division of Biostatistics Working Paper Series (2001)
Locally Efficient Estimation in Censored Data Models: Theory and Examples (with Richard D. Gill and James M. Robins), U.C. Berkeley Division of Biostatistics Working Paper Series (2000)
Estimation with Interval Censored Data in Longitudinal Studies, U.C. Berkeley Division of Biostatistics Working Paper Series (1998)
In biostatistical applications interest often focuses on the estimation of the distribution of a time-until-event...