Sandrine Dudoit is Associate Professor of Biostatistics and Statistics at the
University of California, Berkeley. 

Professor Dudoit's research and teaching activities concern the development and
application of statistical and computational methods to address problems in biomedical
and genomic research. 

Specific areas of interest include: 

* the design and analysis of high-throughput gene expression experiments (e.g., cDNA
microarrays, alternative splicing microarrays, ChIP-Chip, metagenomics microarrays); 

* nucleotide and protein sequence analysis (e.g., identification of regulatory motifs in
DNA sequences); 

* the genetic mapping of complex traits (e.g., IBD-based linkage analysis, linkage
disequilibrium analysis, SNP-based association studies, microarray-based genetic mapping
studies of gene expression); 

* the analysis of biological annotation metadata (e.g., Gene Ontology (GO) annotation). 

Her methodological research interests include: 

* loss-based estimation with cross-validation: parametric and non-parametric density
estimation and regression, variable selection; 

* multiple hypothesis testing: resampling-based multiple testing procedures for
controlling generalized Type I error rates, defined as tail probabilities and expected
values for arbitrary functions of the numbers of Type I errors and rejected hypotheses
(e.g., false discovery rate). 

Professor Dudoit is also involved in the development of statistical software for
biomedical and genomic data analysis and is a core member of the Bioconductor Project
(www.bioconductor.org). 

Professor Dudoit obtained Bachelor's (1992) and Master's (1994) degrees in
Mathematics from Carleton University, Ottawa, Canada. She first came to UC Berkeley as a
graduate student and earned a PhD in 1999 from the Department of Statistics. Her
doctoral research, under the supervision of Professor Terence P. Speed, concerned the
linkage analysis of complex human traits. From 1999 to 2000, she was a postdoctoral
fellow at the Mathematical Sciences Research Institute, Berkeley. Before joining the
faculty at UC Berkeley in July 2001, she completed a year of postdoctoral training in
genomics in the laboratory of Professor Patrick O. Brown, Department of Biochemistry,
Stanford University. Her work in the Brown Lab involved the development of statistical
and computational methods for the design and analysis of gene expression experiments
using DNA microarrays. 

Biological Annotation Metadata Analysis

Multiple Tests of Association with Biological Annotation Metadata (with Sunduz Keles and Mark J. van der Laan), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
We propose a general and formal statistical framework for the multiple tests of associations between...

 

Biological Sequence Analysis

Supervised Detection of Regulatory Motifs in DNA Sequences (with Sunduz Keles, Mark J. van der Laan, Biao Xing, and Michael B. Eisen), Statistical Applications in Genetics and Molecular Biology (2003)
Identification of transcription factor binding sites (regulatory motifs) is a major interest in contemporary biology....
 

Supervised Detection of Regulatory Motifs in DNA Sequences (with Sunduz Keles, Mark J. van der Laan, Biao Xing, and Michael B. Eisen), U.C. Berkeley Division of Biostatistics Working Paper Series (2003)
Identification of transcription factor binding sites (regulatory motifs) is a major interest in contemporary...
 

Genetic Mapping

A Fine-Scale Linkage-Disequilibrium Measure Based on Length of Haplotype Sharing (with Yan Wang and Lue Ping Zhao), The American Journal of Human Genetics (2006)
High-throughput genotyping technologies for SNPs have enabled the recent completion of the International HapMap Project...
 

A Fine-Scale Linkage Disequilibrium Measure Based on Length of Haplotype Sharing (with Yan Wang and Lue Ping Zhao), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
High-throughput genotyping technologies for single nucleotide polymorphisms (SNP) have enabled the recent completion of the...
 

Quantification and Visualization of LD Patterns and Identification of Haplotype Blocks (with Yan Wang), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
Classical measures of linkage disequilibrium (LD) between two loci, based only on the joint distribution...
 

IBD Configuration Transition Matrices and Linkage Score Tests for Unilineal Relative Pairs, U.C. Berkeley Division of Biostatistics Working Paper Series (2003)
Properties of transition matrices between IBD configurations are derived for four general classes of unilineal...
 

Loss-Based Estimation with Cross-Validation

A deletion/substitution/addition algorithm for classification neural networks, with applications to biomedical data (with Blythe Durbin and Mark J. van der Laan), Journal of Statistical Planning and Inference (2008)
Neural networks are a popular machine learning tool, particularly in applications such as protein structure...
 

Loss-based estimation with evolutionary algorithms and cross-validation (with David Shilane and Richard H. Liang), U.C. Berkeley Division of Biostatistics Working Paper Series (2007)
Many statistical inference methods rely upon selection procedures to estimate a parameter of the joint...
 

Survival Ensembles (with Torsten Hothorn, Peter Buhlmann, Annette M. Molinaro, and Mark J. van der Laan), Biostatistics (2006)
We propose a unified and flexible framework for ensemble learning in the presence of censoring....
 

Oracle inequalities for multi-fold cross validation (with Aad W. van der Vaart and Mark J. van der Laan), Statistics & Decisions (2006)
We consider choosing an estimator or model from a given class by cross validation consisting...
 

The cross-validated adaptive epsilon-net estimator (with Mark J. van der Laan and Aad W. van der Vaart), Statistics & Decisions (2006)
Suppose that we observe a sample of independent and identically distributed realizations of a random...
 

Microarray Data Analysis

Prognosis of stage II colon cancer by non-neoplastic mucosa gene expression profiling (with A. Barrier, F. Roser, P-Y. Boelle, B. Franc, C. Tse, D. Brault, F. Lacaine, S. Houry, P. Callard, C. Penna, B. Debuire, A. Flahault, and A. Lemoine), Oncogene (2007)
We have assessed the possibility to build a prognosis predictor (PP), based on non-neoplastic mucosa...
 

Stage II Colon Cancer Prognosis Prediction by Tumor Gene Expression Profiling (with Alain Barrier, Pierre-Yves Boelle, François Roser, Jennifer Gregg, Chantal Tse, Didier Brault, François Lacaine, Sidney Houry, Michel Huguier, Brigitte Franc, Antoine Flahault, and Antoinette Lemoine), Journal of Clinical Oncology (2006)
PURPOSE: This study mainly aimed to identify and assess the performance of a microarray-based prognosis...
 

Multiple Testing Methods For ChIP–Chip High Density Oligonucleotide Array Data (with Sündüz Keleş, Mark J. van der Laan, and Simon E. Cawley), Journal of Computational Biology (2006)
Cawley et al. (2004) have recently mapped the locations of binding sites for three transcription...
 

Exploration of global gene expression in human liver steatosis by high-density oligonucleotide microarray (with Frank Chiappini, Alain Barrier, Raphaël Saffroy, Marie-Charlotte Domart, Nicolas Dagues, Daniel Azoulay, Mylène Sebagh, Brigitte Franc, Stephan Chevalier, Brigitte Debuire, and Antoinette Lemoine), Laboratory Investigation (2005)
Understanding the molecular mechanisms underlying fatty liver disease (FLD) in humans is of major importance....
 

Gene expression profiling of nonneoplastic mucosa may predict clinical outcome of colon cancer patients (with Alain Barrier, Pierre-Yves Boelle, Antoinette Lemoine, Chantal Tse, Didier Brault, Frank Chiappini, François Lacaine, Sidney Houry, Michel Huguier, and Antoine Flahault), Diseases of the Colon and Rectum (2005)
PURPOSE This study assessed the possibility to build a prognosis predictor, based on microarray gene...
 

Miscellaneous

A General Framework for Statistical Performance Comparison of Evolutionary Computation Algorithms (with David Shilane, Jarno Martikainen, and Seppo Ovaska), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
This paper proposes a statistical methodology for comparing the performance of evolutionary computation algorithms. A...
 

Multiple Hypothesis Testing

Resampling-based empirical Bayes multiple testing procedures for controlling generalized tail probability and expected value error rates: Focus on the false discovery rate and simulation study (with Houston N. Gilbert and Mark J. van der Laan), U.C. Berkeley Division of Biostatistics Working Paper Series (2007)
This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of...
 

A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting (with Daniel Rubin and Mark van der Laan), Statistical Applications in Genetics and Molecular Biology (2006)
Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis...
 

A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting (with Daniel Rubin and Mark J. van der Laan), U.C. Berkeley Division of Biostatistics Working Paper Series (2006)
Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis...
 

Test statistics null distributions in multiple testing: Simulation studies and applications to genomics (with Katherine S. Pollard, Merrill D. Birkner, and Mark J. van der Laan), Journal de la Société Française de Statistique (2005)
Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying...

 

Test Statistics Null Distributions in Multiple Testing: Simulation Studies and Applications to Genomics (with Katherine S. Pollard, Merrill D. Birkner, and Mark J. van der Laan), U.C. Berkeley Division of Biostatistics Working Paper Series (2005)
Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying...

 

Statistical Computing

Multiple Testing Procedures: R multtest Package and Applications to Genomics (with Katherine S. Pollard and Mark J. van der Laan), U.C. Berkeley Division of Biostatistics Working Paper Series (2004)
The Bioconductor R package multtest implements widely applicable resampling-based single-step and stepwise multiple testing procedures...
 

Bioconductor: open software development for computational biology and bioinformatics (with Robert C. Gentleman, Vincent J. Carey, Douglas M. Bates, Ben Bolstad, Marcel Dettling, Byron Ellis, Laurent Gautier, Yongchao Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn, Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler, Anthony J. Rossini, Gunther Sawitzki, Colin Smith, Gordon Smyth, Luke Tierney, Jean Y. H. Yang, and Jianhua Zhang), Genome Biology (2004)
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational...
 

Bioconductor: Open software development for computational biology and bioinformatics (with Robert C. Gentleman, Vincent J. Carey, Douglas J. Bates, Benjamin M. Bolstad, Marcel Dettling, Byron Ellis, Laurent Gautier, Yongchao Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn, Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler, Anthony J. Rossini, Guenther Sawitzki, Colin Smith, Gordon K. Smyth, Luke Tierney, Yee Hwa Yang, and Jianhua Zhang), Bioconductor Project Working Papers (2004)
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational...