<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Mark J. van der Laan</title>
<copyright>Copyright (c) 2011  All rights reserved.</copyright>
<link>http://works.bepress.com/mark_van_der_laan</link>
<description>Recent documents in Mark J. van der Laan</description>
<language>en-us</language>
<lastBuildDate>Fri, 08 Jul 2011 19:00:55 PDT</lastBuildDate>
<ttl>3600</ttl>








<item>
<title>Long-term consequences of the delay between virologic failure of highly active antiretroviral therapy and regimen modification</title>
<link>http://works.bepress.com/mark_van_der_laan/293</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/293</guid>
<pubDate>Sat, 26 Mar 2011 21:22:58 PDT</pubDate>
<description>
	<![CDATA[
	<p>Objectives: Current treatment guidelines recommend immediate modification of antiretroviral therapy in HIV-infected individuals with incomplete viral suppression. These recommendations have not been tested in observational studies or large randomized trials. We evaluated the consequences of delayed modification following virologic failure. Design/methods: We used prospective data from two clinical cohorts to estimate the effect of time until regimen modification following first regimen failure on all-cause mortality. The impact of regimen type was also assessed. As the effect of delayed switching can be confounded if patients with a poor prognosis modify therapy earlier than those with a good prognosis, we used a statistical methodology – marginal structural models – to control for time-dependent confounding. Results: A total of 982 patients contributed 3414 person-years of follow-up following first regimen failure. Delay until treatment modification was associated with an elevated hazard of all-cause mortality among patients failing a reverse transcriptase inhibitor-based regimen (hazard ratio per additional 3 months delay: 1.23, 95% confidence interval: 1.08, 1.40), but appeared to have a small protective effect among patients failing a protease inhibitor-based regimen (hazard ratio per additional 3 months delay: 0.93, 95% confidence interval: 0.87, 0.99). Conclusion: Delay in modification after failure of regimens that do not contain a protease inhibitor is associated with increased mortality. Protease inhibitor-based regimens are less dependent on early versus delayed switching strategies. Efforts should be made to minimize delay until treatment modification in resource-poor regions, where the majority of patients are starting reverse transcriptase inhibitor-based regimens and HIV RNA monitoring may not be available.</p>

	]]>
</description>

<author>Maya L. Petersen et al.</author>


<category>Clinical Epidemiology</category>

<category>Longitudinal Data Analysis and Time Series</category>

<category>HIV</category>

</item>






<item>
<title>Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery</title>
<link>http://works.bepress.com/mark_van_der_laan/292</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/292</guid>
<pubDate>Sat, 26 Mar 2011 21:22:54 PDT</pubDate>
<description>
	<![CDATA[
	<p>In longitudinal and repeated measures data analysis, often the goal is to determine the effect of a treatment or aspect on a particular outcome (e.g., disease progression). We consider a semiparametric repeated measures regression model, where the parametric component models the effect of the variable of interest and any modification by other covariates. The expectation of this parametric component over the other covariates is a measure of variable importance. Here, we present a targeted maximum likelihood estimator of the finite dimensional regression parameter, which is easily estimated using standard software for generalized estimating equations.</p>
<p>The targeted maximum likelihood method provides double robust and locally efficient estimates of the variable importance parameters and inference based on the influence curve. We demonstrate these properties through simulation under correct and incorrect model specification, and apply our method in practice to estimating the activity of transcription factors (TFs) over the cell cycle in yeast. We specifically target the importance of SWI4, SWI6, MBP1, MCM1, ACE2, FKH2, NDD1, and SWI5.</p>
<p>The semiparametric model allows us to determine the importance of a TF at specific time points by specifying time indicators as potential effect modifiers of the TF. Our results are promising, showing significant importance trends during the expected time periods. This methodology can also be used as a variable importance analysis tool to assess the effect of a large number of variables such as gene expressions or single nucleotide polymorphisms.</p>

	]]>
</description>

<author>Catherine Tuglus et al.</author>


<category>Computational Biology/Bioinformatics</category>

<category>Longitudinal Data Analysis and Time Series</category>

<category>Statistical Theory and Methods</category>

<category>Biology &amp; Genetics</category>

</item>






<item>
<title>The cross-validated adaptive epsilon-net estimator</title>
<link>http://works.bepress.com/mark_van_der_laan/291</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/291</guid>
<pubDate>Sat, 26 Mar 2011 21:22:50 PDT</pubDate>
<description>
	<![CDATA[
	<p>Suppose that we observe a sample of independent and identically distributed realizations of a random variable, and a parameter of interest can be defined as the minimizer, over a suitably defined parameter set, of the expectation of a (loss) function of a candidate parameter value and the random variable. For example, squared error loss in regression or the negative log-density loss in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter set may result in ill-defined or too variable estimators of the parameter of interest. In this article, we propose a cross-validated ε-net estimation method, which uses a collection of submodels and a collection of ε-nets over each submodel. For each submodel s and each resolution level ε, the minimizer of the empirical risk over the corresponding ε-net is a candidate estimator. Next we select from these estimators (i.e. select the pair (s,ε)) by multi-fold cross-validation. We derive a finite sample inequality that shows that the resulting estimator is as good as an oracle estimator that uses the best submodel and resolution level for the unknown true parameter. We also address the implementation of the estimation procedure, and in the context of a linear regression model we present results of a preliminary simulation study comparing the cross-validated ε-net estimator to the cross-validated L1-penalized least squares estimator (LASSO) and the least angle regression estimator (LARS).</p>

	]]>
</description>

<author>Mark J. van der Laan et al.</author>


<category>Loss-Based Estimation with Cross-Validation</category>

</item>






<item>
<title>A Note on Targeted Maximum Likelihood and Right Censored Data</title>
<link>http://works.bepress.com/mark_van_der_laan/290</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/290</guid>
<pubDate>Sat, 26 Mar 2011 21:22:46 PDT</pubDate>
<description>
	<![CDATA[
	<p>A popular way to estimate an unknown parameter is with substitution, or evaluating the parameter at a likelihood based fit of the data generating density. In many cases, such estimators have substantial bias and can fail to converge at the parametric rate. van der Laan and Rubin (2006) introduced targeted maximum likelihood learning, removing these shackles from substitution estimators, which were made in full agreement with the locally efficient estimating equation procedures as presented in Robins and Rotnitzky (1992) and van der Laan and Robins (2003). This note illustrates how targeted maximum likelihood can be applied in right censored data structures. In particular, we show that when an initial substitution estimator is based on a Cox proportional hazards model, the targeted likelihood algorithm can be implemented by iteratively adding an appropriate time-dependent covariate.</p>

	]]>
</description>

<author>Mark J. van der Laan et al.</author>


<category>Causal Inference</category>

<category>Statistical Theory and Methods</category>

<category>Survival Analysis</category>

</item>






<item>
<title>Statistics Ready for a Revolution: Next Generation of Statisticians Must Build Tools for Massive Data Sets</title>
<link>http://works.bepress.com/mark_van_der_laan/289</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/289</guid>
<pubDate>Sat, 26 Mar 2011 21:22:42 PDT</pubDate>
<description>
	<![CDATA[
	<p>The statistics profession has reached a tipping point. The need for valid statistical tools is greater than ever; data sets are massive, often comprising hundreds of thousands of measurements for a single subject. The field is ready for a revolution, one driven by clear, objective benchmarks by which tools can be evaluated.</p>

	]]>
</description>

<author>Mark J. van der Laan et al.</author>


<category>Media Publications</category>

</item>






<item>
<title>Collaborative Targeted Maximum Likelihood For Time To Event Data</title>
<link>http://works.bepress.com/mark_van_der_laan/288</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/288</guid>
<pubDate>Sat, 26 Mar 2011 21:22:39 PDT</pubDate>
<description>
	<![CDATA[
	<p>Current methods used to analyze time to event data either rely on highly parametric assumptions, which result in biased estimates of parameters that are chosen purely out of convenience, or are highly unstable because they ignore the global constraints of the true model. By using Targeted Maximum Likelihood Estimation one may consistently estimate parameters which directly answer the statistical question of interest. Targeted Maximum Likelihood Estimators are substitution estimators, which rely on estimating the underlying distribution. However, unlike other substitution estimators, the underlying distribution is estimated specifically to reduce bias in the estimate of the parameter of interest. We will present here an extension of Targeted Maximum Likelihood Estimation for observational time to event data, the Collaborative Targeted Maximum Likelihood Estimator (C-TMLE) for the treatment specific survival curve. Through the use of a simulation study we will show that this method improves on commonly used methods in both robustness and efficiency. In fact, we will show that in certain situations the C-TMLE produces estimates whose mean square error is lower than the semi-parametric efficiency bound. Lastly, we will show that the bootstrap is able to produce valid 95 percent confidence intervals in sparse data situations, while influence curve based inference breaks down.</p>

	]]>
</description>

<author>Ori M. Stitelman et al.</author>


<category>Causal Inference</category>

<category>Survival Analysis</category>

</item>






<item>
<title>Targeting The Optimal Design In Randomized Clinical Trials With Binary Outcomes And No Covariate</title>
<link>http://works.bepress.com/mark_van_der_laan/287</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/287</guid>
<pubDate>Sat, 26 Mar 2011 21:22:36 PDT</pubDate>
<description>
	<![CDATA[
	<p>This article is devoted to the asymptotic study of adaptive group sequential designs in the case of randomized clinical trials with binary treatment, binary outcome and no covariate. By adaptive design, we mean in this setting a clinical trial design that allows the investigator to dynamically modify its course through data-driven adjustment of the randomization probability based on data accrued so far, without negatively impacting the statistical integrity of the trial. By adaptive group sequential design, we refer to the fact that group sequential testing methods can be equally well applied on top of adaptive designs. Prior to collection of the data, the trial protocol specifies the parameter of scientific interest. In the estimation framework, the trial protocol also a priori specifies the confidence level to be used in constructing frequentist confidence intervals for the latter parameter and the related inferential method, which will be based on the maximum likelihood principle. In the testing framework, the trial protocol also a priori specifies the null and alternative hypotheses regarding the latter parameter, the desired type I and type II errors, the rule for determining the maximal statistical information to be accrued, and the frequentist testing procedure, including conditions for early stopping. Furthermore, we assume that the protocol specifies a user-supplied optimal unknown choice of randomization scheme, and we will focus on that randomization scheme which minimizes the asymptotic variance of the maximum likelihood estimator of the parameter of interest.</p>
<p>We obtain that, theoretically, the adaptive design converges almost surely to the targeted unknown randomization scheme. In the estimation framework, we obtain that our maximum likelihood estimator of the parameter of interest is a strongly consistent estimator, and it satisfies a central limit theorem. We can estimate its asymptotic variance, which is the same as that it would feature had we known in advance the targeted randomization scheme and independently sampled from it. Consequently, inference can be carried out as if we had resorted to independent and identically distributed (iid) sampling. In the testing framework, we obtain that the multidimensional t-statistic that we would use under iid sampling still converges to the same canonical distribution under adaptive sampling. Consequently, the same group sequential testing can be carried out as if we had resorted to iid sampling. Furthermore, a comprehensive simulation study that we undertake validates the theory. It notably shows in the estimation framework that the confidence intervals we obtain achieve the desired coverage even for moderate sample sizes. In addition, it shows in the testing framework that type I error control at the prescribed level is guaranteed, and that all sampling procedures only suffer from a very slight increase of the type II error.</p>
<p>A three-sentence take-home message is: "Adaptive designs do learn the targeted optimal design and inference and testing can be carried out under adaptive sampling as they would under the targeted optimal randomization probability iid sampling. In particular, adaptive designs achieve the same efficiency as the fixed oracle design. This is confirmed by a simulation study, at least for moderate or large sample sizes, across a large collection of targeted randomization probabilities."</p>

	]]>
</description>

<author>Antoine Chambaz et al.</author>


<category>Clinical Trials</category>

</item>






<item>
<title>Analyzing Direct Effects in Randomized Trials with Secondary Interventions: An Application to HIV Prevention Trials</title>
<link>http://works.bepress.com/mark_van_der_laan/286</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/286</guid>
<pubDate>Sat, 26 Mar 2011 21:22:32 PDT</pubDate>
<description>
	<![CDATA[
	<p>The ‘Methods for improving reproductive health in Africa’ trial is a recently completed randomized trial that investigated the effect of diaphragm and lubricant gel use in reducing infection by the human immunodeficiency virus (HIV) among susceptible women. A total of 5045 women were randomly assigned to either the active treatment arm or the control arm. Additionally, all subjects in both arms received intensive condom counselling and provision, the ‘gold standard’ HIV prevention barrier method. There was much lower reported use of condoms in the intervention arm than in the control arm, making it difficult to answer important public health questions based solely on the intention-to-treat analysis. We adapt an analysis technique from causal inference to estimate the ‘direct effects’ of assignment to the diaphragm arm, adjusting for use of condoms in an appropriate sense. Issues raised in the trial apply to other trials of HIV prevention methods, some of which are currently being conducted or designed.</p>

	]]>
</description>

<author>Michael Rosenblum et al.</author>


<category>Clinical Epidemiology</category>

<category>Clinical Trials</category>

<category>HIV</category>

</item>






<item>
<title>Virologic Efficacy of Boosted Double vs. Boosted Single Protease Inhibitor Therapy.</title>
<link>http://works.bepress.com/mark_van_der_laan/285</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/285</guid>
<pubDate>Sat, 26 Mar 2011 21:22:27 PDT</pubDate>
<description>
	<![CDATA[
	<p>Objective: Although regimens containing two protease inhibitors (PIs) together with ritonavir boosting are used with the aim of improving virologic response to salvage therapy, there is little evidence to support or reject this approach. We compared the probability of attaining an undetectable HIV RNA level after using either boosted double or boosted single PI regimens. Design: Retrospective clinical cohort. Methods: PI-experienced subjects in a Northern California-based database who initiated either a boosted single or boosted double PI salvage therapy regimen were analysed. Traditional multivariable regression and marginal structural model analyses were used to compare the effects of the two regimens on virologic suppression 12–36 weeks after initiation of salvage therapy, controlling for confounding by baseline HIV RNA level, CD4 lymphocyte count, treatment history, drug resistance, and multiple characteristics of the salvage regimen. Results: Fifty-one percent of boosted single PI regimens (n=805) and 51.6% of boosted double PI regimens (n=183) achieved a plasma HIV RNA level of &lt;75 copies/ml at week 12–36. In models including multiple potentially confounding variables, estimates of the relative odds of suppression on boosted double versus boosted single PI regimens ranged from 1.17 (95% CI, 0.54–2.55) to 1.33 (95% CI, 0.82–2.14). Conclusions: We were not able to reject the null hypothesis that boosted double and boosted single PI regimens resulted in equivalent probabilities of virologic success.</p>

	]]>
</description>

<author>Maya Petersen et al.</author>


<category>Clinical Epidemiology</category>

<category>HIV</category>

</item>






<item>
<title>Biomarker discovery using targeted maximum likelihood estimation: Application to the treatment of antiretroviral resistant HIV infection</title>
<link>http://works.bepress.com/mark_van_der_laan/284</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/284</guid>
<pubDate>Sat, 26 Mar 2011 21:22:24 PDT</pubDate>
<description>
	<![CDATA[
	<p>Researchers in clinical science and bioinformatics frequently aim to learn which of a set of candidate biomarkers is important in determining a given outcome, and to rank the contributions of the candidates accordingly. This article introduces a new approach to research questions of this type, based on targeted maximum likelihood estimation of variable importance measures.</p>
<p>The methodology is illustrated using an example drawn from the treatment of HIV infection.  Specifically, given a list of candidate mutations in the protease enzyme of HIV, we aim to discover mutations that reduce clinical virologic response to antiretroviral regimens containing the protease inhibitor lopinavir.  In the context of this data example, the article reviews the motivation for covariate adjustment in the biomarker discovery process. A standard maximum likelihood approach to this adjustment is compared with the targeted approach introduced here. Implementation of targeted maximum likelihood estimation in the context of biomarker discovery is discussed, and the advantages of this approach are highlighted. Results of applying targeted maximum likelihood estimation to identify lopinavir resistance mutations are presented and compared with results based on unadjusted mutation-outcome associations as well as results of a standard maximum likelihood approach to adjustment.</p>
<p>The subset of mutations identified by targeted maximum likelihood as significant contributors to lopinavir resistance is found to be in better agreement with current understanding of HIV antiretroviral resistance than the corresponding subsets identified by the other two approaches. This finding suggests that targeted estimation of variable importance represents a promising approach to biomarker discovery.</p>

	]]>
</description>

<author>Oliver Bembom et al.</author>


<category>Causal Inference</category>

</item>






<item>
<title>ccosmo: A stand-alone C program for the supervised detection of conserved motifs in DNA sequences.</title>
<link>http://works.bepress.com/mark_van_der_laan/283</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/283</guid>
<pubDate>Sat, 26 Mar 2011 21:22:20 PDT</pubDate>
<description>
	<![CDATA[
	<p>cosmo searches a set of unaligned DNA sequences for a shared motif that may, for example, represent a common transcription factor binding site. The algorithm is similar to MEME, but also allows the user to specify a set of constraints that the position weight matrix of the unknown motif must satisfy. Such constraints may include bounds on the information content across certain regions of the unknown motif, for example, and can often be formulated on the basis of prior knowledge about the structure of the transcription factor in question. The unknown motif width, the distribution of motif occurrences (OOPS, ZOOPS, or TCM), as well as the appropriate constraint set can be selected data-adaptively.</p>

	]]>
</description>

<author>Oliver Bembom et al.</author>


<category>Software</category>

</item>






<item>
<title>Biomarker discovery using targeted maximum-likelihood estimation: Application to the treatment of antiretroviral-resistant HIV infection</title>
<link>http://works.bepress.com/mark_van_der_laan/282</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/282</guid>
<pubDate>Sat, 26 Mar 2011 21:22:17 PDT</pubDate>
<description>
	<![CDATA[
	<p>Researchers in clinical science and bioinformatics frequently aim to learn which of a set of candidate biomarkers is important in determining a given outcome, and to rank the contributions of the candidates accordingly. This article introduces a new approach to research questions of this type, based on targeted maximum-likelihood estimation of variable importance measures. The methodology is illustrated using an example drawn from the treatment of HIV infection. Specifically, given a list of candidate mutations in the protease enzyme of HIV, we aim to discover mutations that reduce clinical virologic response to antiretroviral regimens containing the protease inhibitor lopinavir. In the context of this data example, the article reviews the motivation for covariate adjustment in the biomarker discovery process. A standard maximum-likelihood approach to this adjustment is compared with the targeted approach introduced here. Implementation of targeted maximum-likelihood estimation in the context of biomarker discovery is discussed, and the advantages of this approach are highlighted. Results of applying targeted maximum-likelihood estimation to identify lopinavir resistance mutations are presented and compared with results based on unadjusted mutation–outcome associations as well as results of a standard maximum-likelihood approach to adjustment. The subset of mutations identified by targeted maximum likelihood as significant contributors to lopinavir resistance is found to be in better agreement with the current understanding of HIV antiretroviral resistance than the corresponding subsets identified by the other two approaches. This finding suggests that targeted estimation of variable importance represents a promising approach to biomarker discovery.</p>

	]]>
</description>

<author>Oliver Bembom et al.</author>


<category>Longitudinal Data Analysis and Time Series</category>

<category>HIV</category>

<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Targeted Maximum Likelihood Estimation: A Gentle Introduction</title>
<link>http://works.bepress.com/mark_van_der_laan/281</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/281</guid>
<pubDate>Sat, 26 Mar 2011 21:22:13 PDT</pubDate>
<description>
	<![CDATA[
	<p>This paper provides a concise introduction to targeted maximum likelihood estimation (TMLE) of causal effect parameters. The interested analyst should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. A program written in R is provided. This program implements a basic version of TMLE that can be used to estimate the effect of a binary point treatment on a continuous or binary outcome.</p>

	]]>
</description>

<author>Susan Gruber et al.</author>


<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Covariate Adjustment for the Intention-to-Treat Parameter with Empirical Efficiency Maximization</title>
<link>http://works.bepress.com/mark_van_der_laan/280</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/280</guid>
<pubDate>Sat, 26 Mar 2011 21:22:09 PDT</pubDate>
<description>
	<![CDATA[
	<p>In randomized experiments, the intention-to-treat parameter is defined as the difference in expected outcomes between groups assigned to treatment and control arms.  There is a large literature focusing on how (possibly misspecified) working models can sometimes exploit baseline covariate measurements to gain precision, although covariate adjustment is not strictly necessary.  In Rubin and van der Laan (2008), we proposed the technique of empirical efficiency maximization for improving estimation by forming nonstandard fits of such working models.  Considering a more realistic randomization scheme than in our original article, we suggest a new class of working models for utilizing covariate information, show our method can be implemented by adding weights to standard regression algorithms, and demonstrate benefits over existing estimators through numerical asymptotic efficiency calculations and simulations.</p>

	]]>
</description>

<author>Daniel B. Rubin et al.</author>


<category>Clinical Trials</category>

<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Supervised Distance Matrices: Theory and Applications to Genomics</title>
<link>http://works.bepress.com/mark_van_der_laan/279</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/279</guid>
<pubDate>Sat, 26 Mar 2011 21:22:05 PDT</pubDate>
<description>
	<![CDATA[
	<p>We propose a new approach to studying the relationship between a very high dimensional random variable and an outcome. Our method is based on a novel concept, the supervised distance matrix, which quantifies pairwise similarity between variables based on their association with the outcome. A supervised distance matrix is derived in two stages. The first stage involves a transformation based on a particular model for association. In particular, one might regress the outcome on each variable and then use the residuals or the influence curve from each regression as a data transformation. In the second stage, a choice of distance measure is used to compute all pairwise distances between variables in this transformed data. When the outcome is right-censored, we show that the supervised distance matrix can be consistently estimated using inverse probability of censoring weighted (IPCW) estimators based on the mean and covariance of the transformed data. The proposed methodology is illustrated with examples of gene expression data analysis with a survival outcome. This approach is widely applicable in genomics and other fields where high-dimensional data is collected on each subject.</p>

	]]>
</description>

<author>Katherine S. Pollard et al.</author>


<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Confidence Intervals for the Population Mean Tailored to Small Sample Sizes, with Applications to Survey Sampling</title>
<link>http://works.bepress.com/mark_van_der_laan/278</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/278</guid>
<pubDate>Sat, 26 Mar 2011 21:22:02 PDT</pubDate>
<description>
	<![CDATA[
	<p>The validity of standard confidence intervals constructed in survey sampling is based on the central limit theorem. For small sample sizes, the central limit theorem may give a poor approximation, resulting in confidence intervals that are misleading. We discuss this issue and propose methods for constructing confidence intervals for the population mean tailored to small sample sizes.</p>
<p>We present a simple approach for constructing confidence intervals for the population mean based on tail bounds for the sample mean that are correct for all sample sizes. Bernstein's inequality provides one such tail bound. The resulting confidence intervals have guaranteed coverage probability under much weaker assumptions than are required for standard methods. A drawback of this approach, as we show, is that these confidence intervals are often quite wide. In response to this, we present a method for constructing much narrower confidence intervals, which are better suited for practical applications, and that are still more robust than confidence intervals based on standard methods, when dealing with small sample sizes. We show how to extend our approaches to much more general estimation problems than estimating the sample mean. We describe how these methods can be used to obtain more reliable confidence intervals in survey sampling. As a concrete example, we construct confidence intervals using our methods for the number of violent deaths between March 2003 and July 2006 in Iraq, based on data from the study “Mortality after the 2003 invasion of Iraq: A cross sectional cluster sample survey,” by Burnham et al. (2006).</p>

	]]>
</description>

<author>Michael A. Rosenblum et al.</author>


<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Oracle inequalities for multi-fold cross validation</title>
<link>http://works.bepress.com/mark_van_der_laan/277</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/277</guid>
<pubDate>Sat, 26 Mar 2011 21:21:57 PDT</pubDate>
<description>
	<![CDATA[
	<p>We consider choosing an estimator or model from a given class by cross validation consisting of holding a nonnegligible fraction of the observations out as a test set. We derive bounds that show that the risk of the resulting procedure is (up to a constant) smaller than the risk of an oracle plus an error which typically grows logarithmically with the number of estimators in the class. We extend the results to penalized cross validation in order to control unbounded loss functions. Applications include regression with squared and absolute deviation loss and classification under Tsybakov's condition.</p>

	]]>
</description>

<author>Aad W. van der Vaart et al.</author>


<category>Loss-Based Estimation with Cross-Validation</category>

</item>






<item>
<title>Resampling-Based Empirical Bayes Multiple Testing Procedures for Controlling Generalized Tail Probability and Expected Value Error Rates: </title>
<link>http://works.bepress.com/mark_van_der_laan/276</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/276</guid>
<pubDate>Sat, 26 Mar 2011 21:21:53 PDT</pubDate>
<description>
	<![CDATA[
	<p>This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of Type I error rates, defined as generalized tail probability (gTP) error rates, <em>gTP</em>(<em>q</em>,<em>g</em>) = Pr(<em>g</em>(<em>V<sub>n</sub></em>,<em>S<sub>n</sub></em>) > <em>q</em>), and generalized expected value (gEV) error rates, <em>gEV</em>(<em>g</em>) = E[<em>g</em>(<em>V<sub>n</sub></em>,<em>S<sub>n</sub></em>)], for arbitrary functions <em>g</em>(<em>V<sub>n</sub></em>,<em>S<sub>n</sub></em>) of the numbers of false positives <em>V<sub>n</sub></em> and true positives <em>S<sub>n</sub></em>. Of particular interest are error rates based on the proportion <em>g</em>(<em>V<sub>n</sub></em>,<em>S<sub>n</sub></em>) = <em>V<sub>n</sub></em>/(<em>V<sub>n</sub></em> + <em>S<sub>n</sub></em>) of Type I errors among the rejected hypotheses, such as the false discovery rate (FDR), <em>FDR</em> = E[<em>V<sub>n</sub></em>/(<em>V<sub>n</sub></em> + <em>S<sub>n</sub></em>)]. The proposed procedures offer several advantages over existing methods. They provide Type I error control for general data generating distributions, with arbitrary dependence structures among variables. Gains in power are achieved by deriving rejection regions based on guessed sets of true null hypotheses and null test statistics randomly sampled from joint distributions that account for the dependence structure of the data. The Type I error and power properties of an FDR-controlling version of the resampling-based empirical Bayes approach are investigated and compared to those of widely used FDR-controlling linear step-up procedures in a simulation study. The Type I error and power trade-off achieved by the empirical Bayes procedures under a variety of testing scenarios allows this approach to be competitive with or outperform the Storey and Tibshirani [2003] linear step-up procedure, as an alternative to the classical Benjamini and Hochberg [1995] procedure.</p>

	]]>
</description>

<author>Sandrine Dudoit et al.</author>


<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Estimation of direct causal effects.</title>
<link>http://works.bepress.com/mark_van_der_laan/275</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/275</guid>
<pubDate>Sat, 26 Mar 2011 21:21:50 PDT</pubDate>
<description>
	<![CDATA[
	<p>Many common problems in epidemiologic and clinical research involve estimating the effect of an exposure on an outcome while blocking the exposure's effect on an intermediate variable. Effects of this kind are termed direct effects. Estimation of direct effects arises frequently in research aimed at understanding mechanistic pathways by which an exposure acts to cause or prevent disease, as well as in many other settings. Although multivariable regression is commonly used to estimate direct effects, this approach requires assumptions beyond those required for the estimation of total causal effects. In addition, when the exposure and intermediate interact to cause disease, multivariable regression estimates a particular type of direct effect, the effect of an exposure on outcome fixing the intermediate at a specified level. Using the counterfactual framework, we distinguish this definition of a direct effect (controlled direct effect) from an alternative definition, in which the effect of the exposure on the intermediate is blocked, but the intermediate is otherwise allowed to vary as it would in the absence of exposure (natural direct effect). Relying on examples, we illustrate the difference between controlled and natural direct effects. We present an estimation approach for natural direct effects that can be implemented using standard statistical software and review the assumptions underlying our approach, which are less restrictive than those proposed by previous authors.</p>

	]]>
</description>

<author>Maya L. Petersen et al.</author>


<category>Epidemiology</category>

<category>Clinical Epidemiology</category>

</item>






<item>
<title>Data-adaptive Selection Of The Adjustment Set In Variable Importance Estimation</title>
<link>http://works.bepress.com/mark_van_der_laan/274</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/274</guid>
<pubDate>Sat, 26 Mar 2011 21:21:46 PDT</pubDate>
<description>
	<![CDATA[
	<p>If estimates of the effect of a treatment variable on an outcome of interest are to be adjusted for a set of possible confounding factors, it is necessary to rely on the assumption of experimental treatment assignment (ETA), according to which each experimental unit has positive probability of being observed at any of the possible levels of the treatment variable, regardless of the values the confounding factors may take on. Even if this assumption is violated only in a practical sense, in that certain values of the confounding factors make some treatment levels highly unlikely though not impossible, the adjusted variable importance parameter often becomes poorly identified in finite samples.</p>
<p>We introduce an algorithm that is intended to make variable importance estimation more robust with respect to violations of the ETA assumption. Two different identifiability criteria are proposed for deciding when an adjusted variable importance parameter cannot be reliably estimated from the data. These criteria are then used to identify a maximal set of adjustment variables for which the ETA assumption appears reasonably well satisfied. A more efficient estimator of the parameter corresponding to the selected adjustment set is then sought by selecting from among estimators making use of even smaller adjustment sets by minimizing an estimate of the mean squared error for the selected parameter.</p>
<p>A simulation study aimed at evaluating the benefits of this latter step suggests that it can lead to efficiency gains on the order of 100% if the ETA assumption is violated to some extent and to efficiency gains on the order of 35% if the ETA assumption is well approximated. The proposed algorithm is applied to the problem of identifying mutations in the protease enzyme of HIV that have an effect on virologic response to the commonly used antiretroviral drug lopinavir. While both unadjusted and fully adjusted analyses yield unsatisfactory results, the subset of significant mutations identified by the algorithm introduced here includes eight of the 12 known major lopinavir resistance mutations as well as two mutations that are thought to increase susceptibility to lopinavir. Two of the four major mutations not identified in our analysis occurred very rarely in our data set, giving the algorithm low power to detect any impact on virologic response. Recent in vitro experiments suggest that the other two major mutations not identified here may in fact be less important in determining lopinavir resistance than previously thought. The excellent agreement of the results reported here with current understanding of lopinavir resistance suggests that variable importance estimation based on data-adaptive selection of the adjustment set represents a promising new approach for studying the effects of HIV mutations on clinical virologic response to antiretroviral therapy as well as for biomarker discovery in general.</p>

	]]>
</description>

<author>Oliver Bembom et al.</author>


<category>Statistical Theory and Methods</category>

</item>





</channel>
</rss>

