<?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="2.0">
<channel>
<title>Mark J. van der Laan</title>
<copyright>Copyright (c) 2009  All rights reserved.</copyright>
<link>http://works.bepress.com/mark_van_der_laan</link>
<description>Recent documents in Mark J. van der Laan</description>
<language>en-us</language>
<lastBuildDate>Sun, 31 May 2009 09:01:17 PDT</lastBuildDate>
<ttl>3600</ttl>





<item>
<title>Why Prefer Double Robust Estimates? Illustration with Causal Point Treatment Studies</title>
<link>http://works.bepress.com/mark_van_der_laan/181</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/181</guid>
<pubDate>Thu, 16 Nov 2006 02:39:37 PST</pubDate>
<description>In point treatment marginal structural models with treatment A, outcome Y and covariates W, causal parameters can be estimated under the assumption of no unobserved confounders. Three estimates can be used: the G-computation, Inverse Probability of Treatment Weighted (IPTW) or Double Robust (DR) estimates. The properties of the IPTW and DR estimates are known under an assumption on the treatment mechanism that we call the &quot;Experimental Treatment Assignment&quot; (ETA) assumption. We show that the DR estimating function remains unbiased when the ETA assumption is violated, provided that the model used to regress Y on A and W is correctly specified. The practical consequence is that the IPTW estimate is biased in finite samples when the ETA assumption is approximately or theoretically violated, whereas the finite sample bias of the DR estimate is negligible if the model used to regress Y on A and W is correctly specified. This result also implies that estimating point treatment causal parameters using a DR estimating function is more robust than using the G-computation formula. We conclude with a methodology to construct DR estimates for a general data structure and prove that such DR estimates are more robust than their corresponding maximum likelihood estimates.</description>

<author>Romain Neugebauer</author>


<category>Statistical Models</category>

<category>Statistical Theory and Methods</category>

<category>Survival Analysis</category>

</item>


<item>
<title>Unified Cross-Validation Methodology For Selection Among Estimators and a General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples</title>
<link>http://works.bepress.com/mark_van_der_laan/180</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/180</guid>
<pubDate>Thu, 16 Nov 2006 02:39:35 PST</pubDate>
<description>In Part I of this article we propose a general 
cross-validation criterion
for selecting among a collection of estimators 
of a particular parameter
of interest based on n i.i.d. observations. 
It is assumed that the parameter of interest minimizes the expectation
(w.r.t. the distribution of the observed data structure)
of a particular loss function of a candidate parameter value and the observed
data structure, possibly indexed by a nuisance parameter.
The proposed cross-validation criterion is defined as
the empirical mean over the validation sample of the loss function
at the parameter estimate based on the training sample, averaged over
random splits of the observed sample.
The cross-validation  selector is now the estimator which minimizes this
cross-validation criterion.
We illustrate that this general methodology covers, in particular, the selection problems in the current literature, and also yields a wide range of
new selection methods.
We prove  a finite sample oracle inequality, and asymptotic optimality of the cross-validated
selector under general conditions. 
The asymptotic optimality states that the cross-validation selector performs
asymptotically exactly as well as the selector which for each given data
set makes the best choice (knowing the
true data generating distribution). Our general framework allows, in particular, the situation in which the observed data structure
is a censored  version of the full data structure of interest, and where the parameter of interest  is a parameter of the full data structure 
distribution.
As examples of the parameter of the full data distribution we consider
a density of (a part of) the full  data structure,
a conditional expectation of an  outcome, given explanatory variables,
 a marginal survival function of a failure time, and
a multivariate conditional expectation of an outcome vector, given
covariates.
In Part II of this article we show that the general estimating function methodology for censored data structures
as provided in van der Laan and Robins (2002) yields the desired loss functions for the selection among
estimators of a full-data distribution parameter of interest based on censored
data. The corresponding cross-validation selector generalizes any of the
existing selection methods in regression and density estimation (including model selection) to the censored data case.
Under general conditions, our optimality results now show that the corresponding cross-validation
selector performs asymptotically exactly as well as the selector which for
each given data set makes the best choice (knowing the true full data distribution). In Part III of this article we propose a general estimator which is defined as follows.
For a collection of subspaces and the complete parameter space, one
defines an epsilon-net (i.e., a finite set of points whose epsilon-spheres
cover the complete parameter space). 
For each epsilon and subspace one now defines a corresponding minimum
cross-validated empirical risk estimator as the minimizer of cross-validated risk
over the subspace-specific epsilon-net.
In the special case that the loss function has no nuisance parameter, which thus covers the classical regression and density estimation cases, 
this epsilon and subspace specific minimum risk
estimator reduces to the minimizer
of the empirical risk over the corresponding epsilon-net.
Finally, one selects epsilon and the subspace with the cross-validation selector.
We refer to the resulting estimator as the cross-validated adaptive 
epsilon-net estimator. 
We prove an oracle inequality for this estimator which implies that the
estimator is minimax adaptive in the sense that
it achieves the minimax optimal rate of convergence for the smallest
of the guessed  subspaces
containing the true parameter value. </description>

<author>Mark J. van der Laan</author>


<category>Computation</category>

<category>Statistical Models</category>

<category>Statistical Theory and Methods</category>

<category>Survival Analysis</category>

</item>


<item>
<title>Unified Cross-Validation Methodology For Selection Among Estimators and a General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples</title>
<link>http://works.bepress.com/mark_van_der_laan/179</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/179</guid>
<pubDate>Thu, 16 Nov 2006 02:39:33 PST</pubDate>
<description>In Part I of this article we propose a general 
cross-validation criterion
for selecting among a collection of estimators 
of a particular parameter
of interest based on n i.i.d. observations. 
It is assumed that the parameter of interest minimizes the expectation
(w.r.t. the distribution of the observed data structure)
of a particular loss function of a candidate parameter value and the observed
data structure, possibly indexed by a nuisance parameter.
The proposed cross-validation criterion is defined as
the empirical mean over the validation sample of the loss function
at the parameter estimate based on the training sample, averaged over
random splits of the observed sample.
The cross-validation  selector is now the estimator which minimizes this
cross-validation criterion.
We illustrate that this general methodology covers, in particular, the selection problems in the current literature, and also yields a wide range of
new selection methods.
We prove  a finite sample oracle inequality, and asymptotic optimality of the cross-validated
selector under general conditions. 
The asymptotic optimality states that the cross-validation selector performs
asymptotically exactly as well as the selector which for each given data
set makes the best choice (knowing the
true data generating distribution). Our general framework allows, in particular, the situation in which the observed data structure
is a censored  version of the full data structure of interest, and where the parameter of interest  is a parameter of the full data structure 
distribution.
As examples of the parameter of the full data distribution we consider
a density of (a part of) the full  data structure,
a conditional expectation of an  outcome, given explanatory variables,
 a marginal survival function of a failure time, and
a multivariate conditional expectation of an outcome vector, given
covariates.
In Part II of this article we show that the general estimating function methodology for censored data structures
as provided in van der Laan and Robins (2002) yields the desired loss functions for the selection among
estimators of a full-data distribution parameter of interest based on censored
data. The corresponding cross-validation selector generalizes any of the
existing selection methods in regression and density estimation (including model selection) to the censored data case.
Under general conditions, our optimality results now show that the corresponding cross-validation
selector performs asymptotically exactly as well as the selector which for
each given data set makes the best choice (knowing the true full data distribution). In Part III of this article we propose a general estimator which is defined as follows.
For a collection of subspaces and the complete parameter space, one
defines an epsilon-net (i.e., a finite set of points whose epsilon-spheres
cover the complete parameter space). 
For each epsilon and subspace one now defines a corresponding minimum
cross-validated empirical risk estimator as the minimizer of cross-validated risk
over the subspace-specific epsilon-net.
In the special case that the loss function has no nuisance parameter, which thus covers the classical regression and density estimation cases, 
this epsilon and subspace specific minimum risk
estimator reduces to the minimizer
of the empirical risk over the corresponding epsilon-net.
Finally, one selects epsilon and the subspace with the cross-validation selector.
We refer to the resulting estimator as the cross-validated adaptive 
epsilon-net estimator. 
We prove an oracle inequality for this estimator which implies that the
estimator is minimax adaptive in the sense that
it achieves the minimax optimal rate of convergence for the smallest
of the guessed  subspaces
containing the true parameter value. </description>

<author>Mark J. van der Laan</author>


<category>Loss-Based Estimation with Cross-Validation</category>

</item>


<item>
<title>Tree-based Multivariate Regression and Density Estimation with Right-Censored Data </title>
<link>http://works.bepress.com/mark_van_der_laan/178</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/178</guid>
<pubDate>Thu, 16 Nov 2006 02:39:31 PST</pubDate>
<description>We propose a unified strategy for estimator construction, selection,
and performance assessment in the presence of censoring. This approach
is entirely driven by the choice of a loss function for the full
(uncensored) data structure and can be stated in terms
of the following three main steps. (1) Define the parameter of interest
as the
minimizer of the expected loss, or risk, for a full data loss function
chosen to represent the desired measure of performance. Map
  the full data loss function into an observed (censored) data
loss function having the same expected value and leading
  to an efficient estimator of this risk. (2)
Construct candidate estimators based on the loss function for the
observed data. (3) Apply cross-validation to
estimate risk based on the observed data loss function and to select
an optimal estimator among the candidates. A number of common
estimation procedures follow this approach in the full data situation,
but depart from it when faced with the obstacle of evaluating the loss
function for censored observations. Here, we argue that one can,
and should, also adhere to this estimation road map in censored data
situations. Tree-based methods, where the
candidate estimators in Step 2 are
generated by recursive binary partitioning of a suitably defined
covariate space, provide a striking example of the chasm between
estimation procedures for full data and censored data  (e.g.,
regression trees as in CART for uncensored data and adaptations to
censored data).
Common approaches for regression trees bypass the risk estimation
problem
for censored outcomes by altering the node splitting and tree pruning
criteria
in manners that are specific to right-censored data.
This article describes an application of our unified
methodology to tree-based estimation with censored data.
The approach encompasses univariate prediction, multivariate
prediction, and density estimation, simply by defining a suitable loss
function for each of these problems. The proposed method for tree-based
estimation with censoring is evaluated using  simulation studies and
CGH copy number and survival data from breast cancer patients.
</description>

<author>Annette M. Molinaro</author>


<category>Loss-Based Estimation with Cross-Validation</category>

</item>


<item>
<title>Tree-based Multivariate Regression and Density Estimation with Right-Censored Data </title>
<link>http://works.bepress.com/mark_van_der_laan/177</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/177</guid>
<pubDate>Thu, 16 Nov 2006 02:39:29 PST</pubDate>
<description>We propose a unified strategy for estimator construction, selection,
and performance assessment in the presence of censoring. This approach
is entirely driven by the choice of a loss function for the full
(uncensored) data structure and can be stated in terms
of the following three main steps. (1) Define the parameter of interest
as the
minimizer of the expected loss, or risk, for a full data loss function
chosen to represent the desired measure of performance. Map
  the full data loss function into an observed (censored) data
loss function having the same expected value and leading
  to an efficient estimator of this risk. (2)
Construct candidate estimators based on the loss function for the
observed data. (3) Apply cross-validation to
estimate risk based on the observed data loss function and to select
an optimal estimator among the candidates. A number of common
estimation procedures follow this approach in the full data situation,
but depart from it when faced with the obstacle of evaluating the loss
function for censored observations. Here, we argue that one can,
and should, also adhere to this estimation road map in censored data
situations. Tree-based methods, where the
candidate estimators in Step 2 are
generated by recursive binary partitioning of a suitably defined
covariate space, provide a striking example of the chasm between
estimation procedures for full data and censored data  (e.g.,
regression trees as in CART for uncensored data and adaptations to
censored data).
Common approaches for regression trees bypass the risk estimation
problem
for censored outcomes by altering the node splitting and tree pruning
criteria
in manners that are specific to right-censored data.
This article describes an application of our unified
methodology to tree-based estimation with censored data.
The approach encompasses univariate prediction, multivariate
prediction, and density estimation, simply by defining a suitable loss
function for each of these problems. The proposed method for tree-based
estimation with censoring is evaluated using  simulation studies and
CGH copy number and survival data from breast cancer patients.
</description>

<author>Annette M. Molinaro</author>


<category>Human Genetics</category>

<category>Multivariate Analysis</category>

<category>Statistical Models</category>

<category>Statistical Theory and Methods</category>

<category>Survival Analysis</category>

</item>


<item>
<title>The Two-Interval Line-Segment Problem</title>
<link>http://works.bepress.com/mark_van_der_laan/176</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/176</guid>
<pubDate>Thu, 16 Nov 2006 02:39:27 PST</pubDate>
<description>In this paper, the NPMLE in the one-dimensional line segment problem is defined and studied, where line segments on the real line through two non-overlapping intervals are observed. The self-consistency equations for the NPMLE are defined and a quick algorithm for solving them is provided. Supnorm weak convergence to a Gaussian process and efficiency of the NPMLE are proved. The problem has a strong geological application in the study of the lifespan of species.</description>

<author>Mark J. van der Laan</author>


</item>


<item>
<title>The NPMLE in the Uniform Doubly Censored Current Status Data Model</title>
<link>http://works.bepress.com/mark_van_der_laan/175</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/175</guid>
<pubDate>Thu, 16 Nov 2006 02:39:26 PST</pubDate>
<description>In biostatistical applications interest often focuses on the estimation of the distribution of time T between two consecutive events. If the initial event time is observed and the subsequent event time is only known to be larger or smaller than an observed point in time, then the data is described by the well-understood singly censored current status model, also known as interval censored data, case I. Jewell, Malani and Vittinghoff (1994) extended this current status model by allowing the initial time to be unobserved, but known to be uniformly distributed over an observed interval [A,B]; the data is referred to as doubly censored current status data. These authors used this model to handle applications in AIDS partner studies focusing on the nonparametric maximum likelihood estimate (NPMLE) of the distribution function, G, of T. The model is a submodel of the current status model, but G is essentially the derivative of the distribution function of interest, F, in the current status model. In this paper we establish that the NPMLE of G is uniformly consistent and that the resulting estimators for square root n estimable parameters are efficient. We propose an iterative weighted Pool-Adjacent-Violator-Algorithm to compute the NPMLE of G. The rate of convergence of the NPMLE of F is also established.</description>

<author>Mark J. van der Laan</author>


<category>Survival Analysis</category>

</item>


<item>
<title>The NPMLE in the Uniform Doubly Censored Current Status Data Model</title>
<link>http://works.bepress.com/mark_van_der_laan/174</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/174</guid>
<pubDate>Thu, 16 Nov 2006 02:39:24 PST</pubDate>
<description>In biostatistical applications interest often focuses on the estimation of the distribution of time T between two consecutive events. If the initial event time is observed and the subsequent event time is only known to be larger or smaller than an observed point in time, then the data is described by the well-understood singly censored current status model, also known as interval censored data, case I. Jewell, Malani and Vittinghoff (1994) extended this current status model by allowing the initial time to be unobserved, but known to be uniformly distributed over an observed interval [A,B]; the data is referred to as doubly censored current status data. These authors used this model to handle applications in AIDS partner studies focusing on the nonparametric maximum likelihood estimate (NPMLE) of the distribution function, G, of T. The model is a submodel of the current status model, but G is essentially the derivative of the distribution function of interest, F, in the current status model. In this paper we establish that the NPMLE of G is uniformly consistent and that the resulting estimators for square root n estimable parameters are efficient. We propose an iterative weighted Pool-Adjacent-Violator-Algorithm to compute the NPMLE of G. The rate of convergence of the NPMLE of F is also established.</description>

<author>Mark J. van der Laan</author>


<category>Statistical Theory and Methods</category>

</item>


<item>
<title>The Nonparametric Maximum Likelihood Estimator in a Class of Doubly Censored Current Status Data Models with Application to Partner Studies</title>
<link>http://works.bepress.com/mark_van_der_laan/173</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/173</guid>
<pubDate>Thu, 16 Nov 2006 02:39:22 PST</pubDate>
<description>The California Partners' Study is an ongoing investigation of heterosexual HIV transmission in partners of infected index cases (Padian et al., 1987; Shiboski &amp; Jewell, 1990). In addition to the HIV status of the partner at the recruiting time, one also observes the initial time of the partnership and a lower bound for the infection time of the index case. Following Jewell, Malani &amp; Vittinghoff (1994) we assume that the infection time of the index case is uniformly distributed over the interval determined by the lower bound and the recruiting time, but no further assumptions are made. We consider an NPMLE of the distribution of the time T the partner is exposed to an infected index partner until infection. We show that the model is a doubly censored current status data model as introduced in Jewell, Malani &amp; Vittinghoff (1994) with a special known distribution of the origin. We provide a modified iterative Weighted Pool Adjacent Violator Algorithm for computation of the NPMLE. It is shown that the NPMLE converges. In addition, we propose confidence intervals for smooth functionals of the distribution of T. Simulations show good performance of the algorithm and the confidence intervals, and provide a practical comparison of this NPMLE with the NPMLE obtained if all partnerships are already in existence at the infection time of the index case, as used in Shiboski &amp; Jewell (1990). We apply our methodology to the California Partners' Study. We discuss the implications of our results for doubly censored current status data models with other known distributions of the origin.</description>

<author>Mark J. van der Laan</author>


<category>Statistical Theory and Methods</category>

</item>


<item>
<title>The Cross-Validated Adaptive Epsilon-Net Estimator</title>
<link>http://works.bepress.com/mark_van_der_laan/172</link>
<guid isPermaLink="true">http://works.bepress.com/mark_van_der_laan/172</guid>
<pubDate>Thu, 16 Nov 2006 02:39:19 PST</pubDate>
<description>Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space typically results in ill-defined or overly variable estimators of the parameter of interest (i.e., the risk minimizer for the true data generating distribution). In this article, we propose a cross-validated epsilon-net estimation methodology that covers a broad class of estimation problems, including multivariate outcome prediction and multivariate density estimation. An epsilon-net sieve of a subspace of the parameter space is defined as a collection of finite sets of points, the epsilon-nets indexed by epsilon, which approximate the subspace up to a resolution of epsilon. Given a collection of subspaces of the parameter space, one constructs an epsilon-net sieve for each of the subspaces. For each choice of subspace and each value of the resolution epsilon, one defines a candidate estimator as the minimizer of the empirical risk over the corresponding epsilon-net. The cross-validated epsilon-net estimator is then defined as the candidate estimator corresponding to the choice of subspace and epsilon-value minimizing the cross-validated empirical risk. We derive a finite sample inequality which proves that the proposed estimator achieves the adaptive optimal minimax rate of convergence, where the adaptivity is achieved by considering epsilon-net sieves for various subspaces.
We also address the implementation of the cross-validated epsilon-net estimation procedure. In the context of a linear regression model, we present results of a preliminary simulation study comparing the cross-validated epsilon-net estimator to the cross-validated L^1-penalized least squares estimator (LASSO) and the least angle regression estimator (LARS). Finally, we discuss generalizations of the proposed estimation methodology to censored data structures. </description>

<author>Mark J. van der Laan</author>


<category>Statistical Theory and Methods</category>

<category>Survival Analysis</category>

</item>



</channel>
</rss>
