<?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="2.0">
<channel>
<title>Hongzhe Li</title>
<copyright>Copyright (c) 2009  All rights reserved.</copyright>
<link>http://works.bepress.com/hongzhe_li</link>
<description>Recent documents in Hongzhe Li</description>
<language>en-us</language>
<lastBuildDate>Sun, 31 May 2009 06:34:19 PDT</lastBuildDate>
<ttl>3600</ttl>





<item>
<title>Survival Analysis Methods in Genetic Epidemiology</title>
<link>http://works.bepress.com/hongzhe_li/12</link>
<guid isPermaLink="true">http://works.bepress.com/hongzhe_li/12</guid>
<pubDate>Thu, 26 Oct 2006 10:48:26 PDT</pubDate>
<description>Mapping genes for complex human diseases is a challenging problem because many such diseases are due to both genetic and environmental risk factors and many also exhibit phenotypic heterogeneity, such as variable age of onset. Information on variable age of disease onset is often a good indicator for disease heterogeneity and incorporation of such information together with environmental risk factors into genetic analysis should lead to more powerful tests for genetic analysis. Due to the problem of censoring, survival analysis methods have proved to be very useful for genetic analysis. In this paper, I review some recent methodological developments on integrating modern survival analysis methods and human genetics in order to rigorously incorporate both age of onset and environmental covariates data into aggregation analysis, segregation analysis, linkage analysis, association analysis and gene risk characterization. I also briefly discuss the issue of ascertainment correction and survival analysis methods for high-dimensional genomic data. Finally, I outline several areas that need further methodological developments.</description>

<author>Hongzhe Li</author>


<category>Survival Analysis</category>

</item>


<item>
<title>Penalized Cox Regression Analysis in the High-Dimensional and Low-sample Size Settings, with Applications to Microarray Gene Expression Data</title>
<link>http://works.bepress.com/hongzhe_li/11</link>
<guid isPermaLink="true">http://works.bepress.com/hongzhe_li/11</guid>
<pubDate>Thu, 26 Oct 2006 10:48:25 PDT</pubDate>
<description>An important application of microarray technology is to relate gene expression profiles to various clinical phenotypes of patients. Success has been demonstrated in molecular classification of cancer in which the gene expression data serve as predictors and different types of cancer serve as a categorical outcome variable. However, there has been less research in linking gene expression profiles to censored survival data such as patients' overall survival time or time to cancer relapse. Due to large variability in time to certain clinical events among patients, studying possibly censored survival phenotypes can be more informative than treating the phenotypes as categorical variables. We propose to use the L1 penalized estimation for the Cox model to select genes that are relevant to patients' survival and to build a predictive model for future prediction. The computational difficulty associated with the estimation in the high-dimensional and low-sample size settings can be efficiently solved by using the recently developed least angle regression method. Results from our simulation studies and application to a real data set on predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed procedure, which we call the LARS-Lasso procedure, can be used for identifying important genes that are related to time to death due to cancer and for building a parsimonious model for predicting the survival of future patients. The LARS-Lasso regression gives much better predictive performance than the L2 penalized regression or dimension-reduction based methods such as the partial Cox regression method.</description>

<author>Jiang Gui</author>


<category>Human Genetics</category>

<category>Microarrays</category>

<category>Statistical Models</category>

<category>Survival Analysis</category>

</item>


<item>
<title>Nonparametric Pathway-Based Regression Models for Analysis of Genomic Data</title>
<link>http://works.bepress.com/hongzhe_li/9</link>
<guid isPermaLink="true">http://works.bepress.com/hongzhe_li/9</guid>
<pubDate>Thu, 26 Oct 2006 10:48:24 PDT</pubDate>
<description>High-throughput genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called the metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathways-based regression (NPR) analysis to efficiently integrate genomic data and metadata. Such NPR models consider multiple pathways simultaneously and allow complex interactions among genes within the pathways and can be applied to identify pathways and genes within pathways that are related to variations of the phenotypes. These methods also provide an alternative to mediating the problem of a large number of potential interactions by limiting analysis to biologically plausible interactions between genes in related pathways. Our simulation studies indicate that the proposed boosting procedure can indeed identify relevant pathways and genes within pathways. Application to a gene expression data set on breast cancer distant metastasis identified that Wnt, apoptosis and cell cycle regulated pathways are more likely related to the risk of distant metastasis among lymph-node-negative breast cancer patients. We also observed that by incorporating the pathway information, we achieved better prediction for cancer recurrence.</description>

<author>Zhi Wei</author>


<category>Computational Biology/Bioinformatics</category>

</item>


<item>
<title>Partial Cox Regression Analysis for High-Dimensional Microarray Gene Expression Data</title>
<link>http://works.bepress.com/hongzhe_li/10</link>
<guid isPermaLink="true">http://works.bepress.com/hongzhe_li/10</guid>
<pubDate>Thu, 26 Oct 2006 10:48:24 PDT</pubDate>
<description>An important application of microarray technology is to predict various clinical phenotypes based on the gene expression profile. Success has been demonstrated in molecular classification of cancer in which different types of cancer serve as a categorical outcome variable. However, there has been less research in linking gene expression profiles to censored survival outcomes such as patients' overall survival time or time to cancer relapse. In this paper, we develop a partial Cox regression method for constructing mutually uncorrelated components based on microarray gene expression data for predicting the survival of future patients. The proposed partial Cox regression method involves constructing predictive components by repeated least squares fitting of residuals and Cox regression fitting. The key difference from the standard principal components Cox regression analysis is that in constructing the predictive components, our method utilizes the observed survival/censoring information. We also propose to apply the time-dependent receiver operating characteristic curve analysis to evaluate the results. We applied our methods to a publicly available data set of diffuse large B-cell lymphoma. The results indicated that combining the partial Cox regression method with principal components analysis results in a parsimonious model with fewer components and better predictive performance. We conclude that the proposed partial Cox regression method can be very useful in building a parsimonious predictive model that can accurately predict the survival of future patients based on the gene expression profile and survival times of previous patients.</description>

<author>Hongzhe Li</author>


</item>


<item>
<title>Group Additive Regression Models for Genomic Data Analysis</title>
<link>http://works.bepress.com/hongzhe_li/8</link>
<guid isPermaLink="true">http://works.bepress.com/hongzhe_li/8</guid>
<pubDate>Thu, 26 Oct 2006 10:48:23 PDT</pubDate>
<description>One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group features that are related to the phenotypes. In addition, the prediction mean square errors are also smaller than those of the component-wise boosting procedure. We demonstrate the application of the methods to pathway-based analysis of microarray gene expression data of breast cancer and gene-based genetic association analysis of type 1 diabetes. Results from analysis of two breast cancer data sets indicate that the pathways of Metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth and maintenance are important to breast cancer relapse and survival. Results from analysis of a set of nonsynonymous SNPs on chromosome 6 confirmed a few genes that are associated with type 1 diabetes.</description>

<author>Yihui Luan</author>


<category>Computational Biology/Bioinformatics</category>

</item>


<item>
<title>Gradient Directed Regularization for Sparse Gaussian Concentration Graphs, with Applications to Inference of Genetic Networks</title>
<link>http://works.bepress.com/hongzhe_li/7</link>
<guid isPermaLink="true">http://works.bepress.com/hongzhe_li/7</guid>
<pubDate>Thu, 26 Oct 2006 10:48:22 PDT</pubDate>
<description>Large-scale microarray gene expression data provide the possibility of constructing genetic networks or biological pathways. Gaussian graphical models have been suggested to provide an effective method for constructing such genetic networks. However, most of the available methods for constructing Gaussian graphs do not account for the sparsity of the networks and are computationally more demanding or infeasible, especially in the settings of high-dimension and low sample size. We introduce a threshold gradient descent regularization procedure for estimating the sparse precision matrix in the setting of Gaussian graphical models and demonstrate its application to identifying genetic networks. Such a procedure is computationally feasible and can easily incorporate prior biological knowledge about the network structure. Simulation results indicate that the proposed method yields a better estimate of the precision matrix than the procedures that fail to account for the sparsity of the graphs. We also present the results on inference of a gene network for isoprenoid biosynthesis in Arabidopsis thaliana. These results demonstrate that the proposed procedure can indeed identify biologically meaningful genetic networks based on microarray gene expression data.</description>

<author>Hongzhe Li</author>


<category>Microarrays</category>

</item>


<item>
<title>Functional Empirical Bayes Methods for Identifying Genes with Different Time-course Expression Profiles</title>
<link>http://works.bepress.com/hongzhe_li/6</link>
<guid isPermaLink="true">http://works.bepress.com/hongzhe_li/6</guid>
<pubDate>Thu, 26 Oct 2006 10:48:21 PDT</pubDate>
<description>Time course studies of gene expression are essential in biomedical research to understand biological phenomena that evolve in a temporal fashion. Microarray technology makes it possible to study genome-wide temporal differences in gene expression profiles between different experimental conditions/groups. In this paper, we introduce a functional hierarchical model and empirical Bayes approach to model gene expression trajectories over time and to detect temporally differentially expressed (TDE) genes. A Monte Carlo EM algorithm is developed for estimating both the gene-specific parameters and the hyperparameters. We use the posterior probability based false discovery rate (FDR) criterion to identify the TDE genes in order to control for the overall FDR. We illustrate the methods by using both simulated data sets and a data set from a microarray based gene expression time course study of C. elegans developmental processes. Simulation results suggested that the procedure has a low false discovery rate but could potentially have a high false negative rate when the noise variance is relatively large. Results from both simulations and analysis of C. elegans data indicated that the procedure performed better than the two-way ANOVA in identifying TDE genes between the dauer exit process and starved L1 worms' response to feeding process.</description>

<author>Fangxin Hong</author>


<category>Clinical Epidemiology</category>

<category>Computation</category>

<category>Human Genetics</category>

<category>Microarrays</category>

<category>Statistical Models</category>

</item>


<item>
<title>Dimension Reduction Methods for Microarrays with Application to Censored Survival Data</title>
<link>http://works.bepress.com/hongzhe_li/5</link>
<guid isPermaLink="true">http://works.bepress.com/hongzhe_li/5</guid>
<pubDate>Thu, 26 Oct 2006 10:48:20 PDT</pubDate>
<description>Recent research has shown that gene expression profiles can potentially be used for predicting phenotypes such as cancer types and survival time in biomedical research. Microarray technology, which simultaneously measures expression values of thousands of genes, provides a powerful tool as well as new challenges in relating gene expression profiles to phenotypes. Expression data are often very high-dimensional, which makes statistical modeling more difficult and complex, especially when the phenotypes such as time to death or cancer recurrence are subject to right censoring. We consider in this paper a model-free sufficient dimension reduction technique to reduce the dimension of microarray data in the context of analyzing censored survival data. We propose a dimension reduction technique which does not assume a particular model for survival time given gene expression values. After dimension reduction, the constructed gene expression components are used as covariates for predicting the survival probabilities in the framework of censored data regression analysis. In particular we use the popular Cox proportional hazards model to build a predictive model for survival. We demonstrate the use of the methodology by applying it to a large diffuse large B-cell lymphoma gene expression data set, which consists of 240 patients and 7399 genes. The Cox proportional hazards model with the derived gene expression components is shown to provide a good predictive performance for patients' survival as demonstrated by the receiver operating characteristic analysis. The predictive model built using the training data set predicted highly significant survival difference in the testing data.</description>

<author>Lexin Li</author>


<category>Computational Biology/Bioinformatics</category>

<category>Microarrays</category>

<category>Multivariate Analysis</category>

<category>Survival Analysis</category>

</item>


<item>
<title>Censored Data Regression in High-Dimension and Low-Sample Size Settings For Genomic Applications</title>
<link>http://works.bepress.com/hongzhe_li/4</link>
<guid isPermaLink="true">http://works.bepress.com/hongzhe_li/4</guid>
<pubDate>Thu, 26 Oct 2006 10:48:19 PDT</pubDate>
<description>New high-throughput technologies are generating various types of high-dimensional genomic and proteomic data and meta-data (e.g., networks and pathways) in order to obtain a systems-level understanding of various complex diseases such as human cancers and cardiovascular diseases. As the amount and complexity of the data increase and as the questions being addressed become more sophisticated, we face the great challenge of how to model such data in order to draw valid statistical and biological conclusions. One important problem in genomic research is to relate these high-throughput genomic data to various clinical outcomes, including possibly censored survival outcomes such as age at disease onset or time to cancer recurrence. We review some recently developed methods for censored data regression in the high-dimension and low-sample size setting, with emphasis on applications to genomic data. These methods include dimension reduction-based methods, regularized estimation methods such as Lasso and threshold gradient descent method, gradient descent boosting methods and nonparametric pathways-based regression models. These methods are demonstrated and compared by analysis of a data set of microarray gene expression profiles of 240 patients with diffuse large B-cell lymphoma together with follow-up survival information. Areas of further research are also presented.</description>

<author>Hongzhe Li</author>


<category>Survival Analysis</category>

</item>


<item>
<title>Ascertainment-Adjusted Maximum Likelihood Estimation for the Additive Genetic Gamma Frailty Model</title>
<link>http://works.bepress.com/hongzhe_li/3</link>
<guid isPermaLink="true">http://works.bepress.com/hongzhe_li/3</guid>
<pubDate>Thu, 26 Oct 2006 10:48:18 PDT</pubDate>
<description>The additive genetic gamma frailty model has been proposed for genetic linkage analysis for complex diseases to account for variable age of onset and possible covariate effects. To avoid ascertainment biases in parameter estimates, retrospective likelihood ratio tests are often used, which may result in loss of efficiency due to conditioning. This paper considers the case in which the sibships are ascertained by having at least two affected sibs with the disease before a given age and provides two approaches for estimating the parameters in the additive gamma frailty model. One approach is based on the likelihood function conditioning on the ascertainment event; the other is based on maximizing a full ascertainment-adjusted likelihood. Explicit forms for these likelihood functions are derived. Simulation studies indicate that when the baseline hazard function can be correctly pre-specified, both approaches give accurate estimates of the model parameters. However, when the baseline hazard function has to be estimated simultaneously, only the ascertainment-adjusted likelihood method gives an unbiased estimate of the parameters. These results imply that the ascertainment-adjusted likelihood ratio test in the context of the additive genetic gamma frailty may be used for genetic linkage analysis.</description>

<author>Wanlong Sun</author>


<category>Human Genetics</category>

<category>Statistical Models</category>

<category>Statistical Theory and Methods</category>

</item>



</channel>
</rss>
