Shrinkage Estimation for SAGE Data using a Mixture Dirichlet Prior
Abstract
Serial Analysis of Gene Expression (SAGE) is a technique for estimating the gene expression profile of a biological sample. Any efficient inference in SAGE must be based upon efficient estimates of these gene expression profiles, which consist of the estimated relative abundances for each mRNA species present in the sample. The data from SAGE experiments are counts for each observed mRNA species, and can be modeled using a multinomial distribution with two characteristics: skewness in the distribution of relative abundances and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample will fail to capture a large number of expressed mRNA species present in the tissue. Standard empirical estimates of the relative abundances effectively ignore these missing, unobserved species, and consequently also tend to overestimate the abundance of the scarce observed species, which make up a vast majority of the total. In this chapter, we review a new Bayesian procedure that yields improved estimates for the missing and scarce species without trading off much efficiency for the abundant species. The key to the procedure is the mixture Dirichlet prior, which stochastically partitions the mRNA species into abundant and scarce strata, with each stratum modeled by its own multivariate prior, a scalar multiple of a symmetric Dirichlet. Simulation studies demonstrate that the resulting shrinkage estimators have efficiency advantages over the maximum likelihood estimator (MLE) in the simulated SAGE scenarios.
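The contrast between the empirical (MLE) estimate and a Dirichlet shrinkage estimate can be sketched with the simplest conjugate case. This is a minimal illustration assuming a single symmetric Dirichlet(α) prior over all K tags, not the chapter's two-stratum mixture Dirichlet prior: with multinomial counts n_1, ..., n_K summing to N, the posterior mean is (n_i + α) / (N + Kα), while the MLE is n_i / N. The function names, the value α = 0.01, and the toy counts are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def mle_abundances(counts: Sequence[int]) -> List[float]:
    """Empirical (maximum likelihood) relative abundances n_i / N.

    Unobserved tags (n_i = 0) get an estimated abundance of exactly 0.
    """
    total = sum(counts)
    return [n / total for n in counts]


def dirichlet_posterior_mean(counts: Sequence[int], alpha: float) -> List[float]:
    """Posterior mean of the relative abundances under a symmetric
    Dirichlet(alpha, ..., alpha) prior (a simplified stand-in, not the
    chapter's mixture prior).

    By multinomial-Dirichlet conjugacy the posterior is Dirichlet(n_i + alpha),
    whose mean is (n_i + alpha) / (N + K * alpha).
    """
    total = sum(counts)
    k = len(counts)
    denom = total + k * alpha
    return [(n + alpha) / denom for n in counts]


if __name__ == "__main__":
    # Hypothetical skewed sample: 5 observed tags, 995 unobserved, N = 100.
    counts = [60, 25, 10, 3, 2] + [0] * 995
    p_mle = mle_abundances(counts)
    p_post = dirichlet_posterior_mean(counts, alpha=0.01)
    print("most abundant tag:", p_mle[0], "->", round(p_post[0], 4))
    print("an unobserved tag:", p_mle[-1], "->", p_post[-1])
```

With these toy counts the unobserved tags move from 0 to 0.01/110 ≈ 9.1e-5, but the most abundant tag also drops from 0.60 to about 0.546. Algebraically the single-Dirichlet estimate equals w·MLE + (1 − w)/K with w = N/(N + Kα), so every species is pulled toward 1/K by the same factor. That uniform pull on the abundant species is the efficiency trade-off the abstract says the mixture prior is designed to limit by giving abundant and scarce strata their own priors.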
Suggested Citation
Jeffrey S. Morris, Keith A. Baggerly, and Kevin R. Coombes. "Shrinkage Estimation for SAGE Data using a Mixture Dirichlet Prior" Bayesian Inference for Gene Expression and Proteomics. Ed. KA Do, P Mueller, M Vannucci. New York: Cambridge University Press, 2006. 254-267.
Available at: http://works.bepress.com/jeffrey_s_morris/14