Jeffrey S. Morris
Copyright (c) 2008 All rights reserved.
http://works.bepress.com/jeffrey_s_morris
Recent documents in Jeffrey S. Morris
Sun, 24 Aug 2008 13:25:40 PDT

Statistical Issues in Proteomic Research
http://works.bepress.com/jeffrey_s_morris/37
Thu, 31 Jul 2008 11:03:24 PDT
Jeffrey S. Morris
Functional Data Analysis, Proteomics

Microproteomics: Analysis of protein diversity in small samples
http://works.bepress.com/jeffrey_s_morris/36
Fri, 13 Jun 2008 14:38:53 PDT
Proteomics, the large-scale study of protein expression in organisms, offers the potential to evaluate global changes in protein expression and their post-translational modifications that take place in response to normal or pathological stimuli. One challenge has been the requirement for substantial amounts of tissue in order to perform comprehensive proteomic characterization. In heterogeneous tissues, such as brain, this has limited the application of proteomic methodologies. Efforts to adapt standard methods of tissue sampling, protein extraction, arraying, and identification are reviewed, with an emphasis on those appropriate to smaller samples ranging in size from several microliters down to single cells. The effects of miniaturization on these analyses are highlighted using neuroscience-related examples, as are statistical issues unique to the high-dimensional datasets generated by proteomic experiments.
Howard B.
Gutstein
Functional Data Analysis, Proteomics

Pinnacle: A Fast, Automatic Method for Detecting and Quantifying Protein Spots in 2-Dimensional Gel Electrophoresis Data
http://works.bepress.com/jeffrey_s_morris/35
Tue, 04 Dec 2007 09:44:53 PST
Motivation: One of the key limitations for proteomic studies using 2-dimensional gel electrophoresis (2DE) is the lack of rapid, robust, and reproducible methods for detecting, matching, and quantifying protein spots. The most commonly used approaches involve first detecting spots and drawing spot boundaries on individual gels, then matching spots across gels, and finally quantifying each spot by calculating normalized spot volumes. This approach is time-consuming, error-prone, and frequently requires extensive manual editing, which can unintentionally introduce bias into the results. Results: We introduce a new method for spot detection and quantification called Pinnacle that is automatic, quick, sensitive and specific, and yields spot quantifications that are reliable and precise. This method incorporates a spot definition that is based on simple, straightforward criteria rather than complex arbitrary definitions, and results in no missing data. Using dilution series for validation, we demonstrate Pinnacle outperformed two well-established 2DE analysis packages, proving to be more accurate and yielding smaller CVs. More accurate quantifications may lead to increased power for detecting differentially expressed spots, an idea supported by the results of our group comparison experiment. Our fast, automatic analysis method makes it feasible to conduct very large 2DE-based proteomic studies that are adequately powered to find important protein expression differences. Availability: Matlab code to implement Pinnacle is available from the authors upon request for non-commercial use.
Jeffrey S.
Morris
Functional Data Analysis, Proteomics

Laser capture sampling and analytical issues in proteomics
http://works.bepress.com/jeffrey_s_morris/34
Tue, 04 Dec 2007 09:35:54 PST
Proteomics holds the promise of evaluating global changes in protein expression and post-translational modification in response to environmental stimuli. However, difficulties in achieving cellular anatomic resolution and extracting specific types of proteins from cells have limited the efficacy of these techniques. Laser capture microdissection has provided a solution to the problem of anatomical resolution in tissues. New extraction methodologies have expanded the range of proteins identified in subsequent analyses. This review will examine the application of laser capture microdissection to proteomic tissue sampling, and subsequent extraction of these samples for differential expression analysis. Statistical and other quantitative issues important for the analysis of the highly complex datasets generated are also reviewed.
Howard Gutstein
Proteomics

Statistical contributions to proteomic research
http://works.bepress.com/jeffrey_s_morris/33
Wed, 04 Apr 2007 12:55:09 PDT
Proteomic profiling has the potential to impact the diagnosis, prognosis, and treatment of various diseases. A number of different proteomic technologies are available that allow us to look at many proteins at once, and all of them yield complex data that raise significant quantitative challenges. Inadequate attention to these quantitative issues can prevent these studies from achieving their desired goals, and can even lead to invalid results.
In this chapter, we describe various ways the involvement of statisticians or other quantitative scientists in the study team can contribute to the success of proteomic research, and we outline some of the key statistical principles that should guide the experimental design and analysis of such studies.
Jeffrey S. Morris
Proteomics

Wavelet-based functional mixed model analysis: Computational considerations
http://works.bepress.com/jeffrey_s_morris/32
Wed, 04 Apr 2007 12:48:45 PDT
Wavelet-based Functional Mixed Models is a new Bayesian method extending mixed models to irregular functional data (Morris and Carroll, JRSS-B, 2006). These data sets are typically very large and can quickly run into memory and time constraints unless these issues are carefully dealt with in the software. We reduce runtime by (1) identifying and optimizing hotspots, (2) using wavelet compression to do less computation with minimal impact on results, and (3) dividing the code into multiple executables to be run in parallel using a grid computing resource. We discuss rules of thumb for estimating memory requirements and computation times in terms of model and data set parameters. We present examples and benchmarks demonstrating that it is practical to analyze very large data sets with readily available computing resources. This code is freely available on our website.
Richard C. Herrick
Functional Data Analysis

Parametric and Nonparametric Methods for Understanding the Relationship Between Carcinogen-Induced DNA Adduct Levels in Distal and Proximal Regions of the Colon.
http://works.bepress.com/jeffrey_s_morris/31
Thu, 14 Dec 2006 14:30:55 PST
An important problem in studying the etiology of colon cancer is understanding the relationship between DNA adduct levels (broadly, DNA damage) in cells within colonic crypts in distal and proximal parts of the colon, following treatment with a carcinogen and different types of diet. In particular, it is important to understand whether rats who have elevated adduct levels in particular positions in distal region crypts also have elevated levels in the same positions of the crypts in proximal regions, and whether this relationship depends on diet. We cast this problem as estimating the correlation function of two responses as a function of a covariate for studies where both responses are measured on the same experimental units but not the same subsampling units. Parametric and nonparametric methods are developed and applied to a dataset from an ongoing study, leading to potentially important and surprising biological results. Theoretical calculations suggest that the nonparametric method, based on nonparametric regression, should in fact have statistical properties nearly the same as if the functions nonparametrically estimated were known. The methodology used in this article can be applied to other settings when the goal of the study is to model the correlation of two continuous repeated measurement responses as a function of a covariate, whereas the two responses of interest can be measured on the same experimental units but not on the same subsampling units. In our example, the two responses were measured in two different regions of the colon.
Jeffrey S.
Morris
Functional Data Analysis

The BLUPs Are Not "Best" When It Comes To Bootstrapping
http://works.bepress.com/jeffrey_s_morris/30
Thu, 14 Dec 2006 14:27:45 PST
In the setting of mixed models, some researchers may construct a semiparametric bootstrap by sampling from the best linear unbiased predictor residuals. This paper demonstrates both mathematically and by simulation that such a bootstrap will consistently underestimate the variation in the data in finite samples.
Jeffrey S. Morris
Statistical Methods: Bootstrap

A Bayesian Analysis Involving Colonic Crypt Structure and Coordinated Response to Carcinogens Incorporating Missing Crypts
http://works.bepress.com/jeffrey_s_morris/29
Thu, 14 Dec 2006 14:25:47 PST
This paper is concerned with modeling the architecture of colonic crypts and the implications of this modeling for understanding possible coordinated response of carcinogen-induced DNA damage between various regions of the colon. The methods we develop to address these two issues are applied to a particular important example in colon carcinogenesis. We cast the problem as an unusual and not previously studied hierarchical mixed-effects model characterized by completely missing covariates in units at a structurally base level, except for some randomly selected units. Information concerning the missing covariates is available through certain known ordering constraints and surrogate measures. Our methods use Bayesian machinery.
We exploit the biological structure of this problem to generate the missing covariates simultaneously and efficiently at the base levels, as opposed to the naive practice of generating units at the base levels one-at-a-time with Metropolis-Hastings steps. We apply our methods to show that different regions of the colon have different architectures, and to estimate an important but nonstandard function that measures the interrelationship of DNA damage mechanisms in different regions of the colon.
Jeffrey S. Morris
Functional Data Analysis

Differential Expression in SAGE: accounting for normal between-library variation
http://works.bepress.com/jeffrey_s_morris/28
Thu, 14 Dec 2006 14:22:14 PST
Motivation: In contrasting levels of gene expression between groups of SAGE libraries, the libraries within each group are often combined and the counts for the tag of interest summed, and inference is made on the basis of these larger 'pseudolibraries'. While this captures the sampling variability inherent in the procedure, it fails to allow for normal variation in levels of the gene between individuals within the same group, and can consequently overstate the significance of the results. The effect is not slight: between-library variation can be hundreds of times the within-library variation. Results: We introduce a beta-binomial sampling model that correctly incorporates both sources of variation. We show how to fit the parameters of this model, and introduce a test statistic for differential expression similar to a two-sample t-test. Contact: kabagg@mdanderson.org Supplementary information: http://bioinformatics.mdanderson.org/ Includes Matlab and R code for fitting the model.
Keith A. Baggerly
Genomics
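
The pooling problem described in the SAGE abstract above can be illustrated with a small sketch. This is not the beta-binomial fit from the paper, and it is not the authors' Matlab or R code. It only contrasts a z-statistic computed on pooled 'pseudolibraries' with a simple two-sample t-style statistic computed on per-library proportions. The function names, the equal weighting of libraries, and the tag counts are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_z(counts_a, sizes_a, counts_b, sizes_b):
    """z-statistic after summing each group into one 'pseudolibrary'.

    Only binomial sampling variation enters the standard error.
    """
    p_a = sum(counts_a) / sum(sizes_a)
    p_b = sum(counts_b) / sum(sizes_b)
    se = sqrt(p_a * (1 - p_a) / sum(sizes_a) + p_b * (1 - p_b) / sum(sizes_b))
    return (p_a - p_b) / se


def per_library_t(counts_a, sizes_a, counts_b, sizes_b):
    """Two-sample t-style statistic on per-library tag proportions.

    The standard error comes from the spread between libraries,
    with every library weighted equally (an assumption of this sketch).
    """
    props_a = [x / n for x, n in zip(counts_a, sizes_a)]
    props_b = [x / n for x, n in zip(counts_b, sizes_b)]
    se = sqrt(variance(props_a) / len(props_a) + variance(props_b) / len(props_b))
    return (mean(props_a) - mean(props_b)) / se


# Hypothetical tag counts: three libraries per group, 50,000 tags each.
sizes = [50_000, 50_000, 50_000]
group_a = [10, 60, 20]
group_b = [15, 5, 25]

z = pooled_z(group_a, sizes, group_b, sizes)
t = per_library_t(group_a, sizes, group_b, sizes)
print(f"pooled z = {z:.2f}, per-library t = {t:.2f}")
```

With these made-up counts the pooled statistic is roughly 3.9 while the per-library statistic is roughly 0.9, so the same data look highly significant when pooled and unremarkable once between-library spread is taken into account.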