In the analysis of data from proteomic mass spectrometry experiments, an important issue is determining which of the observed peptide spectrum matches (PSMs) represent true positives. We view this problem through a multiple testing framework and develop procedures for identifying true PSMs. A key feature that distinguishes this problem from the differential expression problem in microarray analysis is that the null distribution can potentially be estimated from the data. However, this renders many of the asymptotic results from the statistical literature invalid. We prove some new key results for this problem using empirical process theory. We also develop a new multiple testing procedure that employs multivariate information from the peptide sequence searches. The proposed methods are studied using a real data set as well as simulated data.
- Benjamini–Hochberg procedure
- Dimension reduction
- False discovery rate
- High-dimensional data
- Multiple comparisons
- Simultaneous inference
Available at: http://works.bepress.com/debashis_ghosh/46/