Motivation: Identification of transcription factor binding sites (TFBS) is a fundamental problem in understanding the mechanism of gene regulation. ChIP-chip technology has accelerated this effort by providing simultaneous, high-throughput, genome-wide maps of TFBS. More recently, sequencing-based ChIP-seq has emerged as a promising alternative that can identify binding targets at high resolution with improved sensitivity and specificity. However, studies have suggested that distinct experimental platforms can be complementary in TFBS identification. The availability of data from multiple platforms therefore motivates a meta-analysis to improve the identification of candidate motifs.
Results: In this work, we propose a hierarchical hidden Markov model (HHMM) that combines signals from ChIP-seq and ChIP-chip experiments using differential weighting, which assigns higher confidence to the ChIP-seq signals than to the ChIP-chip signals. A simulation study shows that HHMM keeps the false positive rate lower than other methods while identifying motif-containing regions with higher sensitivity. When applied to real datasets for two TFs (NRSF and CTCF), HHMM selected peak regions with the most statistically significant motif enrichment among the methods compared, indicating a lower false discovery rate. In both examples, HHMM also identified more than two-thirds of the TFBS motifs found by the Union method, indicating high sensitivity.
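The abstract does not specify the HHMM's structure, so the sketch below illustrates only the idea of differential weighting, under loudly stated assumptions. It swaps in a much simpler flat two-state HMM (background vs. bound), not a hierarchical one, in which each window's emission score is a weighted sum of Gaussian log-likelihoods from the two platforms, with a larger weight on ChIP-seq. The Gaussian emissions, the weights `w_seq=0.7` and `w_chip=0.3`, and every function name are hypothetical choices for illustration, not the authors' model.

```python
import math

# Hypothetical state labels: 0 = background, 1 = TF-bound region.
STATES = (0, 1)


def log_normal_pdf(x, mean, sd):
    """Log-density of Normal(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def weighted_log_emission(seq_x, chip_x, state, params, w_seq, w_chip):
    """Assumed form: weighted sum of per-platform log-likelihoods.
    Choosing w_seq > w_chip gives the ChIP-seq signal more influence."""
    seq_mean, seq_sd = params["seq"][state]
    chip_mean, chip_sd = params["chip"][state]
    return (w_seq * log_normal_pdf(seq_x, seq_mean, seq_sd)
            + w_chip * log_normal_pdf(chip_x, chip_mean, chip_sd))


def posterior_bound(seq, chip, params, init, trans, w_seq=0.7, w_chip=0.3):
    """Forward-backward over a flat two-state HMM (hypothetical simplification).
    Returns the posterior probability of the bound state for each window."""
    n = len(seq)
    log_init = [math.log(p) for p in init]
    log_trans = [[math.log(p) for p in row] for row in trans]
    emit = [[weighted_log_emission(seq[t], chip[t], s, params, w_seq, w_chip)
             for s in STATES] for t in range(n)]

    # Forward pass: log score of windows 0..t with window t in state s.
    alpha = [[0.0] * len(STATES) for _ in range(n)]
    for s in STATES:
        alpha[0][s] = log_init[s] + emit[0][s]
    for t in range(1, n):
        for s in STATES:
            alpha[t][s] = emit[t][s] + logsumexp(
                [alpha[t - 1][r] + log_trans[r][s] for r in STATES])

    # Backward pass: log score of windows t+1..n-1 given window t in state s.
    beta = [[0.0] * len(STATES) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in STATES:
            beta[t][s] = logsumexp(
                [log_trans[s][r] + emit[t + 1][r] + beta[t + 1][r] for r in STATES])

    # Normalize so the two state posteriors at each window sum to 1.
    log_z = logsumexp([alpha[n - 1][s] for s in STATES])
    return [math.exp(alpha[t][1] + beta[t][1] - log_z) for t in range(n)]


if __name__ == "__main__":
    # Made-up toy signals for five adjacent genomic windows.
    params = {"seq": {0: (0.0, 1.0), 1: (3.0, 1.0)},
              "chip": {0: (0.0, 1.0), 1: (2.0, 1.0)}}
    seq = [0.1, -0.2, 3.5, 3.1, 0.0]
    chip = [0.2, 0.0, 2.2, 1.8, -0.1]
    post = posterior_bound(seq, chip, params, init=[0.9, 0.1],
                           trans=[[0.9, 0.1], [0.2, 0.8]])
    print([round(p, 3) for p in post])
```

Because the weights temper each platform's log-likelihood, a window where the two platforms disagree is resolved mostly in favor of the ChIP-seq signal. Raising `w_chip` relative to `w_seq` shifts that balance toward ChIP-chip.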
- false discovery rate
- genomic data integration,
- mixture model,
- transcription factor binding site
Available at: http://works.bepress.com/debashis_ghosh/33/