Covariance models (CMs) are a highly sensitive tool for finding non-coding RNA (ncRNA) genes in DNA sequence data, but they are extremely slow. One reason is that CMs allow every possible combination of insertions and deletions relative to the consensus model, even though the vast majority of these combinations are never seen in practice. In this paper we examine reducing the number of states in covariance models. We introduce a simplified CM with a reduced set of states that can be scored much faster. For the let7 ncRNA family, we compare the results of a full CM with those of a reduced-state model found using a genetic algorithm.
This document was originally published by IEEE in the IEEE Congress on Evolutionary Computation, 2006. Copyright restrictions may apply. DOI: 10.1109/CEC.2006.1688650
Available at: http://works.bepress.com/jennifer_smith/5/