The profile hidden Markov model (HMM) is a powerful method for database searches for remote homologs. However, evaluating the score of each database sequence against a profile HMM is computationally demanding, and the computation time required for score evaluation is proportional to the number of states in the profile HMM. This paper examines whether the number of states can be reduced by truncation without reducing the ability of the HMM to find proteins containing members of a protein domain family. A genetic algorithm (GA) is presented which finds a good truncation of the HMM states. Results of using truncation in searches of the yeast, E. coli, and pig genomes for several different protein domain families are shown.
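The proportionality between scoring time and the number of states can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a simplified left-to-right chain of states (each state may repeat or advance to the next) rather than a full profile HMM with match, insert, and delete states, and it assumes that a truncation keeps a contiguous window of states. The names `viterbi_score` and `truncate`, the transition probabilities, and the toy two-letter alphabet are all hypothetical, and the GA that selects the truncation is not shown.

```python
import math

NEG_INF = float("-inf")

def viterbi_score(seq, emissions, p_stay=0.1, p_next=0.9, p_unseen=1e-3):
    """Viterbi log-score of seq against a left-to-right chain of states.

    emissions: one dict per state mapping symbol -> log-probability.
    The path must start in the first state and end in the last one.
    Returns (log_score, cells), where cells counts the dynamic-programming
    cell updates, a stand-in for computation time.
    """
    n_states = len(emissions)
    log_stay, log_next = math.log(p_stay), math.log(p_next)
    log_unseen = math.log(p_unseen)  # assumed floor for unlisted symbols
    prev = [NEG_INF] * n_states
    cells = 0
    for i, sym in enumerate(seq):
        cur = [NEG_INF] * n_states
        for j in range(n_states):
            if i == 0:
                best = 0.0 if j == 0 else NEG_INF  # must start in state 0
            else:
                best = prev[j] + log_stay          # remain in state j
                if j > 0:                          # or advance from state j-1
                    best = max(best, prev[j - 1] + log_next)
            cur[j] = best + emissions[j].get(sym, log_unseen)
            cells += 1
        prev = cur
    return prev[-1], cells

def truncate(emissions, start, end):
    """Keep only states start..end-1 (assumed contiguous truncation)."""
    return emissions[start:end]

# Toy 10-state model over a two-letter alphabet (illustrative values only).
profile = [{"A": math.log(0.7), "C": math.log(0.3)} for _ in range(10)]
seq = "ACAACCAACAACACACAACA"

full_score, full_cells = viterbi_score(seq, profile)
trunc_score, trunc_cells = viterbi_score(seq, truncate(profile, 2, 7))
print(full_cells, trunc_cells)  # cell updates: full model vs. 5-state window
```

In this sketch the work per sequence is the sequence length times the number of retained states, so keeping half the states halves the cell updates. Whether a given truncation still detects the domain family is the question a search procedure such as the GA has to answer.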
This document was originally published by IEEE in Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology. Copyright restrictions may apply. DOI: 10.1109/CIBCB.2005.1594926
Available at: http://works.bepress.com/jennifer_smith/4/