A genetic algorithm is used to search for a set of classification features for a protein superfamily that is as unique as possible to that superfamily. These features can then be used for very fast classification of a query sequence into a protein superfamily. The features are based on windows onto modified consensus sequences of multiple aligned members of a training set for the protein superfamily. The efficacy of the method is demonstrated using receiver operating characteristic (ROC) values, and the performance of the resulting algorithm is compared with that of other database search algorithms.
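To make the idea of evolving window-based features concrete, the following is a minimal sketch and not the paper's implementation. It assumes a feature is a single window (start, length) onto a plain consensus string, that a sequence "matches" when the window's substring occurs in it exactly, and that fitness is the matched fraction of family members minus the matched fraction of non-members. The function names, the scoring rule, and the mutation-only search with truncation selection are all illustrative assumptions; the method summarized above searches for a set of features over modified consensus sequences.

```python
import random


def window_matches(consensus, start, length, seq):
    """Return True if the consensus window occurs anywhere in seq (exact match)."""
    return consensus[start:start + length] in seq


def fitness(window, consensus, members, others):
    """Assumed score: fraction of members matched minus fraction of non-members matched."""
    start, length = window
    hit_in = sum(window_matches(consensus, start, length, s) for s in members)
    hit_out = sum(window_matches(consensus, start, length, s) for s in others)
    return hit_in / len(members) - hit_out / len(others)


def random_window(rng, consensus, min_len, max_len):
    """Draw a random window that lies fully inside the consensus."""
    length = rng.randint(min_len, min(max_len, len(consensus)))
    start = rng.randint(0, len(consensus) - length)
    return (start, length)


def mutate(rng, window, consensus, min_len, max_len):
    """Shift the window start and length by at most one position, clamped to valid bounds."""
    start, length = window
    length = length + rng.choice((-1, 0, 1))
    length = max(min_len, min(max_len, len(consensus), length))
    start = start + rng.choice((-1, 0, 1))
    start = max(0, min(len(consensus) - length, start))
    return (start, length)


def evolve(consensus, members, others, pop_size=20, generations=30,
           min_len=3, max_len=6, seed=0):
    """Mutation-only evolutionary search with truncation selection (a simplification)."""
    rng = random.Random(seed)
    pop = [random_window(rng, consensus, min_len, max_len) for _ in range(pop_size)]

    def score(w):
        return fitness(w, consensus, members, others)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]  # keep the better half unchanged (elitism)
        children = [mutate(rng, rng.choice(parents), consensus, min_len, max_len)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=score)


# Toy data, invented purely for illustration.
consensus = "ACDEFGHIK"
members = ["MACDEFGQ", "TTCDEFGHW"]
others = ["ACDQQQ", "HIKLMN"]
best = evolve(consensus, members, others)
```

In this toy setting a window scores highest when its substring appears in every training member and in no non-member, which is a crude stand-in for "as unique as possible to the superfamily". Classifying a query sequence then reduces to a handful of substring tests, which is why such features can be evaluated very quickly.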
This document was originally published by IEEE in The 2005 IEEE Congress on Evolutionary Computation. Copyright restrictions may apply. DOI: 10.1109/CEC.2005.1554744
Available at: http://works.bepress.com/jennifer_smith/6/