Protein structure prediction has long been an important and challenging research problem in bioinformatics. Experimental structure determination is time-consuming and relatively expensive, so it continues to lag far behind the explosive discovery of protein sequences. With the recent breakthrough of combining multiple sequence alignment information with artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of the best computational prediction methods has finally exceeded 80%. Herein we present a rule-based data-mining approach called BLAST-RT-RICO (Relaxed Threshold Rule Induction from Coverings) that uses multiple sequence alignment information to predict protein secondary structure. The method uses the PSI-BLAST algorithm to identify suitable proteins and then generates rules from these proteins that can be used to predict secondary structure. By also using known homologous template secondary structures from the Protein Data Bank (PDB), BLAST-RT-RICO achieved a Q3 score of 89.93% on the standard test dataset RS126 and a Q3 score of 87.71% on the standard test dataset CB396. These successful preliminary results suggest that this rule-based method may serve as a foundation for even more accurate prediction of protein secondary structure in the future.
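For readers unfamiliar with the Q3 measure quoted above, the following is a minimal sketch of how it is conventionally computed: the percentage of residues whose predicted three-state label (H for helix, E for strand, C for coil) matches the observed label. The function name `q3_score` and the two example strings are hypothetical illustrations and are not taken from the BLAST-RT-RICO implementation or from the RS126 or CB396 datasets.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the three-state labels agree.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue. This is an illustrative helper,
    not code from the paper.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count residues whose predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up 15-residue example: 13 of the 15 labels agree.
observed = "CCHHHHHCCEEEECC"
predicted = "CCHHHHCCCEEEHCC"
score = q3_score(predicted, observed)
print(f"Q3 = {score:.2f}%")
```

A dataset-level Q3 is usually reported either as the per-residue percentage over all proteins pooled together or as the mean of per-protein scores; which convention the reported figures follow is not stated here.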
- BLAST
- Data Mining
- Protein Secondary Structure Prediction
Available at: http://works.bepress.com/ronald-frank/9/