Article
Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable
BMC Bioinformatics
  • Myron Peto, Iowa State University
  • Andrzej Kloczkowski, Iowa State University
  • Vasant Honavar, Iowa State University
  • Robert L. Jernigan, Iowa State University
Document Type
Article
Publication Version
Published Version
Publication Date
1-1-2008
DOI
10.1186/1471-2105-9-487
Abstract

Background

By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations and those folding to poorly- or non-designable conformations.

Results

First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If, for a given sequence, the lowest energy is obtained for a particular lattice conformation, we assume that this sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or no H/P sequences. We classify sequences as folding to either highly- or poorly-designable conformations. We have randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms.

Conclusion

By using these machine learning algorithms with ten-fold cross-validation we are able to classify the two classes of sequences with high accuracy – in some cases exceeding 95%.
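
The enumeration and threading procedure described under Results can be illustrated with a small sketch. It is not the authors' code, and several choices are assumptions made only for illustration: the shape is a triangle with three sites per side (six sites in total), the energy function is a simple contact model that scores -1 for every non-bonded H-H contact, conformations that share the same contact map are merged into one, and the cutoff separating "high" from "poor" is hypothetical. The abstract does not specify any of these details.

```python
from collections import Counter
from itertools import product

# Axial coordinates on the 2D triangular lattice: each site has six neighbours.
NEIGHBOUR_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def triangle_sites(side):
    """Sites of a triangular region with `side` sites along each edge."""
    return [(q, r) for q in range(side) for r in range(side - q)]


def adjacent(a, b):
    return (b[0] - a[0], b[1] - a[1]) in NEIGHBOUR_STEPS


def compact_paths(sites):
    """Yield every self-avoiding walk that visits each site exactly once."""
    def extend(path, visited):
        if len(path) == len(sites):
            yield tuple(path)
            return
        for site in sites:
            if site not in visited and adjacent(path[-1], site):
                path.append(site)
                visited.add(site)
                yield from extend(path, visited)
                path.pop()
                visited.remove(site)

    for start in sites:
        yield from extend([start], {start})


def contact_map(path):
    """Pairs of chain positions that touch on the lattice but are not bonded."""
    n = len(path)
    return frozenset(
        (i, j)
        for i in range(n)
        for j in range(i + 2, n)
        if adjacent(path[i], path[j])
    )


def energy(seq, contacts):
    """Assumed energy function: -1 for each non-bonded H-H contact."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def fold(seq, maps):
    """Return the unique lowest-energy contact map, or None if there is a tie."""
    energies = [(energy(seq, m), m) for m in maps]
    best = min(e for e, _ in energies)
    winners = [m for e, m in energies if e == best]
    return winners[0] if len(winners) == 1 else None


sites = triangle_sites(3)
# Symmetry-related walks share a contact map, so they collapse to one entry.
maps = list({contact_map(p) for p in compact_paths(sites)})

# Thread every H/P sequence through every conformation.
folds = {}
for letters in product("HP", repeat=len(sites)):
    seq = "".join(letters)
    target = fold(seq, maps)
    if target is not None:
        folds[seq] = target

# Designability = number of sequences that fold uniquely to a conformation.
designability = Counter(folds.values())

# Hypothetical cutoff: only the most designable conformation(s) count as "high".
cutoff = max(designability.values(), default=0)
labels = {
    seq: "high" if designability[m] >= cutoff else "poor"
    for seq, m in folds.items()
}
```

The resulting `labels` dictionary maps each uniquely folding sequence to a class, which is the kind of labelled data that could then be passed to a classifier and evaluated with cross-validation. On a lattice this small the counts are not meaningful; the sketch only shows the shape of the computation.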

Comments

This article is published as Peto, Myron, Andrzej Kloczkowski, Vasant Honavar, and Robert L. Jernigan. "Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable." BMC Bioinformatics 9, no. 1 (2008): 487. doi: 10.1186/1471-2105-9-487. Posted with permission.

Creative Commons License
Creative Commons Attribution 4.0 International
Copyright Owner
Peto et al
Language
en
File Format
application/pdf
Citation Information
Myron Peto, Andrzej Kloczkowski, Vasant Honavar and Robert L. Jernigan. "Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable" BMC Bioinformatics Vol. 9 Iss. 1 (2008) p. 487
Available at: http://works.bepress.com/robert-jernigan/59/