"Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable" by Myron Peto

Selected Works of Robert Jernigan

Article

Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable

BMC Bioinformatics

Myron Peto, Iowa State University
Andrzej Kloczkowski, Iowa State University
Vasant Honavar, Iowa State University
Robert L. Jernigan, Iowa State University

Document Type

Article

Publication Version

Published Version

Publication Date

1-1-2008

DOI

10.1186/1471-2105-9-487

Abstract

Background

Using a standard Support Vector Machine (SVM) trained with the Sequential Minimal Optimization (SMO) method, Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations and those folding to poorly- or non-designable conformations.

Results

First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If a given sequence attains its lowest energy in a particular lattice conformation, we assume that the sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or none. We classify sequences as folding to either highly- or poorly-designable conformations. We randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several standard machine learning algorithms.

Conclusion

Using these machine learning algorithms with ten-fold cross-validation, we are able to classify the two classes of sequences with high accuracy, in some cases exceeding 95%.
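
The enumeration and threading procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the energy function, so the sketch assumes a simple H/P contact energy of -1 per non-bonded H-H contact. It also assumes a small triangular shape in axial coordinates, and it treats conformations with identical contact maps as one conformation. All function names (`triangle_sites`, `compact_walks`, `contact_map`, `energy`, `designability`) are hypothetical.

```python
from itertools import product

# Six neighbour offsets of the 2D triangular lattice in axial coordinates.
NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def triangle_sites(side):
    """Lattice sites of a triangular shape with `side` points per edge."""
    return {(q, r) for q in range(side) for r in range(side - q)}


def compact_walks(sites):
    """Yield every self-avoiding walk that visits all sites exactly once."""
    def extend(walk, visited):
        if len(walk) == len(sites):
            yield tuple(walk)
            return
        q, r = walk[-1]
        for dq, dr in NEIGHBOURS:
            nxt = (q + dq, r + dr)
            if nxt in sites and nxt not in visited:
                walk.append(nxt)
                visited.add(nxt)
                yield from extend(walk, visited)
                walk.pop()
                visited.remove(nxt)

    for start in sorted(sites):
        yield from extend([start], {start})


def contact_map(walk):
    """Non-bonded pairs (i, j) of chain positions that are lattice neighbours."""
    index = {site: i for i, site in enumerate(walk)}
    contacts = set()
    for (q, r), i in index.items():
        for dq, dr in NEIGHBOURS:
            j = index.get((q + dq, r + dr))
            if j is not None and j > i + 1:
                contacts.add((i, j))
    return frozenset(contacts)


def energy(seq, contacts):
    """ASSUMED energy: -1 for each non-bonded H-H contact."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def designability(side):
    """Count, per conformation, the H/P sequences with a unique ground state there."""
    sites = triangle_sites(side)
    # ASSUMPTION: walks sharing a contact map count as a single conformation.
    conformations = list({contact_map(w) for w in compact_walks(sites)})
    counts = {c: 0 for c in conformations}
    for seq in product("HP", repeat=len(sites)):
        energies = [energy(seq, c) for c in conformations]
        best = min(energies)
        if energies.count(best) == 1:  # sequence folds only if the minimum is unique
            counts[conformations[energies.index(best)]] += 1
    return counts
```

Calling `designability(4)` on a ten-site triangle returns a count per conformation. Sequences could then be labeled highly- or poorly-designable by thresholding these counts, where the threshold would itself be a modeling choice not specified in the abstract.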

Comments

This article is published as Peto, Myron, Andrzej Kloczkowski, Vasant Honavar, and Robert L. Jernigan. "Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable." BMC bioinformatics 9, no. 1 (2008): 487. doi: 10.1186/1471-2105-9-487. Posted with permission.

Creative Commons License

Creative Commons Attribution 4.0 International

Peto et al

2008

File Format

application/pdf

Citation Information

Myron Peto, Andrzej Kloczkowski, Vasant Honavar and Robert L. Jernigan. "Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable" BMC Bioinformatics Vol. 9 Iss. 1 (2008) p. 487
Available at: http://works.bepress.com/robert-jernigan/59/