We explore whether protein-RNA interfaces differ from non-interface regions in their structural features, and whether those features vary with the type of bound RNA (e.g., mRNA, siRNA), using a non-redundant dataset of 147 protein chains extracted from protein-RNA complexes in the Protein Data Bank. We also train machine learning classifiers to predict protein-RNA interfaces from information derived from sequence and structural features. We develop Struct-NB, a classifier that takes structural information into account. We compare the performance of Naïve Bayes and Gaussian Naïve Bayes with that of Struct-NB on the dataset of 147 protein chains, using sequence and structural features, respectively, as input to the classifiers. In our experiments, Struct-NB outperforms Naïve Bayes and Gaussian Naïve Bayes at predicting protein-RNA binding interfaces in a protein sequence across a range of standard measures of classifier performance.
Available at: http://works.bepress.com/drena-dobbs/38/
This is a manuscript of an article from International Journal of Data Mining and Bioinformatics 4 (2010): 21, doi: 10.1504/IJDMB.2010.030965. Posted with permission.