Article
Using machine learning to improve ensemble docking for drug discovery.
Proteins (2020)
  • Tanay Chandak, University of Missouri–St. Louis
  • John P. Mayginnes, University of Missouri–St. Louis
  • Howard Mayes, University of Missouri–St. Louis
  • Chung F. Wong, University of Missouri–St. Louis
Abstract
Ensemble docking provides an inexpensive way to account for receptor flexibility in molecular docking for virtual screening. Unfortunately, because no rigorous theory connects the docking scores from multiple structures to measured activity, researchers have not yet found effective ways to use these scores to classify compounds into actives and inactives. This shortcoming has led to a decrease, rather than an increase, in classification performance when more structures are added to the ensemble. Previously, we suggested that machine learning, implemented in the form of a naïve Bayesian model, could alleviate this problem. However, the naïve Bayesian model assumed that the probabilities of observing the docking scores to different structures were independent. This approximation might prevent it from achieving even higher performance. In the work presented in this paper, we relaxed this approximation by using several other machine learning methods (k nearest neighbor, logistic regression, support vector machine, and random forest) to improve ensemble docking. We found significant improvement.
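To make the idea of classifying on the joint vector of docking scores concrete, here is a minimal sketch of a k nearest neighbor classifier, one of the methods named in the abstract. It is an illustration only and not the authors' implementation: the function name `knn_classify`, the choice of Euclidean distance, the value of `k`, and the toy score values are all assumptions made for this example. Each compound is represented by its docking scores against every structure in the ensemble, and the distance is computed on the whole vector, so no independence between the scores is assumed.

```python
import math
from collections import Counter


def knn_classify(train_scores, train_labels, query_scores, k=3):
    """Label a compound by majority vote among its k nearest training compounds.

    Each compound is a tuple of docking scores, one per receptor structure.
    Euclidean distance on the full tuple is an assumption of this sketch.
    """
    # Pair every training compound's distance to the query with its label.
    neighbors = sorted(
        (math.dist(scores, query_scores), label)
        for scores, label in zip(train_scores, train_labels)
    )
    # Majority vote over the k closest compounds.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: docking scores against three receptor structures.
# Label 1 = active, 0 = inactive. These numbers are made up for illustration.
train_scores = [
    (-9.1, -8.7, -9.4),
    (-8.8, -9.0, -8.5),
    (-9.3, -8.9, -9.0),
    (-6.2, -6.8, -6.0),
    (-6.5, -6.1, -6.9),
    (-5.9, -6.4, -6.3),
]
train_labels = [1, 1, 1, 0, 0, 0]

predicted_active = knn_classify(train_scores, train_labels, (-8.9, -8.8, -9.1))
predicted_inactive = knn_classify(train_scores, train_labels, (-6.3, -6.2, -6.5))
print(predicted_active, predicted_inactive)
```

In practice one would likely use an established library and validate on held-out compounds; the sketch only shows how an ensemble of scores can serve as a single feature vector.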
Keywords
  • ensemble docking
  • k nearest neighbor
  • logistic regression
  • machine‐learning
  • protein kinases
  • random forest
  • support vector machine
Publication Date
May 13, 2020
DOI
10.1002/PROT.25899
Citation Information
Tanay Chandak, John P. Mayginnes, Howard Mayes and Chung F. Wong. "Using machine learning to improve ensemble docking for drug discovery." Proteins (2020)
Available at: http://works.bepress.com/chung-wong/80/