Article
Fine-Tuning ChemBERTa-2 for Aqueous Solubility Prediction
Annals of Chemical Science Research (2023)
  • Andrew Lang, Oral Roberts University
  • Wei-Khiong Chong
  • Jan HR Worner, Oral Roberts University
Abstract
Traditional machine-learning techniques for predicting physical-chemical properties often require calculating and selecting molecular descriptors. Descriptor calculation can be time-consuming and computationally expensive, and there is no guarantee that all relevant and significant features will be captured, especially when predicting novel endpoints. In this study, we demonstrate the effectiveness of transformer models for predicting physical-chemical endpoints by fine-tuning the open ChemBERTa-2 model to predict aqueous solubility directly from structure. The fine-tuned model achieves accuracy comparable to traditional machine-learning techniques without the need for descriptor calculation and selection. Our findings suggest that transformer models have the potential to provide an efficient and streamlined method for predicting physical-chemical properties directly from molecular structure.
Keywords
  • Transformer models
  • ChemBERTa-2
  • SMILES
  • Cheminformatics
  • Physical-chemical property prediction
Publication Date
May 19, 2023
DOI
10.31031/ACSR.2023.04.000578
Citation Information
Andrew Lang, Wei-Khiong Chong and Jan HR Worner. "Fine-Tuning ChemBERTa-2 for Aqueous Solubility Prediction" Annals of Chemical Science Research Vol. 4 Iss. 1 (2023) ISSN: 2688-8394
Available at: http://works.bepress.com/andrew-sid-lang/41/
Creative Commons License
This work is licensed under a Creative Commons CC BY International License.