Predicting Discontinuation of Docetaxel Treatment for Metastatic Castration-Resistant Prostate Cancer (mCRPC) With Hill-Climbing and Random Forest
F1000Research
  • Daniel Kristiyanto
  • Kevin E. Anderson
  • Seyed Sina Khankhajeh
  • Kaiyuan Shi
  • Seth West
  • Ling Hong Hung
  • Azu Lee
  • Qi Wei
  • Migao Wu
  • Yunhong Yin
  • Ka Yee Yeung, University of Washington Tacoma
Publication Date
11-30-2015
Document Type
Article
Abstract

Motivation: In subchallenge 2 of the DREAM 9.5 Prostate Cancer challenge, we developed models to predict which patients with metastatic castration-resistant prostate cancer (mCRPC) would discontinue docetaxel therapy. The input data consist of 131 variables measured in three clinical trials: Memorial Sloan Kettering (MSK, 476 patients), Celgene (526 patients), and Sanofi (598 patients). The goal is to predict which patients in a fourth clinical trial, AstraZeneca (AZ, 470 patients), would discontinue treatment due to adverse events within 3 months.

Data & Methods: Data cleansing was done separately within each clinical trial, and the cleaned data were then merged. Our cleansing and pre-processing procedures included imputation of missing data and removal of clinical variables with a high percentage of missing data. Data augmentation was also performed by converting selected multi-label variables into binary variables. We observed that univariate feature selection methods did not perform well. Hence, we adopted a hill-climbing approach that optimized the AUC within 10-fold cross-validation of the training data. We also addressed the issue of imbalanced data (1292 negative and 197 positive samples) by randomly removing negative samples to reach a ratio of roughly 60% negative to 40% positive samples. We applied random forest using Sanofi as the hold-out, setting the parameter “mtry” to 25% of the number of features and the number of trees to 100 times the number of features. Our predictive model using MSK and Celgene data as the training set and Sanofi data as the test set yielded AUC=0.165, accuracy=0.9, precision=0.21, F1=0.092 and recall=0.06.

Results: Our final submission predicting the discontinuation of docetaxel in the AstraZeneca clinical trial (using MSK, Celgene and Sanofi as training data) resulted in an AUC of 0.13. Across the 470 patients in the AstraZeneca clinical trial, 8 patients are predicted to discontinue treatment within 3 months.

Acknowledgement: Hung and Yeung are supported by NIH grant U54-HL127624. This project used computing resources provided by Microsoft Azure. We thank all students in TCSS 588 Bioinformatics in Spring 2015 at University of Washington Tacoma who contributed to this project.
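
The class rebalancing and the random forest settings described in the abstract come down to simple arithmetic. The sketch below illustrates this in plain Python. It is not the authors' code: the function names, the fixed seed, the choice to keep every positive sample, and the rounding rules are all assumptions made for illustration. Only the class counts (1292 negative, 197 positive), the roughly 60/40 target, and the 25% and 100x rules come from the abstract.

```python
import random


def undersample_negatives(labels, negative_share=0.6, seed=0):
    """Return sorted indices of a subsample with roughly the requested class mix.

    All positives (label 1) are kept. Negatives (label 0) are dropped at random
    until they make up about `negative_share` of the retained samples.
    Keeping every positive is an assumption, not stated in the abstract.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Solve n_neg / (n_neg + n_pos) = negative_share for n_neg, rounding down.
    target = int(len(positives) * negative_share / (1.0 - negative_share))
    target = min(target, len(negatives))
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept_negatives = rng.sample(negatives, target)
    return sorted(positives + kept_negatives)


def forest_settings(n_features):
    """Map a feature count to (mtry, n_trees) using the abstract's rules.

    mtry is 25% of the feature count (the rounding is an assumption) and the
    number of trees is 100 times the feature count.
    """
    mtry = max(1, round(0.25 * n_features))
    n_trees = 100 * n_features
    return mtry, n_trees


# Class counts reported in the abstract.
labels = [0] * 1292 + [1] * 197
kept = undersample_negatives(labels)
n_pos = sum(labels[i] for i in kept)
n_neg = len(kept) - n_pos
negative_fraction = n_neg / len(kept)
print(n_pos, n_neg, round(negative_fraction, 3))

# 20 is a hypothetical number of selected features, used only for illustration.
print(forest_settings(20))
```

The retained indices can then be used to subset the training table before feature selection and model fitting. The abstract does not say how many features survived hill-climbing, so the feature count passed to `forest_settings` here is a placeholder.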

DOI
10.7490/f1000research.1111091.1
Publisher Policy
open access
Citation Information
Daniel Kristiyanto, Kevin E. Anderson, Seyed Sina Khankhajeh, Kaiyuan Shi, et al. "Predicting Discontinuation of Docetaxel Treatment for Metastatic Castration-Resistant Prostate Cancer (mCRPC) With Hill-Climbing and Random Forest" F1000Research Vol. 4 (2015)
Available at: http://works.bepress.com/ky-yeung/15/