Predictive modeling with Bioclipse-R
This tutorial shows how to use Bioclipse and the Bioclipse-R functionality to build QSAR models, which are then used to virtually screen existing drug data. It accompanies the use case outlined in a paper by Spjuth et al. (in preparation). For general instructions on Bioclipse-R, see the page on Bioclipse-R. For instructions on scripting Bioclipse, see the page Bioclipse Scripting Language.
In most cases there are two alternative ways to perform the analysis: one using the graphical user interface (GUI) of Bioclipse, and the other using the Bioclipse Scripting Language.
In the demonstration we use CDK descriptors for the screening model and signature descriptors for the mutagenicity model. Likewise, we use random forest for model fitting in the screening model and a support vector machine (SVM) for the mutagenicity model. The only reason for these differences between the two models is to demonstrate the versatility of Bioclipse-R. In principle we could equally well have used only CDK descriptors or only signature descriptors, and analogously only random forest or only SVM (or indeed some other modeling method).
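To make the choice of modeling method concrete, the following is a minimal R sketch of fitting a random forest and an SVM classifier on the same descriptor matrix. It is not the tutorial's actual workflow: the descriptor matrix, the class labels, and all variable names are synthetic stand-ins invented for illustration, and real descriptors would come from Bioclipse (CDK or signatures). It assumes the randomForest and e1071 packages are installed.

```r
library(randomForest)  # random forest (method used for the screening model)
library(e1071)         # SVM (method used for the mutagenicity model)

set.seed(42)

# Hypothetical stand-in for a descriptor matrix: 200 compounds x 5 descriptors.
n <- 200
x <- matrix(rnorm(n * 5), nrow = n,
            dimnames = list(NULL, paste0("desc", 1:5)))

# Synthetic binary class label that depends on the first two descriptors.
y <- factor(ifelse(x[, "desc1"] + x[, "desc2"] + rnorm(n, sd = 0.5) > 0,
                   "active", "inactive"))

# Hold out a quarter of the compounds as a test set.
test_idx <- sample(n, n / 4)
x_train <- x[-test_idx, ]
y_train <- y[-test_idx]
x_test  <- x[test_idx, ]
y_test  <- y[test_idx]

# Fit both model types on the same training data.
rf_model  <- randomForest(x = x_train, y = y_train, ntree = 500)
svm_model <- svm(x = x_train, y = y_train, kernel = "radial")

# Predict class labels for the held-out compounds.
rf_pred  <- predict(rf_model, x_test)
svm_pred <- predict(svm_model, x_test)

# Fraction of held-out compounds classified correctly by each model.
rf_accuracy  <- mean(rf_pred == y_test)
svm_accuracy <- mean(svm_pred == y_test)
```

Because both fitting functions accept the same matrix and factor, swapping one method for the other only changes the fitting call, which is the sense in which the two methods are interchangeable here.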
Tutorial parts
Part 1: Cancer cell line growth inhibition modeling. This part covers the development of a QSAR model for predicting whether a new compound has anticancer activity.
Part 2: Mutagenicity modeling. This part covers the development of a QSAR model for predicting mutagenicity based on a training set with Ames data.
Part 3: Virtual screening and mutagenicity prediction. This part uses the predictive models developed in Parts 1 and 2 to screen DrugBank for compounds that are predicted to have anticancer activity and are not predicted to be mutagenic.
The R packages below can be installed from within R:
install.packages(c("randomForest", "e1071", "pROC", "SparseM"))
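Before running the tutorial it can be useful to check which of these packages are already available. The sketch below uses only base R; the helper name `missing_packages` is hypothetical and not part of Bioclipse-R, and the install step is left commented out so that nothing is downloaded unintentionally.

```r
# Packages named in the install.packages() call above.
required <- c("randomForest", "e1071", "pROC", "SparseM")

# Return the subset of package names whose namespace cannot be loaded.
# (Hypothetical helper, defined here only for illustration.)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install the missing ones
}
```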
Ola Spjuth, Valentin Georgiev, Lars Carlsson, Jonathan Alvarsson, Arvid Berg, Jarl E.S. Wikberg, and Martin Eklund
Bioclipse-R: Integrating management and visualization of life science data with statistical analysis.