Predictive modeling with Bioclipse-R 

Building QSAR models for virtual screening and mutagenicity predictions with Bioclipse-R

This is a tutorial on using Bioclipse and its Bioclipse-R functionality to build QSAR models, which are then used to virtually screen existing drug data. The tutorial accompanies the use case outlined in a paper by Spjuth et al. (in preparation). For general instructions on Bioclipse-R, see the Bioclipse-R page. For instructions on scripting Bioclipse, see the Bioclipse Scripting Language page.

In most cases there are two alternative ways to perform the analysis: one using the graphical user interface (GUI) of Bioclipse, and the other using the Bioclipse Scripting Language.

In this demonstration we use CDK descriptors for the screening model and signature descriptors for the mutagenicity model. We also use random forest to fit the screening model and support vector machines (SVM) to fit the mutagenicity model. These differences between the two models serve only to demonstrate the versatility of Bioclipse-R. We could equally well have used only CDK descriptors or only signature descriptors, and likewise only random forest or only SVM (or indeed some other modeling method).
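To make the point about interchangeable descriptors and methods concrete, here is a minimal R sketch using the packages listed in the setup steps (randomForest, e1071, pROC, SparseM). It fits a random forest to a dense descriptor matrix and an SVM to a sparse count matrix, then computes the held-out AUC of each. The data are synthetic random numbers standing in for CDK and signature descriptors. They are not the tutorial's data sets. The variable names, the train/test split and settings such as `ntree = 500` are illustrative assumptions, not the settings used in the tutorial's own R functions.

```r
library(randomForest)  # random forest classifier
library(e1071)         # SVM; accepts SparseM matrix.csr input
library(SparseM)       # sparse matrices for signature-style count data
library(pROC)          # ROC curves and AUC

set.seed(1)
n <- 200

## Synthetic stand-in data (NOT the tutorial's data sets)
# Dense matrix standing in for CDK descriptors
x_dense <- matrix(rnorm(n * 10), nrow = n,
                  dimnames = list(NULL, paste0("desc", 1:10)))
# Mostly-zero count matrix standing in for signature descriptors
x_counts <- matrix(rpois(n * 50, lambda = 0.1), nrow = n)
# Binary labels loosely tied to the first columns
y_dense  <- factor(ifelse(x_dense[, 1] + rnorm(n) > 0, "active", "inactive"))
y_counts <- factor(ifelse(x_counts[, 1] + x_counts[, 2] > 0,
                          "mutagen", "nonmutagen"))

train <- 1:150
test  <- 151:200

## Screening-style model: random forest on dense descriptors
rf   <- randomForest(x = x_dense[train, ], y = y_dense[train], ntree = 500)
p_rf <- predict(rf, x_dense[test, ], type = "prob")[, "active"]

## Mutagenicity-style model: SVM on a sparse (matrix.csr) representation
x_tr <- as.matrix.csr(x_counts[train, ])
x_te <- as.matrix.csr(x_counts[test, ])
sv      <- svm(x = x_tr, y = y_counts[train], probability = TRUE)
pred_sv <- predict(sv, x_te, probability = TRUE)
p_sv    <- attr(pred_sv, "probabilities")[, "mutagen"]

## Area under the ROC curve on the held-out compounds
auc_rf <- as.numeric(auc(roc(y_dense[test], p_rf, quiet = TRUE)))
auc_sv <- as.numeric(auc(roc(y_counts[test], p_sv, quiet = TRUE)))
print(c(random_forest = auc_rf, svm = auc_sv))
```

Swapping the methods only changes the fitting call. For example, an SVM could be fitted to `x_dense`, or a random forest to the dense `x_counts` matrix. This is the kind of flexibility the paragraph above describes.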

Tutorial parts

Part 1: Cancer cell line growth inhibition modeling
This part covers the development of a QSAR model for predicting whether a new compound has anticancer activity.

Part 2: Mutagenicity modeling
This part covers the development of a QSAR model for predicting mutagenicity, based on a training set of Ames test data.

Part 3: Virtual screening and mutagenicity prediction
This part uses the predictive models developed in Parts 1 and 2 to screen DrugBank for compounds that are predicted to have anticancer activity and are not predicted to be mutagenic.
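The final filtering step in Part 3 amounts to combining the outputs of the two models. Below is a minimal R sketch that assumes each model yields a predicted probability per compound. The compound IDs and probabilities are made-up illustration values, not tutorial results. The 0.5 cut-offs are also an assumption, not the tutorial's thresholds.

```r
## Made-up predictions for four hypothetical compounds (illustration only)
screen <- data.frame(
  id        = c("cmpd1", "cmpd2", "cmpd3", "cmpd4"),
  p_active  = c(0.91, 0.85, 0.30, 0.77),  # e.g. from the screening model
  p_mutagen = c(0.12, 0.64, 0.05, 0.40),  # e.g. from the mutagenicity model
  stringsAsFactors = FALSE
)

# Keep compounds predicted active and NOT predicted mutagenic (assumed 0.5 cut-offs)
hits <- screen[screen$p_active >= 0.5 & screen$p_mutagen < 0.5, ]

# Rank the surviving candidates by predicted activity, highest first
hits <- hits[order(hits$p_active, decreasing = TRUE), ]
hits$id  # "cmpd1" "cmpd4"
```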


Prerequisites

  1. Download and unpack a pre-built package of Bioclipse-R.
  2. Install a recent version of R (at least 2.13). Download it from
  3. Mac OS users need to install Tcl/Tk 8.5.5 for X11. Download it from
  4. The R code in this demonstration uses the following R packages: randomForest, e1071, pROC, and SparseM. Install them before beginning the demonstration if you do not already have them. This can be done by running the following command in R:

    install.packages(c("randomForest", "e1071", "pROC", "SparseM"))
  5. For this tutorial, a set of R functions has been developed, and these need to be loaded into R. Download the file functionsForVirtualScreeningApplication.R, import it into Bioclipse, and load the functions with:
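Package installation (step 4) can also be written so that it installs only the packages that are missing and then attaches all four for the session. The following minimal R sketch uses base R's `requireNamespace()`, `install.packages()` and `library()`. The CRAN mirror URL is an assumed choice. The sketch does not cover loading the tutorial's own helper functions.

```r
# Packages used by the tutorial's R code (step 4)
required <- c("randomForest", "e1071", "pROC", "SparseM")

# Which of them are not yet installed?
available  <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
to_install <- required[!available]

# Install only the missing ones (mirror choice is an assumption)
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Attach all four packages for the current R session
invisible(lapply(required, library, character.only = TRUE))
```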

Citing Bioclipse-R

Please cite the following article if you use this work in your research:

Ola Spjuth, Valentin Georgiev, Lars Carlsson, Jonathan Alvarsson, Arvid Berg, Jarl E.S. Wikberg, and Martin Eklund
Bioclipse-R: Integrating management and visualization of life science data with statistical analysis.
In preparation.


Tutorial created by Ola Spjuth and Martin Eklund
Department of Pharmaceutical Biosciences, Uppsala University, Sweden