PCA: Multidrug

Case study of PCA with the multidrug data set

Principal Component Analysis (PCA) is primarily used to explore a single type of 'omics data (e.g. transcriptomics, proteomics, metabolomics) and to identify the largest sources of variation. We often use PCA as a preliminary step to better understand the data.

To begin…

Load the latest version of mixOmics.
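The code for this step does not appear in this copy of the page. A minimal sketch, assuming mixOmics is already installed (it is distributed through Bioconductor) and that the data set is exposed through the usual data() mechanism under the name multidrug used in the rest of this case study:

```r
# Assumes mixOmics is installed; if not, it can be installed from Bioconductor:
# if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# BiocManager::install("mixOmics")

library(mixOmics)  # provides pca(), plotIndiv() and plotVar() used in this case study
data(multidrug)    # loads the 'multidrug' list into the workspace
names(multidrug)   # list the components of the data set
```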



The multidrug data set contains the expression of 48 known human ABC transporters, together with patterns of drug activity, in 60 diverse cancer cell lines (the NCI-60) used by the National Cancer Institute to screen for anticancer activity. The data come from a pharmacogenomic study [1].

The multidrug data set is available in mixOmics as multidrug, and contains the following:

multidrug$ABC.trans: data matrix with 60 rows and 48 columns. The expression of the 48 human ABC transporters for the 60 cell lines.

multidrug$compound: data matrix with 60 rows and 1429 columns. The activity of 1429 drugs for the 60 cell lines.

multidrug$comp.name: character vector. The names or the NSC No. of the 1429 compounds.

multidrug$cell.line: a list containing two character vector components: Sample, the names of the 60 cell lines that were analysed, and Class, the phenotypes of the 60 cell lines. The NCI-60 panel includes cell lines derived from cancers of colorectal (7 cell lines), renal (8), ovarian (6), breast (8), prostate (2), lung (9) and central nervous system origin (6), as well as leukemias (6) and melanomas (8).
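A quick way to check these descriptions against the loaded object is sketched below. It uses only base R functions on the components listed above. The missing-value check is an addition for illustration, and no particular result is assumed for it.

```r
library(mixOmics)
data(multidrug)

dim(multidrug$ABC.trans)          # cell lines x transporters
dim(multidrug$compound)           # cell lines x compounds
length(multidrug$comp.name)       # one name (or NSC No.) per compound
table(multidrug$cell.line$Class)  # number of cell lines per phenotype
sum(is.na(multidrug$ABC.trans))   # count of missing expression values, if any
```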

We begin by examining the ABC transporter data multidrug$ABC.trans:

X <- multidrug$ABC.trans 
dim(X) # check dimension of data
## [1] 60 48

Preliminary analysis with PCA

Start a preliminary investigation with PCA on the expression data of the transporter genes. PCA is an unsupervised approach (i.e. no information about the cell class is used by PCA), but coloring the samples according to their cell classes can help the interpretation.

trans.pca <- pca(X, ncomp = 10, center = TRUE, scale = TRUE)
trans.pca  # print a summary of the fitted object
## Eigenvalues for the first 10 principal components, see object$sdev^2: 
##      PC1      PC2      PC3      PC4      PC5      PC6      PC7      PC8 
## 6.083071 4.891838 3.364484 3.002342 2.762492 2.436075 2.341665 2.092605 
##      PC9     PC10 
## 1.851851 1.701218 
## Proportion of explained variance for the first 10 principal components, see object$explained_variance: 
##        PC1        PC2        PC3        PC4        PC5        PC6 
## 0.12676460 0.10194060 0.07011220 0.06256555 0.05756734 0.05076515 
##        PC7        PC8        PC9       PC10 
## 0.04879775 0.04360763 0.03859056 0.03545153 
## Cumulative proportion explained variance for the first 10 principal components, see object$cum.var: 
##       PC1       PC2       PC3       PC4       PC5       PC6       PC7 
## 0.1267646 0.2287052 0.2988174 0.3613829 0.4189503 0.4697154 0.5185132 
##       PC8       PC9      PC10 
## 0.5621208 0.6007114 0.6361629 
##  Other available components: 
##  -------------------- 
##  loading vectors: see object$rotation
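
To make the relationship between the three printed blocks concrete, here is a sketch using base R's prcomp() on simulated data of the same shape (60 x 48). The simulated matrix and the names X_sim, fit, eig, prop and cum are illustrative and not part of the case study. On complete, centered and scaled data, the eigenvalues are the squared standard deviations of the components, the proportions are the eigenvalues divided by their total, and the cumulative proportions are the running sum. The mixOmics values above need not match this idealised picture exactly.

```r
set.seed(1)
X_sim <- matrix(rnorm(60 * 48), nrow = 60, ncol = 48)  # stand-in data, not multidrug

fit  <- prcomp(X_sim, center = TRUE, scale. = TRUE)
eig  <- fit$sdev^2     # eigenvalues, one per component
prop <- eig / sum(eig) # proportion of explained variance
cum  <- cumsum(prop)   # cumulative proportion

sum(eig)  # for scaled, complete data this equals the number of variables (48)
round(head(cbind(eigenvalue = eig, proportion = prop, cumulative = cum), 10), 4)
```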

trans.pca2 <- pca(X, ncomp = 48, center = TRUE, scale = TRUE)
# warnings may appear: with this many components the iterative algorithm may not converge
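
One common heuristic for using this kind of output is to ask how many components are needed to reach a chosen share of the total variance. The sketch below applies that idea to the cumulative proportions printed earlier for trans.pca. The helper n_comp_for and the 50% threshold are hypothetical choices for illustration, not part of mixOmics.

```r
# Cumulative proportions copied from the printed trans.pca output above
cum_var <- c(PC1 = 0.1267646, PC2 = 0.2287052, PC3 = 0.2988174,
             PC4 = 0.3613829, PC5 = 0.4189503, PC6 = 0.4697154,
             PC7 = 0.5185132, PC8 = 0.5621208, PC9 = 0.6007114,
             PC10 = 0.6361629)

# Hypothetical helper: index of the first component whose cumulative
# proportion reaches the threshold, or NA if none of them does
n_comp_for <- function(cum_var, threshold) {
  hit <- which(cum_var >= threshold)
  if (length(hit) == 0) NA_integer_ else unname(hit[1])
}

n_comp_for(cum_var, 0.50)  # 7: PC7 is the first to pass 50%
n_comp_for(cum_var, 0.90)  # NA: ten components reach only about 64%
```

With the fitted objects at hand, the same helper could be applied to trans.pca2$cum.var, the field named in the printout, to cover all 48 components.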

Sample Plots

plotIndiv(trans.pca, comp = c(1, 2), ind.names = TRUE, 
          group = multidrug$cell.line$Class, 
          legend = TRUE, title = 'Multidrug transporter, PCA comp 1 - 2')

From the PCA sample plot, we can observe some separation between the different cell line classes. In particular, the Melanoma cell lines separate from the other classes along the first component.
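
Since the first two components account for only about 23% of the variance (see the cumulative proportions printed earlier), it can be worth looking at further components as well. The sketch below reuses only the plotIndiv() arguments from the call shown earlier and changes comp. Whether component 3 reveals additional structure is not claimed here.

```r
library(mixOmics)
data(multidrug)

X <- multidrug$ABC.trans
trans.pca <- pca(X, ncomp = 10, center = TRUE, scale = TRUE)

# Same call as for components 1 and 2, now plotting component 1 against component 3
plotIndiv(trans.pca, comp = c(1, 3), ind.names = TRUE,
          group = multidrug$cell.line$Class,
          legend = TRUE, title = 'Multidrug transporter, PCA comp 1 - 3')
```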

Variable Plots

Here, a correlation circle plot highlights clusters of ABC transporters and shows their contribution to each principal component (variables close to the circle of radius 1 contribute most). See the mixOmics documentation on correlation circle plots for details on their interpretation.

plotVar(trans.pca, comp = c(1, 2), var.names = TRUE, 
        title = 'Multidrug transporter, PCA comp 1 - 2')
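
To illustrate what the coordinates in a correlation circle represent, the sketch below uses base R on simulated data to compute the correlation between each variable and the first two principal components. It illustrates the general idea, under the assumption that variables are positioned by these correlations. It does not call plotVar(), and X_sim, fit, circ and r2 are illustrative names.

```r
set.seed(1)
X_sim <- matrix(rnorm(60 * 48), nrow = 60, ncol = 48)  # stand-in data, not multidrug
colnames(X_sim) <- paste0("V", 1:48)

fit  <- prcomp(X_sim, center = TRUE, scale. = TRUE)
circ <- cor(X_sim, fit$x[, 1:2])  # 48 x 2 matrix: one (PC1, PC2) point per variable

# Squared distance of each variable from the origin; it cannot exceed 1,
# which is why all points fall inside the circle of radius 1
r2 <- rowSums(circ^2)
range(r2)

# Variables closest to the circle are the best represented on PC1 and PC2
head(sort(r2, decreasing = TRUE))
```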

A biplot displays both samples and variables on the same plot. See the mixOmics documentation on biplots for details on their interpretation.

biplot(trans.pca, cex = 0.7,
       xlabs = paste(multidrug$cell.line$Class, 1:nrow(X)))

In the biplot, the Melanoma samples appear to be characterized by a small subset of highly positively correlated ABC transporters.


  1. Szakács G., Annereau J.-P., Lababidi S., Shankavaram U., Arciello A., Bussey K.J., Reinhold W., Guo Y., Kruh G.D., Reimers M., Weinstein J.N. and Gottesman M.M. (2004) Predicting drug sensitivity and resistance: Profiling ABC transporter genes in cancer cells. Cancer Cell 6(2), pp 129-137.

  2. Jolliffe, I. (2005). Principal component analysis. Wiley Online Library.