Case study of PCA with Multidrug data set
Principal Component Analysis (PCA) is primarily used to explore a single type of ‘omics data (e.g. transcriptomics, proteomics, metabolomics, etc.) and to identify the largest sources of variation. We often use PCA as a preliminary step to better understand the data.
To begin…
Load the latest version of mixOmics.
library(mixOmics)
Data
The multidrug data set contains the expression of 48 known human ABC transporters, together with patterns of drug activity, in 60 diverse cancer cell lines (the NCI-60) used by the National Cancer Institute to screen for anticancer activity. The data come from a pharmacogenomic study [1].
The multidrug data set is available in mixOmics as multidrug, and contains the following:
multidrug$ABC.trans: data matrix with 60 rows and 48 columns. The expression of the 48 human ABC transporters for the 60 cell lines.
multidrug$compound: data matrix with 60 rows and 1429 columns. The activity of 1429 drugs for the 60 cell lines.
multidrug$comp.name: character vector. The names or the NSC No. of the 1429 compounds.
multidrug$cell.line: a list containing two character vector components: Sample, the names of the 60 cell lines that were analysed, and Class, the phenotypes of the 60 cell lines. The NCI-60 panel includes cell lines derived from cancers of colorectal (7 cell lines), renal (8), ovarian (6), breast (8), prostate (2), lung (9) and central nervous system origin (6), as well as leukemias (6) and melanomas (8).
We begin by examining the ABC transporter data multidrug$ABC.trans:
data(multidrug)
X <- multidrug$ABC.trans
dim(X) # check dimension of data
## [1] 60 48
Preliminary analysis with PCA
Start with a preliminary PCA of the transporter gene expression data. PCA is an unsupervised approach (i.e. no information about the cell class is given to PCA), but coloring the samples according to their cell classes can help with interpretation.
trans.pca <- pca(X, ncomp = 10, center = TRUE, scale = TRUE)
trans.pca
## Eigenvalues for the first 10 principal components, see object$sdev^2:
##      PC1      PC2      PC3      PC4      PC5      PC6      PC7      PC8
## 6.083071 4.891838 3.364484 3.002342 2.762492 2.436075 2.341665 2.092605
##      PC9     PC10
## 1.851851 1.701218
##
## Proportion of explained variance for the first 10 principal components, see object$explained_variance:
##        PC1        PC2        PC3        PC4        PC5        PC6
## 0.12676460 0.10194060 0.07011220 0.06256555 0.05756734 0.05076515
##        PC7        PC8        PC9       PC10
## 0.04879775 0.04360763 0.03859056 0.03545153
##
## Cumulative proportion explained variance for the first 10 principal components, see object$cum.var:
##       PC1       PC2       PC3       PC4       PC5       PC6       PC7
## 0.1267646 0.2287052 0.2988174 0.3613829 0.4189503 0.4697154 0.5185132
##       PC8       PC9      PC10
## 0.5621208 0.6007114 0.6361629
##
## Other available components:
## --------------------
## loading vectors: see object$rotation
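The relationship between the three blocks of output above can be illustrated with a minimal sketch in base R. It assumes simulated data (`X_sim` is a hypothetical stand-in, not the multidrug matrix) and uses `prcomp` rather than the mixOmics `pca` function, so the numbers will not match the output above; only the arithmetic is the point.

```r
# Simulated stand-in for an expression matrix (hypothetical data, not multidrug)
set.seed(42)
X_sim <- matrix(rnorm(60 * 8), nrow = 60, ncol = 8)

# Centre and scale each column, then run PCA with base R
pca_sim <- prcomp(X_sim, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eigenvalues <- pca_sim$sdev^2

# Each component's share of the total variance, and the running total
prop_var <- eigenvalues / sum(eigenvalues)
cum_var  <- cumsum(prop_var)

round(rbind(eigenvalues, prop_var, cum_var), 3)
```

Because the columns are scaled to unit variance, the eigenvalues in this sketch sum to the number of variables (8 here), and the cumulative proportion reaches 1 at the last component.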
plot(trans.pca)
trans.pca2 <- pca(X, ncomp = 48, center = TRUE, scale = TRUE)
# Some warnings may appear: we are requesting many components and the algorithm may not converge
plot(trans.pca2)
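One common way to read such a plot is to ask how many components are needed to pass a chosen share of the total variance. The sketch below applies this to the cumulative proportions printed earlier for the first 10 components. The 50% threshold and the variable names are illustrative assumptions, not a mixOmics recommendation.

```r
# Cumulative proportion of explained variance, copied from the pca() output
cum_var <- c(PC1 = 0.1267646, PC2 = 0.2287052, PC3 = 0.2988174,
             PC4 = 0.3613829, PC5 = 0.4189503, PC6 = 0.4697154,
             PC7 = 0.5185132, PC8 = 0.5621208, PC9 = 0.6007114,
             PC10 = 0.6361629)

# Illustrative threshold (an assumption; choose one that suits the analysis)
threshold <- 0.5

# First component whose cumulative proportion reaches the threshold
n_comp <- which(cum_var >= threshold)[1]
n_comp
```

With these values, seven components are needed to explain just over half of the variance, which shows how slowly the explained variance accumulates in this data set.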
Sample Plots
plotIndiv(trans.pca, comp = c(1, 2), ind.names = TRUE,
group = multidrug$cell.line$Class,
legend = TRUE, title = 'Multidrug transporter, PCA comp 1 - 2')
From the PCA sample plots, we can observe some separation between the different cell lines. The sample plot on the first 2 principal components shows an interesting separation of the Melanoma cell lines along the first component.
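The coordinates drawn in a sample plot are the component scores, that is, the centred and scaled data projected onto the loading vectors. The following base R sketch shows this projection. It assumes simulated data and uses `prcomp` instead of mixOmics; `X_sim` and `scores_manual` are hypothetical names.

```r
# Simulated stand-in for an expression matrix (hypothetical data, not multidrug)
set.seed(1)
X_sim <- matrix(rnorm(60 * 8), nrow = 60, ncol = 8)
pca_sim <- prcomp(X_sim, center = TRUE, scale. = TRUE)

# Scores: centred and scaled data multiplied by the loading vectors
X_scaled <- scale(X_sim, center = TRUE, scale = TRUE)
scores_manual <- X_scaled %*% pca_sim$rotation

# prcomp stores the same scores in $x; the difference is numerically zero
max_diff <- max(abs(scores_manual - pca_sim$x))

# Coordinates of the first few samples on components 1 and 2
head(scores_manual[, 1:2])
```

Class labels play no part in this computation; in a sample plot they are only used afterwards to color the points.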
Variable Plots
Here, a correlation circle plot highlights clusters of ABC transporters and shows their contribution to each principal component (the variables that contribute most lie close to the circle of radius 1). See here for details on interpreting correlation circle plots.
plotVar(trans.pca, comp = c(1, 2), var.names = TRUE,
title = 'Multidrug transporter, PCA comp 1 - 2')
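To make the geometry of a correlation circle concrete, the sketch below computes variable coordinates as the correlation between each original variable and the first two components. It is a base R illustration on simulated data, assuming this correlation-based definition of the coordinates; the `gene1` to `gene6` names are hypothetical.

```r
# Simulated stand-in with named variables (hypothetical data, not multidrug)
set.seed(7)
X_sim <- matrix(rnorm(60 * 6), nrow = 60, ncol = 6,
                dimnames = list(NULL, paste0("gene", 1:6)))
pca_sim <- prcomp(X_sim, center = TRUE, scale. = TRUE)

# Coordinates of each variable: its correlation with components 1 and 2
circle_coords <- cor(X_sim, pca_sim$x[, 1:2])

# Distance from the origin; values near 1 mean the variable is well
# represented by these two components
radius <- sqrt(rowSums(circle_coords^2))

round(cbind(circle_coords, radius), 3)
```

No variable can fall outside the unit circle, because its squared correlations with all components sum to 1. Variables pointing in the same direction are positively correlated, and variables pointing in opposite directions are negatively correlated.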
Biplots allow both samples and variables to be displayed on the same plot. See here for details on interpreting biplots.
biplot(trans.pca, cex = 0.7,
xlabs = paste(multidrug$cell.line$Class, 1:nrow(X)))
In the biplot above, the melanoma samples seem to be characterized by a small subset of highly positively correlated ABC transporters.
References
-
Jolliffe, I. (2005). Principal component analysis. Wiley Online Library.