IPCA: Liver Toxicity

Case study: Independent Principal Component Analysis (IPCA) using the Liver Toxicity data set

Here, we illustrate IPCA using the liver toxicity data set (see ?liver.toxicity). The data set contains the expression measures of 3116 genes and 10 clinical measurements for 64 subjects (rats) that were exposed to non-toxic, moderately toxic or severely toxic doses of acetaminophen in a controlled experiment; see [2] for study details.

To begin…

Load the latest version of mixOmics.

library(mixOmics)

Data

The liver toxicity data set is available in mixOmics as liver.toxicity, and contains the following:

$gene: data frame with 64 rows and 3116 columns, containing the expression measures of 3116 genes for the 64 subjects (rats).

$clinic: data frame with 64 rows and 10 columns, containing 10 clinical variables for the same 64 subjects.

$treatment: data frame with 64 rows and 4 columns, containing information on the treatment of the 64 subjects, such as doses of acetaminophen and times of necropsy.

data(liver.toxicity)
X <- liver.toxicity$gene
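
Before fitting anything, it can help to confirm that the loaded object matches the description above. The following is a minimal sketch that only applies base R accessors to the objects already introduced. No output is shown here, since it is not part of the original case study.

```r
library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene

# Dimensions of each component; the description above gives
# 64 x 3116 (gene), 64 x 10 (clinic) and 64 x 4 (treatment)
dim(X)
dim(liver.toxicity$clinic)
dim(liver.toxicity$treatment)

# Column names of the treatment table, to see what columns 3 and 4
# (used later for sample labels and colours) actually hold
colnames(liver.toxicity$treatment)
```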

Preliminary analysis with PCA

For comparison, an ordinary PCA is included here to highlight the benefits of using IPCA.

liver.pca <- pca(X, ncomp = 3, scale = FALSE)
plotIndiv(liver.pca, ind.names = liver.toxicity$treatment[, 3],
          group = liver.toxicity$treatment[, 4],
          legend = TRUE, title = 'Liver, PCA')


IPCA

IPCA combines the advantages of both PCA and Independent Component Analysis (ICA) to reveal insightful patterns in the data, see Yao et al. (2012) [1]. IPCA tends to give better clustering of biological samples on graphical representations. A sparse version is also implemented to perform variable selection. mode = "deflation" is the proposed default algorithm for estimating the unmixing matrix in IPCA, see [1].

liver.ipca <- ipca(X, ncomp = 3, mode = "deflation", scale = FALSE)

Sample Plots

plotIndiv(liver.ipca, ind.names = liver.toxicity$treatment[, 3],
          group = liver.toxicity$treatment[, 4],
          legend = TRUE, title = 'Liver, IPCA')


As the above sample plots demonstrate, IPCA offers a better visualization of the data with a smaller number of components than PCA. IPCA separates the dose groups, particularly the low dose groups, more clearly than PCA along the first and second components. IPCA works well if the data have a super-Gaussian distribution. PCA assumes that gene expression data have Gaussian signals, whereas it has been demonstrated that many gene expression data in fact have 'super-Gaussian' signals, see [1].
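
To make the term 'super-Gaussian' concrete, the sketch below compares the excess kurtosis of simulated Gaussian and Laplace samples in base R. The helper excess_kurtosis and the simulated data are illustrative assumptions; this is a textbook moment-based definition and is not necessarily the exact estimator used inside ipca.

```r
# Excess kurtosis: about 0 for a Gaussian signal, positive for a
# super-Gaussian (peaked, heavy-tailed) signal.
# Hypothetical helper, not a mixOmics function.
excess_kurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3
}

set.seed(1)
n <- 1e5
gaussian <- rnorm(n)
# Laplace (double exponential) draws: random sign times an exponential variable
laplace <- sample(c(-1, 1), n, replace = TRUE) * rexp(n)

excess_kurtosis(gaussian)  # close to 0
excess_kurtosis(laplace)   # clearly positive (theoretical value is 3)
```

A signal with positive excess kurtosis has most values near zero and a few large ones, which is the pattern expected when only a small subset of genes responds strongly.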

Note: PCA is an unsupervised approach (i.e. no information about the groups is supplied to PCA), but coloring the samples according to their group can help the interpretation.

Variable Plots

Note that IPCA and PCA rank the variables in a similar order of importance; the largest difference lies in the loading values.

head(selectVar(liver.ipca, comp = 1)$value)
##                 value.var
## A_42_P567268 -0.002574236
## A_43_P11710   0.002430036
## A_42_P584188  0.002260977
## A_42_P493162 -0.002199279
## A_42_P496622  0.002078646
## A_43_P15711  -0.002050732
head(selectVar(liver.pca, comp = 1)$value)
##                value.var
## A_42_P567268  0.13356427
## A_42_P493162  0.12585209
## A_43_P15711   0.11553498
## A_43_P11754   0.10697546
## A_43_P14324   0.10053332
## A_42_P496622 -0.09992275
plotVar(liver.ipca, pch = 20)

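
The similarity of the two rankings can be checked directly from the printed output. The sketch below hard-codes the six gene identifiers shown for each method on component 1 and counts how many are shared; the variable names top.ipca, top.pca and shared are illustrative. With the fitted objects available, the same identifiers could presumably be read from the row names of selectVar(...)$value, as the output above suggests.

```r
# Top six gene identifiers on component 1, copied from the output above
top.ipca <- c("A_42_P567268", "A_43_P11710", "A_42_P584188",
              "A_42_P493162", "A_42_P496622", "A_43_P15711")
top.pca  <- c("A_42_P567268", "A_42_P493162", "A_43_P15711",
              "A_43_P11754", "A_43_P14324", "A_42_P496622")

# Genes that appear in both top-six lists
shared <- intersect(top.ipca, top.pca)
shared
length(shared)  # 4 of the 6 genes are common to both methods
```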

With this many genes the variable plot is difficult to interpret; a sparse IPCA (sIPCA) is more appropriate for interpreting the results, see the next section.

Kurtosis

The kurtosis measure of the loading vectors is used to order the Independent Principal Components. The kurtosis values are also a good post hoc indicator of the number of components to choose, as a sudden drop in the values corresponds to irrelevant dimensions.

liver.ipca$kurtosis
## [1] 9.7068221 6.9869933 0.6729702
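
The 'sudden drop' can be located numerically. The sketch below uses the three kurtosis values printed above and picks the component just before the largest decrease. This is only one simple heuristic, and the names kurt, drops and n.keep are illustrative, not part of mixOmics.

```r
# Kurtosis values copied from the output of liver.ipca$kurtosis above
kurt <- c(9.7068221, 6.9869933, 0.6729702)

# Decrease in kurtosis from each component to the next
drops <- -diff(kurt)
drops

# Index of the last component before the largest drop
n.keep <- which.max(drops)
n.keep  # 2: the third component falls well below the first two
```

Here the largest drop occurs between the second and third components, which suggests that two components are sufficient for this data set.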

Sparse Independent Principal Component Analysis (sIPCA)

Sparse Independent Principal Component Analysis (sIPCA) combines the advantages of IPCA with soft-thresholding applied to the independent loading vectors to perform internal variable selection; see the mixOmics website for more details.

liver.sipca <- sipca(X, ncomp = 3, mode = "deflation",
                         scale = FALSE, keepX = c(50,50,50))

Sample Plots

plotIndiv(liver.sipca, ind.names = liver.toxicity$treatment[, 3],
          group = liver.toxicity$treatment[, 4],
          legend = TRUE, title = 'Liver, sIPCA')


Variable Plots

head(selectVar(liver.sipca, comp = 1)$value)
##                 value.var
## A_42_P567268 -0.001537541
## A_43_P11710   0.001393341
## A_42_P584188  0.001224282
## A_42_P493162 -0.001162584
## A_42_P496622  0.001041951
## A_43_P15711  -0.001014037
plotVar(liver.sipca, pch = 20)

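
The effect of soft-thresholding is visible in the printed loadings: each sIPCA value on component 1 equals the corresponding IPCA value shrunk towards zero by the same amount. The sketch below checks this in base R. The function soft_threshold is a generic textbook operator written for illustration, and the threshold lambda is inferred from the printed values; how sipca derives its threshold from keepX internally is not covered here.

```r
# Generic soft-thresholding operator (illustrative, not a mixOmics function):
# shrink each coefficient towards zero by lambda and set small ones to zero
soft_threshold <- function(w, lambda) sign(w) * pmax(abs(w) - lambda, 0)

# Component 1 loadings copied from the IPCA and sIPCA outputs in this case study
ipca.top  <- c(-0.002574236, 0.002430036, 0.002260977,
               -0.002199279, 0.002078646, -0.002050732)
sipca.top <- c(-0.001537541, 0.001393341, 0.001224282,
               -0.001162584, 0.001041951, -0.001014037)

# Threshold inferred from the first pair of values
lambda <- abs(ipca.top[1]) - abs(sipca.top[1])

# Shrinking the IPCA loadings by lambda reproduces the sIPCA loadings
shrunk <- soft_threshold(ipca.top, lambda)
max(abs(shrunk - sipca.top))  # zero up to floating-point rounding

# A loading smaller than lambda in absolute value is set exactly to zero
soft_threshold(0.0005, lambda)
```

Loadings below the threshold become exactly zero, which is how the variable selection arises.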

As the above plots show, the samples cluster more clearly with sIPCA than with IPCA.

References

  1. Yao F., Coquery J., Lê Cao K.-A. (2012) Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets, BMC Bioinformatics 13:24.

  2. Bushel, P.R., Wolfinger, R.D. and Gibson, G., 2007. Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes. BMC Systems Biology, 1(1), p.1.