Case study: Independent PCA (IPCA) using the Liver Toxicity data set
Here, we illustrate IPCA using the liver toxicity data set, see ?liver.toxicity. The data set contains the expression measures of 3116 genes and 10 clinical measurements for 64 subjects (rats) that were exposed to non-toxic, moderately toxic or severely toxic doses of acetaminophen in a controlled experiment; see study details in Bushel et al. (2007).
Load the latest version of mixOmics.
library(mixOmics)
The liver toxicity data set is implemented in mixOmics via liver.toxicity, and contains the following:
$gene: data frame with 64 rows and 3116 columns. The expression measure of 3116 genes for the 64 subjects (rats).
$clinic: data frame with 64 rows and 10 columns, containing 10 clinical variables for the same 64 subjects.
$treatment: data frame with 64 rows and 4 columns, containing information on the treatment of the 64 subjects, such as doses of acetaminophen and times of necropsy.
data(liver.toxicity)
X <- liver.toxicity$gene
Preliminary analysis with PCA
For comparison, an ordinary PCA is included here to highlight the benefits of using IPCA.
liver.pca <- pca(X, ncomp = 3, scale = FALSE)
plotIndiv(liver.pca, ind.names = liver.toxicity$treatment[, 3], group = liver.toxicity$treatment[, 4], legend = TRUE, title = 'Liver, PCA')
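To make the `scale = FALSE` setting concrete, the sketch below runs a centred but unscaled PCA with base R's `prcomp` on a small simulated matrix and checks it against a direct SVD. The matrix `toy` and its dimensions are made up for illustration, and the code does not call mixOmics; it assumes only that the columns are centred before the decomposition.

```r
set.seed(1)

# Hypothetical toy data: 10 samples (rows) by 5 variables (columns)
toy <- matrix(rnorm(50), nrow = 10, ncol = 5)

# PCA with centring but without scaling the variables to unit variance
toy.pca <- prcomp(toy, center = TRUE, scale. = FALSE)

# The same decomposition via an SVD of the column-centred matrix
toy.centred <- scale(toy, center = TRUE, scale = FALSE)
toy.svd <- svd(toy.centred)

# Component scores (samples projected on the components) are U %*% D;
# they agree with prcomp up to the sign of each component
scores.prcomp <- toy.pca$x
scores.svd <- toy.svd$u %*% diag(toy.svd$d)
max.diff <- max(abs(abs(scores.prcomp) - abs(scores.svd)))
max.diff
```

Without scaling, variables with large variance dominate the first components, which is usually acceptable when all variables are measured on the same scale, as with expression data from a single platform.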
IPCA combines the advantages of both PCA and Independent Component Analysis (ICA) to reveal insightful patterns in the data, see Yao et al. (2012). IPCA results in better clustering of biological samples on graphical representations. A sparse version is also implemented to perform variable selection. mode = 'deflation' is the proposed default algorithm for estimating the unmixing matrix in IPCA.
liver.ipca <- ipca(X, ncomp = 3, mode = "deflation", scale = FALSE)
plotIndiv(liver.ipca, ind.names = liver.toxicity$treatment[, 3], group = liver.toxicity$treatment[, 4], legend = TRUE, title = 'Liver, IPCA')
As the above sample plots demonstrate, IPCA offers a better visualization of the data with a smaller number of components than PCA. IPCA separates the dose groups, particularly the low-dose groups, more clearly along the first and second components than PCA does. IPCA works well if the data have a super-Gaussian distribution. PCA assumes that gene expression data have Gaussian signals, whereas it has been demonstrated that many gene expression data sets in fact have 'super-Gaussian' signals.
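As a rough illustration of what 'super-Gaussian' means, the sketch below compares the excess kurtosis of simulated Gaussian and Laplace samples in base R. The helper `excess_kurtosis` is a hypothetical name written for this example, not a mixOmics function, and the simulated samples only stand in for real expression signals.

```r
set.seed(42)

# Excess kurtosis: fourth standardised moment minus 3,
# so that a Gaussian distribution scores about 0
excess_kurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3
}

n <- 1e5
gauss <- rnorm(n)               # Gaussian signal
laplace <- rexp(n) - rexp(n)    # Laplace signal (difference of two exponentials)

k.gauss <- excess_kurtosis(gauss)
k.laplace <- excess_kurtosis(laplace)

# The Laplace sample is heavier-tailed, i.e. super-Gaussian
c(gaussian = k.gauss, laplace = k.laplace)
```

A clearly positive excess kurtosis indicates a peaked, heavy-tailed signal, which is the situation in which the ICA step of IPCA is expected to help.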
Note: PCA is an unsupervised approach (i.e. no information about the groups is input in PCA), but coloring the samples according to their group can help the interpretation.
Note that both IPCA and PCA rank the variables in a similar order of importance; the largest difference lies in the loading values:
head(selectVar(liver.ipca, comp = 1)$value)
##                 value.var
## A_42_P567268 -0.002574236
## A_43_P11710   0.002430036
## A_42_P584188  0.002260977
## A_42_P493162 -0.002199279
## A_42_P496622  0.002078646
## A_43_P15711  -0.002050732
head(selectVar(liver.pca, comp = 1)$value)
##                value.var
## A_42_P567268  0.13356427
## A_42_P493162  0.12585209
## A_43_P15711   0.11553498
## A_43_P11754   0.10697546
## A_43_P14324   0.10053332
## A_42_P496622 -0.09992275
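One quick way to compare the two rankings is to intersect the gene names. The sketch below does this in base R for the six genes printed above for each method; the vectors are copied by hand from that output, so it only describes the top of the lists, not the full ranking.

```r
# Top six genes on component 1, copied from the printed IPCA and PCA output
top.ipca <- c("A_42_P567268", "A_43_P11710", "A_42_P584188",
              "A_42_P493162", "A_42_P496622", "A_43_P15711")
top.pca  <- c("A_42_P567268", "A_42_P493162", "A_43_P15711",
              "A_43_P11754", "A_43_P14324", "A_42_P496622")

# Genes that appear in both top-six lists
shared <- intersect(top.ipca, top.pca)
n.shared <- length(shared)

# Rank of each shared gene within each list
rank.table <- data.frame(gene = shared,
                         rank.ipca = match(shared, top.ipca),
                         rank.pca = match(shared, top.pca))
rank.table
```

Four of the six printed genes are shared, and the top-ranked gene is the same for both methods.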
plotVar(liver.ipca, pch = 20)
Because all 3116 genes are displayed, this plot is difficult to interpret. A sparse IPCA (sIPCA) is more appropriate for interpreting the results, see the next section.
The kurtosis measure of the loading vectors is used to order the Independent Principal Components. The kurtosis value is a good post hoc indicator of the number of components to choose, as a sudden drop in the values corresponds to irrelevant dimensions.
liver.ipca$kurtosis
## [1] 9.7068221 6.9869933 0.6729702
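The 'sudden drop' can be located numerically. The sketch below applies one simple heuristic in base R to the three kurtosis values printed above: keep the components that come before the largest drop. This rule is an assumption made for illustration and is not a mixOmics function.

```r
# Kurtosis values of the three components, copied from the output above
kurt <- c(9.7068221, 6.9869933, 0.6729702)

# Drop in kurtosis between consecutive components
drops <- -diff(kurt)

# Keep every component before the largest drop
n.keep <- which.max(drops)
n.keep
```

Here the largest drop occurs between the second and third components, which suggests that two components are sufficient for this data set.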
Sparse Independent PCA analysis (sIPCA)
Sparse Independent Principal Component Analysis (sIPCA) combines the advantages of IPCA with soft-thresholding applied to the independent loading vectors to perform internal variable selection; see the mixOmics page for more details.
liver.sipca <- sipca(X, ncomp = 3, mode = "deflation", scale = FALSE, keepX = c(50,50,50))
plotIndiv(liver.sipca, ind.names = liver.toxicity$treatment[, 3], group = liver.toxicity$treatment[, 4], legend = TRUE, title = 'Liver, sIPCA')
head(selectVar(liver.sipca, comp = 1)$value)
##                 value.var
## A_42_P567268 -0.001537541
## A_43_P11710   0.001393341
## A_42_P584188  0.001224282
## A_42_P493162 -0.001162584
## A_42_P496622  0.001041951
## A_43_P15711  -0.001014037
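The effect of soft-thresholding can be seen by comparing these values with the IPCA loadings printed earlier. The sketch below, in base R, defines a generic soft-thresholding function (the name `soft_threshold` is hypothetical) and applies it to the six printed IPCA loadings. The threshold `lambda` is inferred here from the first pair of printed values; it is not read from mixOmics, where it is presumably determined internally from keepX.

```r
# Generic soft-thresholding: shrink towards zero by lambda, set small values to 0
soft_threshold <- function(w, lambda) {
  sign(w) * pmax(abs(w) - lambda, 0)
}

# Component 1 loadings for the same six genes, copied from the printed output
ipca.load  <- c(-0.002574236, 0.002430036, 0.002260977,
                -0.002199279, 0.002078646, -0.002050732)
sipca.load <- c(-0.001537541, 0.001393341, 0.001224282,
                -0.001162584, 0.001041951, -0.001014037)

# Shrinkage inferred from the first gene (an assumption for illustration)
lambda <- abs(ipca.load[1]) - abs(sipca.load[1])

# Applying that shrinkage to all six IPCA loadings
shrunk <- soft_threshold(ipca.load, lambda)
max.diff <- max(abs(shrunk - sipca.load))

# A loading smaller than lambda in absolute value is set exactly to zero
zeroed <- soft_threshold(0.0005, lambda)
c(lambda = lambda, max.diff = max.diff, zeroed = zeroed)
```

For these six genes, a single shrinkage value reproduces the printed sIPCA loadings, which is consistent with soft-thresholding. Loadings below the threshold become exactly zero, and that is how the variable selection arises.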
plotVar(liver.sipca, pch = 20)
As the above plots show, sIPCA improves the clustering of the samples compared to IPCA.
References
Bushel, P.R., Wolfinger, R.D. and Gibson, G., 2007. Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes. BMC Systems Biology, 1(1), p.1.