# Case study: Independent PCA analysis (IPCA) using the Liver Toxicity data set

Here, we illustrate IPCA using the liver toxicity data set, see ?liver.toxicity. The data set contains the expression measures of 3116 genes and 10 clinical measurements for 64 subjects (rats) that were exposed to non-toxic, moderately toxic or severely toxic doses of acetaminophen in a controlled experiment, see study details in [2].

## To begin…

library(mixOmics)


# Data

The liver toxicity data set is implemented in mixOmics via liver.toxicity, and contains the following:

$gene: data frame with 64 rows and 3116 columns. The expression measure of 3116 genes for the 64 subjects (rats).

$clinic: data frame with 64 rows and 10 columns, containing 10 clinical variables for the same 64 subjects.

$treatment: data frame with 64 rows and 4 columns, containing information on the treatment of the 64 subjects, such as doses of acetaminophen and times of necropsy.

data(liver.toxicity)
X <- liver.toxicity$gene


# Preliminary analysis with PCA

For comparison, an ordinary PCA is included here to highlight the benefits of using IPCA.

liver.pca<- pca(X, ncomp = 3, scale = FALSE)

plotIndiv(liver.pca, ind.names = liver.toxicity$treatment[, 3], group= liver.toxicity$treatment[,4], legend = TRUE, title = 'Liver, PCA')


# IPCA

IPCA combines the advantages of both PCA and Independent Component Analysis (ICA) to reveal insightful patterns in the data, see Yao et al. (2012) [1]. According to [1], IPCA tends to give better clustering of biological samples on graphical representations than PCA. A sparse version is also implemented to perform variable selection. mode = 'deflation' is the default algorithm proposed in [1] for estimating the unmixing matrix in IPCA.

liver.ipca <- ipca(X, ncomp = 3, mode="deflation", scale = FALSE)


## Sample Plots

plotIndiv(liver.ipca, ind.names = liver.toxicity$treatment[, 3], group= liver.toxicity$treatment[,4], legend = TRUE, title = 'Liver, IPCA')


As the sample plots above show, IPCA gives a better visualization of the data with fewer components than PCA. Compared to PCA, IPCA separates the dose groups more clearly, in particular the low dose groups, along the first and second components. IPCA works well if the data have a super-Gaussian distribution. Indeed, PCA assumes that gene expression data have Gaussian signals, whereas many gene expression data have been shown to have 'super-Gaussian' signals, see [1].
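To make the term 'super-Gaussian' concrete, here is a minimal base R sketch comparing the excess kurtosis of a Gaussian sample with that of a heavier-tailed Laplace sample. The helper `excess_kurtosis` is a hypothetical name written for this illustration, and the estimator used inside ipca may differ from it.

```r
# Excess kurtosis: close to 0 for Gaussian data, positive for
# heavier-tailed ("super-Gaussian") data.
# NOTE: excess_kurtosis() is an illustrative helper, not a mixOmics function.
excess_kurtosis <- function(x) {
  centered <- x - mean(x)
  z <- centered / sqrt(mean(centered^2))
  mean(z^4) - 3
}

set.seed(1)
n <- 1e5

# Gaussian sample.
gaussian <- rnorm(n)

# Laplace sample, built as the difference of two independent exponentials.
# Its theoretical excess kurtosis is 3.
laplace <- rexp(n) - rexp(n)

k_gaussian <- excess_kurtosis(gaussian)
k_laplace  <- excess_kurtosis(laplace)

k_gaussian
k_laplace
```

The Laplace sample should show a clearly positive excess kurtosis, while the Gaussian sample stays near zero.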

Note: PCA is an unsupervised approach (i.e. no information about the groups is used as input), but coloring the samples according to their group can help the interpretation.

## Variable Plots

Note that both IPCA and PCA rank the variables in a similar order of importance; the largest difference lies in the loading values.

head(selectVar(liver.ipca, comp = 1)$value)

##                 value.var
## A_42_P567268 -0.002574236
## A_43_P11710   0.002430036
## A_42_P584188  0.002260977
## A_42_P493162 -0.002199279
## A_42_P496622  0.002078646
## A_43_P15711  -0.002050732

head(selectVar(liver.pca, comp = 1)$value)

##                value.var
## A_42_P567268  0.13356427
## A_42_P493162  0.12585209
## A_43_P15711   0.11553498
## A_43_P11754   0.10697546
## A_43_P14324   0.10053332
## A_42_P496622 -0.09992275
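The overlap between the two top-six lists printed above can be checked with a few lines of base R. This is only a sketch: the gene identifiers are copied by hand from the output in this case study, and the vector names `ipca_top` and `pca_top` are introduced here for illustration.

```r
# Top six variables on component 1, copied from the printed output above.
ipca_top <- c("A_42_P567268", "A_43_P11710", "A_42_P584188",
              "A_42_P493162", "A_42_P496622", "A_43_P15711")
pca_top  <- c("A_42_P567268", "A_42_P493162", "A_43_P15711",
              "A_43_P11754", "A_43_P14324", "A_42_P496622")

# Variables ranked in the top six by both methods.
shared <- intersect(ipca_top, pca_top)
shared
length(shared)

# Variables that appear only in the IPCA or only in the PCA top six.
only_ipca <- setdiff(ipca_top, pca_top)
only_pca  <- setdiff(pca_top, ipca_top)
only_ipca
only_pca
```

With the fitted objects available, the same comparison could be run on longer lists of names rather than on the six values shown here.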

plotVar(liver.ipca, pch = 20)


Because there are too many genes to interpret on this plot, a sparse IPCA (sIPCA) is more appropriate, see the next section.

## Kurtosis

The kurtosis of the loading vectors is used to order the Independent Principal Components. The kurtosis value is a good post hoc indicator of the number of components to choose, as a sudden drop in the values corresponds to irrelevant dimensions.
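The "sudden drop" rule can be sketched as a small heuristic in base R. The helper `suggest_ncomp` is a hypothetical name and not part of mixOmics. It assumes the kurtosis values are already sorted in decreasing order and simply returns the number of components found before the largest drop. The values used are the three kurtosis values reported by liver.ipca$kurtosis in this case study.

```r
# Hypothetical helper (not a mixOmics function): suggest how many components
# to keep by locating the largest drop between consecutive kurtosis values.
# Assumes 'kurt' is sorted in decreasing order and has at least two entries.
suggest_ncomp <- function(kurt) {
  drops <- -diff(kurt)   # drop from component i to component i + 1
  which.max(drops)       # components before the largest drop
}

# Kurtosis values reported for the three IPCA components in this case study.
kurt <- c(9.7068221, 6.9869933, 0.6729702)

# Size of each drop between consecutive components.
-diff(kurt)

n_keep <- suggest_ncomp(kurt)
n_keep
```

Here the largest drop occurs between the second and third components, which suggests keeping two components. This is only a rule of thumb and should be checked against the sample plots.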

liver.ipca$kurtosis

## [1] 9.7068221 6.9869933 0.6729702

# Sparse Independent PCA analysis (sIPCA)

Sparse Independent Principal Component Analysis (sIPCA) combines the advantages of IPCA with soft-thresholding applied to the independent loading vectors to perform internal variable selection; see the mixOmics page for more details.

liver.sipca <- sipca(X, ncomp = 3, mode = "deflation", scale = FALSE, keepX = c(50,50,50))

## Sample Plots

plotIndiv(liver.sipca, ind.names = liver.toxicity$treatment[, 3], group = liver.toxicity$treatment[, 4], legend = TRUE, title = 'Liver, sIPCA')

## Variable Plots

head(selectVar(liver.sipca, comp = 1)$value)

##                 value.var
## A_42_P567268 -0.001537541
## A_43_P11710   0.001393341
## A_42_P584188  0.001224282
## A_42_P493162 -0.001162584
## A_42_P496622  0.001041951
## A_43_P15711  -0.001014037

plotVar(liver.sipca, pch = 20)


As the plots above show, sIPCA improves the clustering of the samples compared to IPCA.
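The soft-thresholding step mentioned for sIPCA can be illustrated with the loading values printed in this case study. The sketch below is written in base R under stated assumptions: `soft_threshold` is a hypothetical helper, and the threshold `lambda` is inferred from the first pair of printed values rather than reported by mixOmics. For the six values shown, the sIPCA loadings equal the IPCA loadings shrunk toward zero by the same constant, which is what soft-thresholding does.

```r
# Soft-thresholding: shrink every entry toward zero by lambda and set entries
# whose absolute value is below lambda to exactly zero.
# NOTE: illustrative helper, not the internal mixOmics implementation.
soft_threshold <- function(v, lambda) {
  sign(v) * pmax(abs(v) - lambda, 0)
}

# Top six component-1 loadings as printed in this case study.
ipca_vals  <- c(-0.002574236, 0.002430036, 0.002260977,
                -0.002199279, 0.002078646, -0.002050732)
sipca_vals <- c(-0.001537541, 0.001393341, 0.001224282,
                -0.001162584, 0.001041951, -0.001014037)

# Threshold inferred from the first pair of values (an assumption).
lambda <- abs(ipca_vals[1]) - abs(sipca_vals[1])
lambda

# Check that the same lambda reproduces the other five sIPCA values.
matches <- isTRUE(all.equal(soft_threshold(ipca_vals, lambda),
                            sipca_vals, tolerance = 1e-6))
matches

# Toy example: entries smaller than lambda in absolute value become zero,
# which is how the variable selection arises.
soft_threshold(c(-3, 0.5, 2), lambda = 1)
```

Loadings that are set to zero drop out of the component, so a larger threshold keeps fewer variables. How keepX is translated into a threshold inside sipca is not shown here.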