library(mixOmics)
data(liver.toxicity) # load the data set shipped with the package
X <- liver.toxicity$gene # gene expression data (samples in rows, genes in columns)
results <- ipca(X) # run the method
plotIndiv(results) # plot the samples
plotVar(results) # plot the variables
?ipca can be run to determine the default arguments of this function:
ncomp = 2: Only the first two Principal Components are calculated.
scale = FALSE: The data is not scaled. If set to TRUE, all variables will be standardised to have unit variance.
results <- sipca(X) # run the method
plotIndiv(results) # plot the samples
plotVar(results) # plot the variables
# extract the variables used to construct the first IPC
selectVar(results, comp = 1)$name
# depict weight assigned to each of these variables
plotLoadings(results, method = 'mean', contrib = 'max')
?sipca can be run to determine the default arguments of this function:
ncomp = 3: Only the first three Principal Components are calculated.
keepX = c(50, 50, 50): This parameter defaults to a vector of length ncomp holding the value 50 for each component, so the best 50 variables are used to construct each component.
In some case studies, Principal Component Analysis (PCA) has limitations, such as:
PCA assumes that gene expression follows a multivariate normal distribution. Recent studies suggest that not all omics data can be assumed to follow a Gaussian distribution. For instance, microarray gene expression seems to follow a super-Gaussian distribution.
PCA decomposes the data based on the maximization of its variance. In some cases, the biological question may not be related to the highest variance in the data.
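The super-Gaussian point above can be checked empirically through excess kurtosis, which is zero for a Gaussian distribution and positive for heavier-tailed (super-Gaussian) ones. The sketch below uses simulated data only, and excess_kurtosis is a hypothetical helper written for this illustration, not a mixOmics function.

```r
# Hypothetical helper (not part of mixOmics): sample excess kurtosis.
# Close to 0 for Gaussian data, positive for super-Gaussian data.
excess_kurtosis <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^4) - 3
}

set.seed(1)
n <- 1e5
gaussian <- rnorm(n)
# Laplace draws (difference of two exponentials), a standard super-Gaussian case
laplace <- rexp(n) - rexp(n)

excess_kurtosis(gaussian) # near 0
excess_kurtosis(laplace)  # clearly positive (the theoretical value is 3)
```

The same helper could be applied per gene, for example with apply(X, 2, excess_kurtosis), to get a rough view of how far a given data set departs from normality.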
Independent Component Analysis (ICA) is a process where novel components are extracted from the data, not to maximise explained variance, but to denoise and reduce the impacts of artefacts [2, 3]. Components produced by ICA contain no overlapping information.
ICA and PCA can be combined into Independent Principal Component Analysis (IPCA) [1]. IPCA combines the strengths of each component analysis method.
sIPCA (sparse IPCA) extends IPCA in the same way that sPCA extends PCA. Each component is built from a subset of optimally selected variables, with the selection taking place during the PCA portion of the IPCA algorithm.
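As a minimal sketch of how this variable selection might be controlled, the call below sets keepX explicitly instead of relying on the default. The values 20 and 10 and the object names sparse_results, genes_ipc1 and genes_ipc2 are illustrative assumptions, not recommendations.

```r
library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene

# Illustrative values only: ask for 20 genes on IPC 1 and 10 on IPC 2
sparse_results <- sipca(X, ncomp = 2, keepX = c(20, 10))

# Names of the genes retained on each component
genes_ipc1 <- selectVar(sparse_results, comp = 1)$name
genes_ipc2 <- selectVar(sparse_results, comp = 2)$name

length(genes_ipc1)
length(genes_ipc2)
```

Smaller keepX values give components that are easier to interpret, at the cost of discarding more of the signal, so in practice several values are usually compared.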
IPCA is a non-parametric form of dimension reduction. It combines the goals of ICA and PCA: it searches for components that capture the most variance across features which have had their noise reduced. The components yielded by IPCA (Independent Principal Components - IPCs) are non-overlapping and non-Gaussian. This is extremely useful in contexts where distributions such as the super-Gaussian are expected (e.g. microbiome data). In most cases where it is appropriate to use, IPCA outperforms both ICA and PCA, either by summarising the data better or by requiring fewer components to summarise the data to the same degree.
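A minimal sketch of how the non-Gaussianity of a fit might be inspected is given below. It spells out the ncomp and scale arguments and then reads the kurtosis element of the fitted object, which ?ipca describes as the kurtosis of the independent loading vectors. Treat the exact name and meaning of that element as an assumption to check against the installed mixOmics version; ipca_results is an illustrative object name.

```r
library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene

# Three components on unscaled data (ncomp = 3 is an illustrative choice)
ipca_results <- ipca(X, ncomp = 3, scale = FALSE)

# Assumed element, see ?ipca: one kurtosis value per component.
# Larger values indicate loading vectors that are further from Gaussian.
ipca_results$kurtosis
```

Components with kurtosis close to zero carry little non-Gaussian structure, which is one possible guide when deciding how many components to retain.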
The algorithm of IPCA is as follows: