Changes in 4.1
================

New features:
-------------

- New S3 method valid for objects of class pls, spls, plsda and splsda
- New select.var function to directly extract the selected variables from spls, spca, sipca
- New data set vac18 for multilevel data
Article published explaining correlation circle plots, relevance networks and CIM
Our manuscript ‘Insightful graphical outputs to explore relationships between two “omics” data sets’ has been published. It explains how to interpret correlation circle plots and how relevance networks and CIM are generated from rCCA and sPLS.
Check this very colourful manuscript [intlink id=”202″ type=”page”]here[/intlink]!
Another presentation about mixOmics
Another general presentation of mixOmics, dating from December 2012, presents some preliminary but exciting results on time course data and on the generalisation of PLS to multi-block data sets, using the approach of our collaborator Arthur Tenenhaus and colleagues.
Go [intlink id=”202″ type=”page”]here[/intlink].
General presentation about mixOmics
A new general presentation about mixOmics is available in the [intlink id=”204″ type=”page”]Presentation Section[/intlink]. It should be updated with each major update of the package.
Lê Cao K.-A. Unravelling ‘omics’ data with the mixOmics R package, Illustration on several studies. General presentation on mixOmics (last updated 05/04/2012) [Presentation]
(s)IPCA
Independent Principal Component Analysis (IPCA)
In some case studies, we have identified some limitations when using PCA:
- PCA assumes that gene expression follows a multivariate normal distribution, whereas recent studies have demonstrated that microarray gene expression measurements instead follow a super-Gaussian distribution.
- PCA decomposes the data by maximizing its variance. In some cases, the biological question may not be related to the highest variance in the data.
Instead, we propose to apply Independent Principal Component Analysis (IPCA), which combines the advantages of both PCA and Independent Component Analysis (ICA). It uses ICA as a denoising process of the loading vectors produced by PCA to better highlight the important biological entities and reveal insightful patterns in the data.
IPCA offers a better visualization of the data than ICA, and does so with a smaller number of components than PCA.
How to choose the number of components:
The kurtosis of the loading vectors is used to order the Independent Principal Components. We have shown that the kurtosis value is a good post hoc indicator of the number of components to choose, as a sudden drop in the values corresponds to irrelevant dimensions.
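The ordering idea can be illustrated with a minimal sketch in plain Python. It is not the mixOmics implementation: the function names, the toy loading vectors and the choice of the population excess-kurtosis estimator are all assumptions made for illustration.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


def order_by_kurtosis(loadings):
    """Return component indices sorted by decreasing kurtosis, plus the kurtosis values."""
    kurt = [excess_kurtosis(vec) for vec in loadings]
    order = sorted(range(len(loadings)), key=lambda i: kurt[i], reverse=True)
    return order, kurt


# Hypothetical loading vectors (one list per component), made up for this example.
loadings = [
    [0.5, -0.5, 0.5, -0.5, 0.5, -0.5],   # flat profile, strongly sub-Gaussian
    [0.0, 0.0, 0.95, 0.0, 0.0, -0.05],   # one dominant variable, heavy-tailed
    [0.1, 0.3, -0.2, 0.4, -0.3, 0.1],
]
order, kurt = order_by_kurtosis(loadings)
print(order)  # component indices, most super-Gaussian first
print(kurt)   # inspect these values for a sudden drop
```

Plotting or printing the sorted kurtosis values and looking for the sudden drop is then a manual, post hoc step, as described above.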
Sparse Independent Principal Component Analysis (sIPCA)
Similar to the [intlink id=”129″ type=”page”]sparse PCA[/intlink] version implemented in mixOmics, soft-thresholding is applied to the independent loading vectors in IPCA to perform internal variable selection.
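As a rough illustration of soft-thresholding on a single loading vector, here is a short Python sketch. The function names, the toy vector and the way the threshold is derived from a target number of variables are assumptions for this example only; they are not taken from the mixOmics code.

```python
def soft_threshold(vector, lam):
    """Shrink each entry toward zero by lam; entries with |v| <= lam become exactly 0."""
    result = []
    for v in vector:
        magnitude = abs(v) - lam
        if magnitude <= 0:
            result.append(0.0)
        else:
            result.append(magnitude if v > 0 else -magnitude)
    return result


def threshold_for_k(vector, k):
    """Pick lam as the (k+1)-th largest absolute value, so that k entries survive.

    Assumes no ties at the cut-off; with ties, fewer than k entries are kept.
    """
    magnitudes = sorted((abs(v) for v in vector), reverse=True)
    return magnitudes[k] if k < len(magnitudes) else 0.0


# Hypothetical loading vector, made up for this example.
loading = [0.75, -0.25, 0.125, -1.0, 0.5]
lam = threshold_for_k(loading, 2)           # keep the 2 largest loadings
sparse_loading = soft_threshold(loading, lam)
selected = [i for i, v in enumerate(sparse_loading) if v != 0.0]
print(sparse_loading, selected)
```

Variables whose loading is shrunk to exactly zero drop out of the component, which is what makes the selection "internal" to the method.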
How to choose the number of variables to select:
Choosing the number of variables to select is still an open issue. In our paper we proposed to use the Davies-Bouldin measure, which is an index of crisp cluster validity. This index compares the within-cluster scatter with the between-cluster separation.
More details about how to use the ipca.R function are given in the [intlink id=”233″ type=”page”]case study[/intlink].
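For readers unfamiliar with the Davies-Bouldin index mentioned above, the following Python sketch computes its standard textbook form: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation, then average over clusters. Lower values indicate more compact, better separated clusters. The function names and toy points are hypothetical, and the sketch assumes the clusters are known sample groups described by their coordinates on the components; how the index is wired into sIPCA is not shown here.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(clusters):
    """clusters: list of clusters, each a list of points (lists of floats)."""
    centroids = [centroid(c) for c in clusters]
    # Within-cluster scatter: mean distance of the points to their centroid.
    scatter = [
        sum(euclidean(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centroids)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two made-up configurations: same scatter, different separation.
well_separated = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
poorly_separated = [[[0.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [2.0, 2.0]]]
db_good = davies_bouldin(well_separated)
db_poor = davies_bouldin(poorly_separated)
print(db_good, db_poor)
```

One possible use, under these assumptions, is to compute the index for several candidate numbers of selected variables and prefer the value that gives the lowest index.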
References
- Yao F., Coquery J., Lê Cao K.-A. (2012) Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets, BMC Bioinformatics 13:24.
- Comon P. (1994) Independent component analysis, a new concept? Signal Processing 36:287-314.
- Hyvärinen A., Oja E. (2000) Independent Component Analysis: Algorithms and Applications, Neural Networks 13(4-5):411-430.
New methods: multilevel analyses
A multilevel approach has been added for cross-over design experiments (up to two cross factors), in collaboration with A/Prof B. Liquet (Université de Bordeaux, France). This approach takes into account the complex structure of repeated measurements from different assays, where different treatments are applied to the same subjects, in order to highlight the treatment effects within subjects separately from the biological variation between subjects.
Two different frameworks are proposed:
- a discriminant analysis (method = ‘splsda’) enables the selection of features separating the different treatments
- an integrative analysis (method = ‘spls’) enables the integration of two matched data sets and the selection of subsets of variables that are correlated (positively or negatively) across the samples. The approach is unsupervised: no prior knowledge about the sample groups is included.
The multilevel function first decomposes the variance in the data sets X (and Y) and then applies either sPLS-DA or sPLS to the within-subject deviation. One- or two-factor analyses are available for sPLS-DA.
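The within-subject deviation can be pictured with a small Python sketch for the one-factor case: each subject's mean profile is subtracted from that subject's repeated measurements, so that only the variation within subjects remains. The function name and the toy data are hypothetical, and the two-factor decomposition used by the package is not covered.

```python
def within_subject_deviation(X, subject):
    """Subtract each subject's mean profile from that subject's rows.

    X: list of rows (one row per sample, one column per variable).
    subject: list of subject labels, one per row of X.
    """
    n_var = len(X[0])
    rows_by_subject = {}
    for row, s in zip(X, subject):
        rows_by_subject.setdefault(s, []).append(row)
    subject_mean = {
        s: [sum(r[j] for r in rows) / len(rows) for j in range(n_var)]
        for s, rows in rows_by_subject.items()
    }
    return [
        [x - m for x, m in zip(row, subject_mean[s])]
        for row, s in zip(X, subject)
    ]


# Made-up data: two subjects, each measured under two treatments.
X = [[10.0, 1.0], [12.0, 3.0], [20.0, 5.0], [26.0, 9.0]]
subject = ["A", "A", "B", "B"]
Xw = within_subject_deviation(X, subject)
print(Xw)
```

In this toy data, subject B has much higher baseline values than subject A; after the subtraction both subjects are centred on zero and only the treatment-related differences within each subject are left for the downstream analysis.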
Associated functions include: multilevel.R, tune.multilevel.R, pheatmap.multilevel.R (see examples in methods, graphics and case studies).
This is our first step towards repeated measurements designs.
The package has been updated to version 4.0-1 to implement these methodologies. It now requires the library ‘pheatmap’.
Web-interface
- R package and Methods: IPCA and sparse IPCA functions have been implemented (as well as their associated S3 functions). IPCA stands for Principal Component Analysis with Independent Loadings. It combines the advantages of both PCA and Independent Component Analysis (ICA). PCA is a powerful exploratory tool if the biological question is related to the highest variance. ICA was recently proposed in the literature as an alternative to PCA, as it optimizes an independence condition that can give more meaningful components. A preprint is available upon request.
- R package and Data: The Liver Toxicity study data has been updated to provide geneBank IDs and gene titles
- R package and Data: Two other data sets have been added: Prostate Tumor study (gene expression) and Metabolomic study of Yeast (metabolomics).
- Web interface: We are making good progress on our associated web interface (now deployed on http://mixomics.qfab.org). A few illustrative examples are also available: you can download them and run any type of analysis through the interface. We are currently developing a ‘next level analysis’ to provide pathway enrichment analyses and give the functional annotation of the selected genes using the iHOP database. Do not hesitate to give us some feedback!
- Newsletter: we now have a newsletter. To subscribe, send an email to mixomics[at]math.univ-toulouse.fr with no subject in the body.
New Graphics: network & cim
- New S3 methods network and cim for results from PLS models
- New code for the valid function for the validation of PLS-DA and SPLS-DA models
- The S3 method plot.valid was modified to display graphical results from the valid function for PLS-DA and SPLS-DA models
- The cim and network functions were modified to return the similarity matrix
- The S3 method plotVar was modified to return the coordinates of the X and Y variables
- The predict function has been modified to simultaneously run either several or all prediction methods available to predict the classes of the test data from PLS-DA and SPLS-DA models
New Function: (s)PCA added
- New functions pca and spca are now available to perform Principal Component Analysis (PCA) and sparse PCA for variable selection
- The S3 methods plotVar, plot3dVar, plotIndiv, plot3dIndiv were modified to generate graphical results for pca and spca
New function: plot.valid
- New function plot.valid to display the results of the valid function
- New code for the imgCor function for a nicer representation of the correlation matrices
- In the predict function, the argument 'method' was replaced by method = c("max.dist", "class.dist", "centroids.dist", "mahalanobis.dist")
- The arguments dendrogram, ColSideColors and RowSideColors were added to the cim function
- The valid function can also be run with missing values
- Functions pls, plsda, spls and splsda were modified to identify zero- or near-zero variance predictors
- The functions plotVar and plot3dVar were modified to represent only the X variables in the case of PLS-DA and SPLS-DA
- The pca function has been improved so that the S3 methods plotIndiv, plot3dIndiv, plotVar and plot3dVar can be used with these new classes
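To give an idea of what identifying zero- or near-zero variance predictors (mentioned in the list above) can involve, here is a hedged Python sketch of one common heuristic: a predictor is flagged when its most frequent value dominates the second most frequent one and it has very few distinct values. The function name and both cut-off values are illustrative assumptions, not the rule used by pls, plsda, spls or splsda.

```python
from collections import Counter


def near_zero_variance(column, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a predictor as having zero or near-zero variance.

    freq_cut: assumed cut-off on (count of most common value) / (count of second most common).
    unique_cut: assumed cut-off on the percentage of distinct values among all samples.
    """
    counts = Counter(column).most_common()
    if len(counts) == 1:
        return True  # zero variance: a single distinct value
    freq_ratio = counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(column)
    return freq_ratio > freq_cut and percent_unique < unique_cut


# Made-up predictors for illustration.
constant = [0.0] * 100
almost_constant = [0.0] * 99 + [1.0]
informative = [float(i) for i in range(100)]
flags = [near_zero_variance(c) for c in (constant, almost_constant, informative)]
print(flags)
```

Such predictors carry almost no information and can make cross-validation folds degenerate, which is one plausible reason for detecting them before fitting a model.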