mixOmics is a collaborative project developed by the mixOmics team (Kim-Anh Lê Cao, Florian Rohart, Ignacio González and Sébastien Déjean), key contributors (Benoît Gautier, François Bartolo) and several key collaborators. The project started at the Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse, France, and was then extended in Australia, at the University of Queensland, Brisbane (2009 – 2016) and at the University of Melbourne (2017 – ).
Why multivariate methods?
It is now generally accepted that a single ‘omics analysis does not provide enough information for a deep understanding of a biological system, and that a more holistic view can be obtained by combining multiple ‘omics analyses. Our mixOmics R package offers a whole range of multivariate methods that we developed and validated on many biological studies to gain more insight from ‘omics data.
mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets, with a particular focus on variable selection.
Multivariate methods are well suited to large ‘omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing property of reducing the dimension of the data by using instrumental variables (‘components’), which are defined as combinations of all variables. Those components are then used to produce graphical outputs that give a better understanding of the relationships and correlation structure between the different data sets that are integrated. We have developed several sparse multivariate models to identify the key variables that are highly correlated and/or explain the biological outcome of interest (e.g. disease status). The identified variables are then more amenable to statistical inference and can be used to posit novel biological hypotheses for further validation in the laboratory.
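To make the idea of a component and of a sparse model concrete, here is a minimal sketch in plain Python. It is not the mixOmics implementation: the function names, the toy data and the penalty `lam` are all assumptions made for illustration. It computes one component as a weighted combination of all variables by power iteration, then applies soft-thresholding to the weights so that only the variables carrying signal keep a non-zero weight.

```python
import math

def center_columns(X):
    """Subtract each column mean so every variable has mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def soft_threshold(v, lam):
    """Shrink every entry towards zero by lam; small entries become zero."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]

def first_component(X, lam=0.0, n_iter=200):
    """Return (weights, scores) of one component; lam > 0 gives sparse weights.

    Hypothetical helper for illustration only.
    """
    Xc = center_columns(X)
    n, p = len(Xc), len(Xc[0])
    w = normalize([1.0] * p)
    for _ in range(n_iter):
        # Scores: one value per sample, a weighted sum of all its variables.
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
        # Weights: how strongly each variable follows the scores.
        w = normalize([sum(Xc[i][j] * t[i] for i in range(n)) for j in range(p)])
        # Variable selection: drop weights smaller than lam, then rescale.
        w = normalize(soft_threshold(w, lam))
    scores = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, scores

# Toy data (made up): 4 samples, 6 variables, so more variables than samples.
# Only the first two variables carry a strong shared signal.
X = [
    [ 2.0,  1.9,  0.1, -0.1,  0.0,  0.1],
    [ 1.0,  1.1, -0.1,  0.1,  0.1,  0.0],
    [-1.0, -0.9,  0.1,  0.0, -0.1,  0.1],
    [-2.0, -2.1, -0.1,  0.0,  0.0, -0.2],
]

dense_w, dense_scores = first_component(X, lam=0.0)
sparse_w, sparse_scores = first_component(X, lam=0.1)
selected = [j for j, x in enumerate(sparse_w) if x != 0]
print("dense weights: ", [round(x, 3) for x in dense_w])
print("sparse weights:", [round(x, 3) for x in sparse_w])
print("selected variables:", selected)
```

With `lam=0.0` every variable contributes to the component; with the penalty, the weights of the four noise variables are set to zero and the component is built from the two informative variables alone.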
Which type of data?
The data analysed with mixOmics may come from high-throughput sequencing technologies, as with ‘omics data (transcriptomics, metabolomics, proteomics, metagenomics …), but may also come from beyond the realm of ‘omics (e.g. spectral imaging). We are currently developing new methods to integrate genotype data and time-course or longitudinal data.
The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data.
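The text above does not say how missing values are accommodated. One common approach for component-based methods is a NIPALS-style iteration, in which every sum simply skips the missing entries. The sketch below illustrates that idea under this assumption; the function name and the toy matrix are hypothetical, and the columns are assumed to be already centred.

```python
import math

def nipals_rank_one(X, n_iter=100):
    """One component from a matrix with missing entries (None).

    Scores and loadings are updated in turn by least squares, using only
    the observed entries of each column or row. Hypothetical helper.
    """
    n, p = len(X), len(X[0])
    # Starting scores: mean of the observed values in each row.
    t = []
    for row in X:
        observed = [x for x in row if x is not None]
        t.append(sum(observed) / len(observed))
    for _ in range(n_iter):
        # Loadings: regress each column on the scores, observed rows only.
        loadings = []
        for j in range(p):
            rows = [i for i in range(n) if X[i][j] is not None]
            num = sum(X[i][j] * t[i] for i in rows)
            den = sum(t[i] ** 2 for i in rows)
            loadings.append(num / den)
        norm = math.sqrt(sum(w * w for w in loadings))
        loadings = [w / norm for w in loadings]
        # Scores: regress each row on the loadings, observed columns only.
        t = []
        for i in range(n):
            cols = [j for j in range(p) if X[i][j] is not None]
            num = sum(X[i][j] * loadings[j] for j in cols)
            den = sum(loadings[j] ** 2 for j in cols)
            t.append(num / den)
    return t, loadings

# Toy matrix (made up): exactly rank one, with two entries missing.
# The complete matrix would have -1.0 in both missing positions.
X = [
    [ 2.0,  4.0, -2.0],
    [ 1.0,  2.0, None],
    [None, -2.0,  1.0],
    [-2.0, -4.0,  2.0],
]

scores, loadings = nipals_rank_one(X)
# Fill each missing entry with its value under the one-component model.
filled = [
    [X[i][j] if X[i][j] is not None else scores[i] * loadings[j]
     for j in range(len(X[0]))]
    for i in range(len(X))
]
print([[round(x, 3) for x in row] for row in filled])
```

All four samples contribute to the component even though two of them are incomplete, so no row has to be discarded.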
About this website
This website gives a full tutorial introduction to the main mixOmics features and illustrates full multivariate analyses on some case studies. Click on the different tabs to see all options available.
Any questions or feedback? Contact us here.
mixOmics is under active development as we focus on novel multivariate methods to address pressing needs for ‘omics data integration. Subscribe to our mailing list to keep up to date with our latest version, or have a look at the NEWS posts.
We also run regular 2- and 3-day workshops in Australia and in Europe. Have a look at our list of upcoming workshops and do not hesitate to contact us to run dedicated workshops in your area of expertise and country.
The mixOmics framework today
The toolkit currently includes 17 multivariate methodologies, depicted below according to the data to integrate and the biological question (e.g. exploration, discriminant analysis, data integration for 2 or more data sets).
The R package and key references
The mixOmics R package is organised into three main parts:
- Statistical methodologies to analyse high-throughput data
  - (s)PCA: (sparse) Principal Component Analysis as proposed by Shen and Huang 2008.
  - (s)IPCA: (sparse) Independent Principal Component Analysis
  - (r)CCA: (regularized) Canonical Correlation Analysis as implemented in González et al. 2008.
  - (s)PLS: (sparse) Partial Least Squares (regression or canonical deflations)
  - (s)PLS-DA: (sparse) Partial Least Squares Discriminant Analysis
  - Multilevel decomposition for repeated measurements
  - NEW mixMC for 16S multivariate analysis (see article)
  - NEW mixMINT for vertical multiple integration (see preprint here, in press)
  - NEW mixDIABLO for horizontal multiple integration, based on this article, but with substantial improvements, see preprint here.
  - NEW the integrative and supervised methods in mixOmics are summarised and presented in our preprint.
- Graphical outputs to display the results and improve interpretation
- Example data sets
  - breast.tumor (gene expression data, with missing data)
  - linnerud (very small data set)
  - liver.toxicity (gene expression and clinical data)
  - multidrug (ABC transporters and compounds)
  - nutrimouse (gene expression and fatty acids data)
  - srbct (gene expression data)
  - yeast (metabolites data)
  - vac18 and vac18.simulated for multilevel analyses
  - NEW diverse.16S and Koren.16S for mixMC 16S analyses (similar to that paper)
  - NEW breast.TCGA for DIABLO horizontal multiple integration analyses
  - NEW stemcells for MINT vertical multiple integration analyses