mixOmics is a collaborative project developed by the mixOmics team and several key collaborators. The project started at the Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse, France, and was later extended at the University of Queensland, Brisbane, Australia.
Why multivariate methods?
It is now generally accepted that a single ‘omics analysis does not provide enough information to give a deep understanding of a biological system. We can obtain a more precise picture of a system by combining multiple ‘omics analyses. The mixOmics R package provides a whole range of multivariate methods that we have developed and validated on many biological studies.
mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets, with a particular focus on variable selection.
Multivariate methods are well suited to large ‘omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing property of reducing the dimension of the data by using instrumental variables (‘components’), which are defined as combinations of all variables. Those components are then used to produce graphical outputs that give a better understanding of the relationships and correlation structure between the different data sets being integrated. We have further developed sparse multivariate models to identify the key variables that are highly correlated, or that explain the biological outcome of interest. The identified variables are then more amenable to statistical inference and to the generation of novel biological hypotheses.
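To make the idea of a ‘component’ concrete, here is a minimal sketch in base R, using simulated data rather than a real ‘omics data set. It only illustrates that a component is a weighted sum of all variables; it does not use mixOmics, and the variable names are chosen for illustration.

```r
# Simulated data: 10 samples, 200 variables (many more variables than samples)
set.seed(1)
n <- 10
p <- 200
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Centre each variable, then take the singular value decomposition
Xc <- scale(X, center = TRUE, scale = FALSE)
dec <- svd(Xc)

# Loading vectors: one weight per variable, for the first two components
loadings <- dec$v[, 1:2]

# Components: each one is a linear combination of all 200 variables
components <- Xc %*% loadings
dim(components)   # 10 samples summarised by 2 components

# Base R's prcomp() gives the same components, up to sign
pc <- prcomp(X, center = TRUE, scale. = FALSE)
max(abs(abs(components) - abs(pc$x[, 1:2])))   # numerically zero
```

A sparse method would additionally set most of the weights in `loadings` to zero, so that each component depends on a small subset of variables only.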
Which type of data?
The data we analyse may come from high-throughput sequencing technologies, such as ‘omics data (transcriptomics, metabolomics, proteomics, metagenomics …), but may also come from beyond the realm of ‘omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data.
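One classical way to compute a component without deleting incomplete rows is the NIPALS algorithm, which alternates two regressions and simply skips the missing cells in each. The sketch below is a base R illustration of that idea on simulated data. The function name `nipals_pc1` is hypothetical, and this is not the package's own code, which may differ in its details.

```r
# First principal component by NIPALS, skipping missing cells
nipals_pc1 <- function(X, max_iter = 500, tol = 1e-10) {
  W  <- 1 * !is.na(X)          # 1 where a value is observed, 0 where missing
  X0 <- X
  X0[is.na(X)] <- 0            # zeros drop out of the sums below
  scores <- X0[, which.max(colSums(X0^2))]   # starting guess
  for (iter in seq_len(max_iter)) {
    # Regress each variable on the scores, using observed cells only
    loadings <- as.vector(crossprod(X0, scores) / crossprod(W, scores^2))
    loadings <- loadings / sqrt(sum(loadings^2))
    # Regress each sample on the loadings, using observed cells only
    new_scores <- as.vector((X0 %*% loadings) / (W %*% loadings^2))
    converged <- sum((new_scores - scores)^2) <= tol * sum(new_scores^2)
    scores <- new_scores
    if (converged) break
  }
  list(scores = scores, loadings = loadings, iterations = iter)
}

# Simulated data driven by one latent variable, plus a little noise
set.seed(42)
n <- 20
p <- 8
latent <- rnorm(n)
weights <- seq(1, 2, length.out = p)
X <- outer(latent, weights) + matrix(rnorm(n * p, sd = 0.1), n, p)

# Remove one value in each of the first 16 samples (10% of all cells)
X[cbind(1:16, rep(1:p, 2))] <- NA

# Centre each variable using its observed values only
X <- sweep(X, 2, colMeans(X, na.rm = TRUE))

fit <- nipals_pc1(X)
length(fit$scores)             # one score per sample, none dropped
abs(cor(fit$scores, latent))   # close to 1 if the latent signal is recovered
```

All 20 samples keep a score even though 16 of them have a missing value, which is the behaviour described above.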
About this website
This website gives a full tutorial introduction to the main mixOmics features and illustrates full multivariate analyses on some case studies. Click on the different tabs to see all the options available.
Any questions or feedback? Contact us here.
mixOmics is under active development as we implement more methods. Subscribe to our mailing list to keep up to date with our latest version.
We also run regular 2- and 3-day workshops in Australia and in Europe. Have a look at our upcoming workshops and don’t hesitate to ask for more.
The mixOmics framework today
The mixOmics R package is organised into three main parts:
- Statistical methodologies to analyse high-throughput data
  - (s)PCA: (sparse) Principal Component Analysis as proposed by Shen and Huang 2008.
  - (s)IPCA: (sparse) Independent Principal Component Analysis
  - (r)CCA: (regularized) Canonical Correlation Analysis as implemented in González et al. 2008.
  - (s)PLS: (sparse) Partial Least Squares (regression or canonical deflations)
  - (s)PLS-DA: (sparse) Partial Least Squares Discriminant Analysis
  - Multilevel decomposition for repeated measurements
  - NEW mixMC for 16S multivariate analysis (see preprint, currently in press)
  - NEW mixMINT for vertical multiple integration (submitted)
  - NEW mixDIABLO for horizontal multiple integration, based on this article, but with substantial improvements
- Graphical outputs to display the results and improve interpretation
- Example data sets
  - breast.tumor (gene expression data, with missing data)
  - linnerud (very small data set)
  - liver.toxicity (gene expression and clinical data)
  - multidrug (ABC transporters and compounds)
  - nutrimouse (gene expression and fatty acids data)
  - srbct (gene expression data)
  - yeast (metabolites data)
  - vac18 and vac18.simulated for multilevel analyses
  - NEW diverse.16S and Koren.16S for mixMC 16S analyses
  - NEW breast.TCGA for DIABLO horizontal multiple integration analyses
  - NEW stemcells for MINT vertical multiple integration analyses
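The three parts listed above can be seen together in a short session. The sketch below fits a sparse PLS-DA on the srbct example data, plots the samples, and lists the selected genes. It assumes that mixOmics is installed. The values of `ncomp` and `keepX` are arbitrary choices for illustration, not tuned settings.

```r
# Assumes mixOmics is installed (it is distributed through Bioconductor);
# the block is skipped quietly otherwise.
if (requireNamespace("mixOmics", quietly = TRUE)) {
  library(mixOmics)

  # Part 3: an example data set shipped with the package
  data(srbct, package = "mixOmics")
  X <- srbct$gene    # gene expression matrix, samples in rows
  Y <- srbct$class   # tumour class of each sample

  # Part 1: sparse PLS-DA, keeping 10 genes on each of 3 components
  # (ncomp and keepX are illustrative values, not tuned)
  fit <- splsda(X, Y, ncomp = 3, keepX = c(10, 10, 10))

  # Part 2: sample plot on the first two components
  plotIndiv(fit, comp = c(1, 2), ind.names = FALSE, legend = TRUE)

  # Genes selected on the first component
  selected <- selectVar(fit, comp = 1)$name
  print(selected)
}
```

In a real analysis the number of components and the number of variables to keep would be chosen by cross-validation, for example with the package's `tune.splsda` function, rather than fixed by hand.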