The mixOmics R package is organised into three main parts: methods to analyse data, plotting functions, and example data sets. Below is a non-exhaustive list of what the package contains, together with the relevant publications.
1. Statistical methods to analyse high-throughput data
- (s)PCA: (sparse) Principal Component Analysis – Shen and Huang 2008
- (s)IPCA: (sparse) Independent Principal Component Analysis – Yao et al. 2012
- (r)CCA: (regularized) Canonical Correlation Analysis – González et al. 2008
- (s)PLS: (sparse) Partial Least Squares – articles for regression or canonical deflations
- (s)PLS-DA: (sparse) Partial Least Squares Discriminant Analysis – Lê Cao et al. 2011
- Multilevel decomposition: for repeated measurements – Liquet et al. 2012
- mixMC: for 16S multivariate analysis – Lê Cao et al. 2016
- MINT: for vertical multiple integration – Rohart et al. 2017
- DIABLO: for horizontal multiple integration – Singh et al. 2019
Note: The integrative and supervised methods in mixOmics are summarised and presented in our mixOmics article – Rohart et al. 2017
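As a minimal sketch of how two of the methods listed above are typically called, the snippet below runs an unsupervised PCA and a supervised sPLS-DA on the `srbct` example data shipped with the package. The number of components and the `keepX` values (10 genes kept per component) are arbitrary choices made for illustration, not tuned or recommended settings.

```r
library(mixOmics)

# srbct ships with the package: 63 samples grouped into tumour classes
data(srbct)
X <- srbct$gene   # samples x genes expression matrix
Y <- srbct$class  # factor of tumour classes

# Unsupervised: PCA on the expression matrix
pca.res <- pca(X, ncomp = 2, scale = TRUE)

# Supervised: sparse PLS-DA keeping 10 genes per component
# (keepX is an illustrative value, not a tuned one)
splsda.res <- splsda(X, Y, ncomp = 2, keepX = c(10, 10))

# Names of the genes selected on the first component
selected <- selectVar(splsda.res, comp = 1)$name
```

In practice the number of components and the number of variables to keep would be chosen by cross-validation rather than fixed by hand as here.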
2. Plotting functions to display and interpret the results
- 2D and 3D sample plots (with optional confidence ellipses)
- Arrow plots (to visualise paired coordinates)
- Relevance Network Graphs (to see associations between variables) – González et al. 2012
- Clustered Image Maps (heatmaps for expression values or correlation) – González et al. 2012
- Correlation circle plots (correlation of variables to latent components) – González et al. 2012
- Circos plots for DIABLO analyses (see how datasets relate to each other) – Singh et al. 2019
- Loading plots (to see variable importance) – Lê Cao et al. 2016
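The plotting functions above take a fitted model as their first argument. The following sketch, which assumes an sPLS model fitted on the `liver.toxicity` data with arbitrary `keepX` and `keepY` values, shows a sample plot with ellipses, an arrow plot, a correlation circle plot and a loading plot. Grouping the samples by dose is an assumption made here only to give the ellipses something to enclose.

```r
library(mixOmics)

# liver.toxicity ships with the package: mRNA and clinical data for 64 rats
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic

# sPLS model; keepX / keepY are illustrative values, not tuned ones
res <- spls(X, Y, ncomp = 2, keepX = c(50, 50), keepY = c(10, 10))

# Grouping used only for colouring and ellipses (an assumption of this sketch)
dose <- factor(liver.toxicity$treatment$Dose.Group)

# 2D sample plot with confidence ellipses per dose group
plotIndiv(res, group = dose, ind.names = FALSE, ellipse = TRUE, legend = TRUE)

# Arrow plot linking each sample's coordinates in the X and Y spaces
plotArrow(res, group = dose)

# Correlation circle plot of the selected variables
plotVar(res)

# Loading plot for the first component
plotLoadings(res, comp = 1)
```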
3. Example data sets
Each type of biological question calls for a specific method. The package therefore provides a range of example data sets, most of which are used in a case study illustrating one of the methods.
Single omics:
- multidrug (ABC transporters and compounds data for 60 samples from different cell lines. Used in sPCA case study.) – Szakács et al., 2004
- srbct (gene expression data for 63 samples grouped into tumour classes. Used in sPLS-DA case study.) – Khan et al., 2001
- breast.tumors (mRNA data for 47 samples, with missing data)
- vac18 (gene expression data for 42 samples across different stimulation groups. Used in multilevel case study.) – Salmon-Céron et al. 2010
- vac18.simulated (simulated time-course gene expression data for 48 samples across different stimulation groups)
- linnerud (very small dataset with 3 physiological metrics and 3 exercise metrics measured across 20 participants. Used to illustrate key concepts)
- yeast (metabolite data across 37 samples and different strains and conditions)
Multiple omics:
- liver.toxicity (mRNA and clinical data for 64 rats subjected to varying levels of acetaminophen. Used in sIPCA case study and the sPLS case study) – Bushel et al., 2007
- nutrimouse (mRNA and lipid data for 40 mice grouped by diet and genotype. Used in rCCA case study.) – Martin et al., 2007
- breast.TCGA (miRNA, mRNA and protein data from human breast cancer tissue categorised into subtypes. 150 samples in training subset and 70 samples in test subset. Used in DIABLO case study.) – The Cancer Genome Atlas Network, 2012
Multiple studies:
- stemcells (gene expression data for 125 samples across 4 studies and 3 cell lines. Used in MINT case study.)
Microbiome data:
- diverse.16S (16S microbiome data for 162 samples from different body sites. Used in mixMC case study.) – Human Microbiome Project 16S dataset
- koren.16S (microbiome data for 43 samples from different body sites. Used in mixMC case study.) – Koren et al. 2013
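The data sets listed above are loaded with `data()` once the package is attached, and each one is a list of data frames or matrices. The short sketch below loads four of them and checks that the sample counts match the descriptions given here; the element names (`gene`, `clinic`, `lipid`, `exercise`) are those used by the package for these particular data sets.

```r
library(mixOmics)

# Load a few of the bundled example data sets
data(srbct)
data(liver.toxicity)
data(nutrimouse)
data(linnerud)

# Each data set is a list; samples are in rows
n.srbct    <- nrow(srbct$gene)            # tumour gene expression
n.liver    <- nrow(liver.toxicity$gene)   # rat mRNA data
n.clinic   <- nrow(liver.toxicity$clinic) # matching clinical measurements
n.mouse    <- nrow(nutrimouse$lipid)      # mouse lipid data
n.linnerud <- nrow(linnerud$exercise)     # exercise metrics

# List the components available in one data set
names(liver.toxicity)
```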