mixOmics is an R package designed for the analysis and visualization of high-throughput data. It includes statistical methods, plotting functions, and example datasets for various biological questions. Here’s a quick guide to what you’ll find inside:
💡 Note: for unfamiliar terms, check out our Glossary.
1. Statistical methods to analyse high-throughput data
We offer a range of powerful statistical methods for analyzing your data. Here are some key methods:
- (s)PCA: (sparse) Principal Component Analysis – Shen and Huang 2008
- (s)IPCA: (sparse) Independent Principal Component Analysis – Yao et al. 2012
- (r)CCA: (regularized) Canonical Correlation Analysis – González et al. 2008
- (s)PLS: (sparse) Partial Least Squares – see the articles on the regression and canonical deflation modes
- (s)PLS-DA: (sparse) Partial Least Squares Discriminant Analysis – Lê Cao et al. 2011
- Multilevel decomposition: for repeated measurements – Liquet et al. 2012
- mixMC: for 16S multivariate analysis – Lê Cao et al. 2016
- MINT: for P-integration – Rohart et al. 2017
- DIABLO: for multiblock N-integration – Singh et al. 2019
Learn More: Check out our mixOmics article for an overview of integrative and supervised methods.
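As a minimal sketch of how one of the methods listed above is called, the example below runs PCA and sparse PCA on the gene expression matrix of the `liver.toxicity` example data. The values chosen for `ncomp` and `keepX` are arbitrary illustrative assumptions, not tuned settings.

```r
library(mixOmics)

# Example data shipped with the package
data(liver.toxicity)
X <- liver.toxicity$gene   # samples in rows, genes in columns

# Standard PCA; ncomp = 3 is an illustrative choice
pca.res <- pca(X, ncomp = 3, center = TRUE, scale = TRUE)

# Sparse PCA; keepX gives the number of variables kept on each component
# (values here are arbitrary, not the result of tuning)
spca.res <- spca(X, ncomp = 3, center = TRUE, scale = TRUE,
                 keepX = c(15, 10, 5))

# Names of the genes selected on the first sparse component
selected <- selectVar(spca.res, comp = 1)$name
head(selected)
```

The other methods follow the same pattern: pass one or more data matrices, choose the number of components, and for the sparse variants state how many variables to keep per component.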
2. Plotting functions to display and interpret the results
- 2D and 3D sample plots: to visualise sample relationships
- Arrow plots: to visualise paired coordinates
- Relevance Network Graphs: to visualise associations between variables
- Clustered Image Maps: heatmaps of expression values or correlation between variables
- Correlation circle plots: correlation of variables to latent components
- Circos plots: for DIABLO analyses, visualise how modalities relate to each other – Singh et al. 2019
- Loading plots: to visualise variable importance – Lê Cao et al. 2016
Learn More: See González et al. 2012 for an overview of network graphs, clustered image maps and correlation circle plots.
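The sketch below shows how three of the plot types listed above (sample plot, correlation circle plot, loading plot) might be drawn from a single sparse PCA fit. It assumes the `liver.toxicity` example data and an arbitrary `keepX`; the grouping variable and title are illustrative choices only.

```r
library(mixOmics)

data(liver.toxicity)
X <- liver.toxicity$gene
dose <- liver.toxicity$treatment$Dose.Group   # used only to colour the samples

# Sparse PCA with two components; keepX values are arbitrary
spca.res <- spca(X, ncomp = 2, center = TRUE, scale = TRUE,
                 keepX = c(15, 15))

# Sample plot: samples projected onto the first two components
plotIndiv(spca.res, group = dose, legend = TRUE,
          title = "sPCA on liver.toxicity genes")

# Correlation circle plot: selected variables against the components
plotVar(spca.res, cex = 2)

# Loading plot: weights of the variables selected on component 1
plotLoadings(spca.res, comp = 1)
```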
3. Example data sets
To help you get started, we provide a variety of datasets illustrating different biological questions. Some of these are used in the case studies.
Data | Integration type | Omics | Samples | Groups | Case study | Publication |
---|---|---|---|---|---|---|
Multidrug | Single omics | Transporter expression, drug activity | 60 | 7 cell lines | sPCA, sPCA case study | Szakács et al., 2004 |
SRBCT | Single omics | mRNA | 63 | 4 tumour classes | Performance assessment and parameter tuning, sPLS-DA, sPLS-DA case study | Khan et al., 2001 |
vac18 | Single omics | mRNA | 42 | 4 stimulation groups, multilevel | Multilevel, Multilevel case study | Salmon-Céron et al. 2010 |
liver.toxicity | N-integration-two omics | mRNA, clinical data | 64 | 4 treatment doses, 4 treatment times | Missing Values, Parameter tuning, sIPCA, sIPCA case study, sPLS, sPLS case study | Bushel et al., 2007 |
nutrimouse | N-integration-two omics | mRNA, lipid data | 40 | 5 diet groups, 2 genotypes | rCCA, rCCA case study | Martin et al., 2007 |
breast.TCGA | N-integration-multiomics | miRNA, mRNA, protein | 150 training, 70 test | 3 cancer subtypes | DIABLO, DIABLO case study | Cancer Genome Atlas Network, 2012 |
stemcells | P-integration | mRNA | 125 | 3 cell lines, 4 studies | MINT, MINT case study | |
diverse.16S | Single omics | microbiome | 162 | 3 body sites | mixMC case study | Human Microbiome Project 16S dataset |
koren.16S | Single omics | microbiome | 43 | 3 body sites | mixMC pre-processing, mixMC case study | Koren et al. 2013 |
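
As a small sketch of how these example data sets can be accessed, the code below loads `nutrimouse` with `data()` and inspects its two omics blocks. The variable names `X` and `Y` are illustrative assumptions.

```r
library(mixOmics)

# Load one of the example data sets listed in the table
data(nutrimouse)

# Each data set is a list; inspect what it contains
names(nutrimouse)

# The two omics blocks measured on the same samples (samples in rows)
X <- nutrimouse$gene    # mRNA expression
Y <- nutrimouse$lipid   # lipid concentrations
dim(X)
dim(Y)

# Sample annotations: cross-tabulate diet by genotype
table(nutrimouse$diet, nutrimouse$genotype)
```

Matrices extracted this way can be passed directly to the methods in section 1, for example `X` and `Y` to an rCCA or sPLS analysis.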