Select your Method

mixOmics includes a range of statistical methods (also called models) that allow you to integrate datasets, identify important variables and make predictions on test data. This guide helps you select which mixOmics method to use for your data and research question.

💡 Tip: Refer to the glossary for unfamiliar terms.


1. Identify what kind of data you have

mixOmics is appropriate for any omics data (e.g. transcriptomics, metabolomics, proteomics, microbiome/metagenomics, …) as well as non-omics data (e.g. spectral imaging, clinical data). Before choosing your method, answer the following questions to identify some key characteristics of your data:

Do you have one modality or more?

If you have one modality (e.g. transcriptomics), you are working with Single ‘Omics. If you have multiple modalities (e.g. transcriptomics and metabolomics), you need N-integration methods. Some N-integration methods can only integrate two modalities, whilst others can integrate more (multi-omics).

Do you have one study or more?

If you have more than one study (e.g. you collected transcriptomics data for 10 samples and then collected transcriptomics data again for another 20 samples), you need to perform P-integration.

Do you have an output variable?

If you have an output of interest (e.g. you want to see if transcriptomics data can predict disease outcome), you require a supervised method. If your analysis is exploratory without an outcome variable, you need an unsupervised method.

Is your output variable continuous or categorical?

If your output variable is categorical (e.g. classification of samples into groups A, B, C) you are facing a classification problem. If your output variable is continuous (e.g. protein level) you have a regression problem.

After answering these questions, you should now know whether you need a method which is:

  • Single ‘omics, N-integration (two modalities or more) or P-integration
  • Supervised (classification or regression) or unsupervised
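
The four questions above can be written down as a small checklist. The sketch below is in Python and is not part of mixOmics (which is an R package): the function name `characterise_data` and its arguments are hypothetical and only illustrate the reasoning in this section. It assumes that data with both several modalities and several studies is labelled P-integration, which is how the MINT block rows of the overview table in section 2 are labelled.

```python
from typing import Optional, Tuple


def characterise_data(n_modalities: int, n_studies: int,
                      outcome: Optional[str] = None) -> Tuple[str, str]:
    """Return (integration type, analysis type) for a dataset.

    Hypothetical helper for illustration only; not a mixOmics function.
    `outcome` is None (no output variable), "categorical" or "continuous".
    """
    if n_modalities < 1 or n_studies < 1:
        raise ValueError("need at least one modality and one study")
    if outcome not in (None, "categorical", "continuous"):
        raise ValueError("outcome must be None, 'categorical' or 'continuous'")

    # One study or more? Then one modality or more?
    # Assumption: several studies take precedence over several modalities.
    if n_studies > 1:
        integration = "P-integration"
    elif n_modalities > 1:
        integration = "N-integration"
    else:
        integration = "Single 'omics"

    # Output variable? If so, categorical or continuous?
    if outcome is None:
        analysis = "Unsupervised"
    elif outcome == "categorical":
        analysis = "Supervised: Classification"
    else:
        analysis = "Supervised: Regression"

    return integration, analysis


# Transcriptomics and metabolomics from one study, predicting a protein level
print(characterise_data(2, 1, "continuous"))
# → ('N-integration', 'Supervised: Regression')
```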


2. Choose your method

This table provides an overview of the models you can build using mixOmics:

| Model | # of omics | # of studies | Integration? | Supervised? |
|---|---|---|---|---|
| (s)PCA | 1 | 1 | No integration | Unsupervised |
| (s)IPCA | 1 | 1 | No integration | Unsupervised |
| (s)PLS-DA | 1 + categorical outcome | 1 | No integration | Supervised: Classification |
| (s)PLS | 2 | 1 | N-integration | Unsupervised and Supervised: Regression |
| (r)CCA | 2 | 1 | N-integration | Unsupervised |
| Block (s)PLS | >2 | 1 | N-integration | Supervised: Regression |
| Block (s)PLS-DA (DIABLO) | >2 + categorical outcome | 1 | N-integration | Supervised: Classification |
| (sparse) Generalised CCA | >2 | 1 | N-integration | Unsupervised |
| MINT PCA | 1 | >1 | P-integration | Unsupervised |
| MINT (s)PLS-DA | 1 + categorical outcome | >1 | P-integration | Supervised: Classification |
| MINT (s)PLS | 2 | >1 | P-integration | Supervised: Regression |
| MINT block (s)PLS | >2 | >1 | P-integration | Supervised: Regression |
| MINT block (s)PLS-DA | >2 + categorical outcome | >1 | P-integration | Supervised: Classification |

⚙️ Note: Most models have a sparse variant available in mixOmics, e.g. sPCA is the sparse variant of PCA. Sparse models allow you to identify and use only the most important variables.
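
The overview table can also be read as a lookup: filter its rows by number of omics, number of studies and type of analysis. The following Python sketch transcribes the table rows as data and filters them. It is an illustration only, not mixOmics code; `Row`, `TABLE` and `candidate_models` are hypothetical names, and the counts are matched exactly as the table writes them (for example ">2" means strictly more than two).

```python
from typing import List, NamedTuple, Tuple

UNSUP, CLASSIF, REGR = "unsupervised", "classification", "regression"


class Row(NamedTuple):
    model: str
    omics: str                 # "1", "2" or ">2", as written in the table
    studies: str               # "1" or ">1"
    analyses: Tuple[str, ...]  # analysis types the table lists for the model


# Rows transcribed from the overview table above.
TABLE = [
    Row("(s)PCA", "1", "1", (UNSUP,)),
    Row("(s)IPCA", "1", "1", (UNSUP,)),
    Row("(s)PLS-DA", "1", "1", (CLASSIF,)),
    Row("(s)PLS", "2", "1", (UNSUP, REGR)),
    Row("(r)CCA", "2", "1", (UNSUP,)),
    Row("Block (s)PLS", ">2", "1", (REGR,)),
    Row("Block (s)PLS-DA (DIABLO)", ">2", "1", (CLASSIF,)),
    Row("(sparse) Generalised CCA", ">2", "1", (UNSUP,)),
    Row("MINT PCA", "1", ">1", (UNSUP,)),
    Row("MINT (s)PLS-DA", "1", ">1", (CLASSIF,)),
    Row("MINT (s)PLS", "2", ">1", (REGR,)),
    Row("MINT block (s)PLS", ">2", ">1", (REGR,)),
    Row("MINT block (s)PLS-DA", ">2", ">1", (CLASSIF,)),
]


def _matches(spec: str, n: int) -> bool:
    """True if the count n satisfies a table entry such as "2" or ">2"."""
    if spec.startswith(">"):
        return n > int(spec[1:])
    return n == int(spec)


def candidate_models(n_omics: int, n_studies: int, analysis: str) -> List[str]:
    """Return the models whose table row matches the given characteristics."""
    return [row.model for row in TABLE
            if _matches(row.omics, n_omics)
            and _matches(row.studies, n_studies)
            and analysis in row.analyses]


print(candidate_models(2, 1, UNSUP))    # → ['(s)PLS', '(r)CCA']
print(candidate_models(1, 3, CLASSIF))  # → ['MINT (s)PLS-DA']
```

An empty list means that no row of the table matches that combination exactly as written.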

This decision tree can also be used to aid your model selection:

[Figure: decision tree for mixOmics model selection]

🦠 Microbiome data: If you are handling microbiome data, you can take advantage of the mixMC framework within the mixOmics package.


3. Learn more about your chosen method

Each mixOmics method has been documented on this website, with examples demonstrating the whole mixOmics workflow for each method in the Case Studies and the Vignette. Use the Resources Overview to help navigate more of our resources.