# mixMC

mixMC is a multivariate framework implemented in mixOmics for microbiome data analysis. The framework takes into account the inherent characteristics of microbiome data, namely sparsity (large number of zeroes in the data) and compositionality (resulting from the scaling we use to account for uneven sequencing depth per sample). The mixMC framework aims at identifying key microbial communities associated with their habitat or environment.
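The two characteristics named above can be seen on a toy example. The following is a minimal base R sketch, not part of the mixMC code: the count table is invented, and dividing each sample by its total count is assumed here as the scaling step.

```r
# Toy OTU count table: 3 samples (rows) x 4 OTUs (columns). Values are invented.
counts <- matrix(c( 10, 0, 30,  60,
                     0, 5,  0,  45,
                   200, 0,  0, 800),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("sample", 1:3), paste0("OTU", 1:4)))

# Sparsity: proportion of zero entries in the table.
sparsity <- mean(counts == 0)

# Sequencing depth differs widely between samples (100, 50 and 1000 reads).
depth <- rowSums(counts)

# Assumed scaling step: divide each sample by its total count.
rel_abund <- sweep(counts, 1, depth, "/")

# Compositionality: every sample now sums to 1, so an increase in one OTU
# forces a decrease in the others within the same sample.
row_totals <- rowSums(rel_abund)
```

After scaling, the samples are comparable despite their different depths, but the proportions within a sample are constrained to sum to one, which is why standard multivariate methods cannot be applied to them directly.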

mixMC addresses the limitations of existing multivariate methods for microbiome studies and offers unique analytical capabilities: it handles compositional and sparse data, repeated-measures experiments and multiclass problems; it highlights important discriminative features; and it provides interpretable graphical outputs to better understand the microbial communities' contribution to each habitat. The framework from our paper is summarised below:

# To get started

```r
library(mixOmics)
```


# Data

In the tabs under mixMC, examples are provided applying mixMC to microbiome 16S data sets. The data are directly available through the mixOmics package. If you would like to download the full data sets and the associated R scripts used for the paper, then click on the following links:

Non-Repeated Measures analysis with the Koren data set

Repeated Measures analysis with the HMP most diverse body sites

Repeated Measures with the HMP Oral body sites 16S

# How does mixMC fit into mixOmics?

mixMC is a pipeline we set up for microbial communities, using some of our standard methods in mixOmics with some adaptations. sPLS-DA has been extended with a CLR (centred log-ratio) transformation and includes a multilevel decomposition for repeated-measures designs, which are commonly encountered in microbiome studies. The multilevel approach we developed in [4] enables the detection of subtle differences when high inter-subject variability is present because microbial sampling is performed repeatedly on the same subjects but in multiple habitats. To account for subject variability, the data variance is decomposed into within-subject variation (due to habitat) and between-subject variation [5], similar to a within-subjects ANOVA in univariate analyses.
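To make the two steps concrete, here is a minimal base R sketch of a CLR transformation followed by a within-subject decomposition. It is an illustration under stated assumptions rather than the mixOmics implementation: the helper names `clr` and `within_variation`, the offset of 1 used to avoid taking the log of zero, and the toy counts are all hypothetical.

```r
# CLR: log of each value minus the mean log of its sample (row).
clr <- function(x, offset = 1) {
  logx <- log(x + offset)  # assumed offset so that zero counts can be logged
  sweep(logx, 1, rowMeans(logx), "-")
}

# Within-subject part: subtract each subject's mean profile from its samples,
# which removes the between-subject variation.
within_variation <- function(x, subject) {
  subject <- as.factor(subject)
  subject_means <- apply(x, 2, function(col) ave(col, subject))
  x - subject_means
}

# Toy data: 2 subjects, each sampled in 2 habitats, 4 OTUs. Values are invented.
counts <- matrix(c(12, 0, 30, 58,
                    3, 9,  0, 40,
                   20, 1,  5, 74,
                    0, 7,  2, 91),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(NULL, paste0("OTU", 1:4)))
subject <- c("A", "A", "B", "B")

x_clr    <- clr(counts)
x_within <- within_variation(x_clr, subject)
```

Each CLR-transformed sample sums to zero across OTUs, and each OTU in the within-subject matrix sums to zero across the samples of one subject, so what remains is the variation between habitats. In practice the package's own routines for both steps should be preferred over this sketch.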

As part of the pipeline we added the new graphical output plotLoadings to visualise the OTUs selected by sPLS-DA on each component and the sample group or habitat in which each OTU is most (or least) abundant. See examples in our other tabs.
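A call might look like the following sketch. It assumes that the Koren data ship with the package as `Koren.16S`, with the scaled OTU table in `data.TSS` and the habitat labels in `bodysite`. The `keepX` values are arbitrary placeholders, not tuned values from the paper.

```r
library(mixOmics)

# Assumed names for the data set and its fields.
data(Koren.16S)
X <- Koren.16S$data.TSS   # scaled OTU table
Y <- Koren.16S$bodysite   # habitat (body site) of each sample

# sPLS-DA with CLR transformation; keepX values are placeholders, not tuned.
res <- splsda(X, Y, ncomp = 2, keepX = c(20, 20), logratio = "CLR")

# OTUs selected on component 1, coloured by the habitat in which
# their mean abundance is highest.
plotLoadings(res, comp = 1, method = "mean", contrib = "max")
```

Setting `contrib = "min"` instead should colour each OTU by the habitat in which it is least abundant.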

# What is new? What is next?

Together with collaborators from INRA Toulouse, France, we are in the process of linking mixMC with the R package mixKernel to integrate different types of data using kernel models (see here).

We are currently developing new multivariate methods for microbiome data analysis, so watch this space!