Welcome to mixOmics!
mixOmics is a collaborative project between Australia (Melbourne), France (Toulouse), and Canada (Vancouver). The core team includes Kim-Anh Lê Cao (University of Melbourne), Florian Rohart (Brisbane) and Sébastien Déjean (Toulouse). We also have key contributors, past (Benoît Gautier, François Bartolo) and present (Al Abadi, University of Melbourne), and several collaborators, including Amrit Singh (University of British Columbia) and Olivier Chapleur (INRA, Paris). It could be you too if you wish to be involved: we host many visitors with computational, statistical and biological backgrounds!
Why multivariate methods?
It is generally accepted that a single ‘omics analysis does not provide enough information to give a deep understanding of a biological system, whereas combining multiple ‘omics analyses gives a more holistic view of that system. Our mixOmics R package offers a whole range of multivariate methods, which we developed and validated on many biological studies, to gain more insight into ‘omics data.
mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets, with a particular focus on variable selection.
Multivariate methods are well suited to large ‘omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing property of reducing the dimension of the data through instrumental variables (‘components’), which are defined as linear combinations of all variables. These components are then used to produce graphical outputs that give a better understanding of the relationships and correlation structure between the different data sets being integrated. We have developed several sparse multivariate models to identify the key variables that are highly correlated and/or explain the biological outcome of interest (e.g. disease status). The identified variables are then more amenable to statistical inference and can be used to posit novel biological hypotheses to be further validated in the laboratory.
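To make the idea of components concrete, here is a minimal Python/NumPy sketch. It is not mixOmics code (mixOmics is an R package) and it assumes a column-centred matrix with far more variables than samples. It first computes an ordinary first principal component, in which every variable receives a non-zero weight, and then a sparse component that keeps only `keep` variables. The sparse version uses a soft-thresholding step on the loadings, one common way of obtaining sparse components, which is assumed here for illustration. The names `soft_threshold_keep`, `sparse_component` and `keep` are hypothetical.

```python
import numpy as np

def soft_threshold_keep(v, keep):
    """Soft-threshold v so that only its `keep` largest entries (in absolute value) stay non-zero."""
    if keep >= v.size:
        return v
    abs_v = np.abs(v)
    lam = np.sort(abs_v)[::-1][keep]  # the (keep+1)-th largest absolute value
    return np.sign(v) * np.maximum(abs_v - lam, 0.0)

def sparse_component(X, keep, n_iter=100, tol=1e-9):
    """Sparse rank-one approximation of a column-centred X (n samples x p variables).

    Returns the component scores t = X @ a and a unit-norm sparse loading vector a.
    """
    u = np.linalg.svd(X, full_matrices=False)[0][:, 0]  # start from the dense first component
    v = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v_new = soft_threshold_keep(X.T @ u, keep)  # sparse loadings given the scores
        u = X @ v_new                               # scores given the sparse loadings
        u /= np.linalg.norm(u)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    a = v / np.linalg.norm(v)
    return X @ a, a

rng = np.random.default_rng(0)
n, p = 20, 500                               # far more variables than samples
X = rng.normal(size=(n, p))
X[:, :10] += rng.normal(size=(n, 1)) * 3     # 10 variables share a common latent signal
X -= X.mean(axis=0)                          # centre each variable

# Dense first component: a weighted sum of all 500 variables
dense_loading = np.linalg.svd(X, full_matrices=False)[2][0]
dense_scores = X @ dense_loading

# Sparse first component: a weighted sum of only 10 variables
t, a = sparse_component(X, keep=10)
print(np.count_nonzero(dense_loading), np.count_nonzero(a))
print(np.flatnonzero(a))                     # typically the 10 signal variables
```

Setting the threshold from the number of variables to keep, rather than from a penalty value, makes the size of the selected signature the tuning parameter. This is a design choice of the sketch.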
Which type of data?
The data analysed with mixOmics may come from high-throughput technologies, such as ‘omics data (transcriptomics, metabolomics, proteomics, microbiome/metagenomics…), but also from beyond the realm of ‘omics (e.g. spectral imaging). We are currently developing new methods to integrate time-course or longitudinal ‘omics data, and we are also investigating ways to integrate genotype data.
The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data.
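This sketch shows how a component can be estimated without deleting rows that contain missing values. It assumes a NIPALS-style algorithm, a standard approach that computes each component from the observed entries only. The Python/NumPy code is a hypothetical illustration and does not reproduce mixOmics's implementation: the function `nipals_first_component` and the simulated data are assumptions. It removes roughly 10% of the entries and compares the resulting component with the one obtained from the complete data.

```python
import numpy as np

def nipals_first_component(X, n_iter=500, tol=1e-10):
    """First component of X (n x p, NaN = missing), NIPALS-style.

    Missing entries are skipped in every sum instead of deleting
    the samples (rows) that contain them.
    """
    X = X - np.nanmean(X, axis=0)           # centre each variable on its observed values
    observed = ~np.isnan(X)
    Xz = np.where(observed, X, 0.0)         # zeros add nothing to the sums below
    t = Xz[:, np.argmax(observed.sum(axis=0))].copy()
    for _ in range(n_iter):
        # Loadings: regress each variable on t over its observed samples only
        a = (Xz * t[:, None]).sum(axis=0) / (observed * t[:, None] ** 2).sum(axis=0)
        a /= np.linalg.norm(a)
        # Scores: regress each sample on a over its observed variables only
        t_new = (Xz * a).sum(axis=1) / (observed * a ** 2).sum(axis=1)
        converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    return t, a

rng = np.random.default_rng(1)
z = rng.normal(size=30)                      # latent sample scores
w = rng.normal(size=8)                       # latent variable weights
X_full = np.outer(z, w) + 0.1 * rng.normal(size=(30, 8))

X = X_full.copy()
X[rng.random(X.shape) < 0.1] = np.nan        # knock out roughly 10% of the entries
print(np.isnan(X).any(axis=1).sum(), "of 30 samples contain a missing value")

t, a = nipals_first_component(X)

# Reference: first component of the complete data via SVD
Xc = X_full - X_full.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
t_ref = U[:, 0] * s[0]
print(abs(np.corrcoef(t, t_ref)[0, 1]))      # close to 1, with no sample dropped
```

Deleting every row with a missing value would discard all of the samples counted by the first `print`. The NIPALS-style update keeps every sample and uses whatever was observed for it.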
The mixOmics framework today
The toolkit today includes 19 multivariate methodologies, depicted below according to the data to integrate and the biological question (e.g. exploration, discriminant analysis, or integration of two or more data sets).