[Update: the workshop is fully subscribed and registrations have closed!] This is the first edition of our advanced workshop, run by Dr Kim-Anh Lê Cao and Sébastien Déjean. The event and Dr Kim-Anh Lê Cao’s visit are sponsored by the visiting scientist program of INP Toulouse and by the company Methodomics.
The mixOmics package has undergone substantial improvements and methodological developments in the last 18 months to address the strong demand from the computational and biological community to integrate multiple (>2) ‘omics data sets, including microbiome, genotype and longitudinal data. The aim of this advanced workshop is to introduce our new frameworks and to encourage discussions, collaborations and suggestions for improvement on themes including:
N-integration with DIABLO
P-integration with MINT
Longitudinal ‘omics analysis with timeOmics (not yet in mixOmics!)
Exploratory multivariate analysis with SNPOmics (not yet in mixOmics!)
mixMC: mixOmics for Microbial communities, with N-integration extensions
Prerequisites: Since this is an advanced course, we expect participants to be expert in the R programming language and familiar with multivariate projection-based methods and mixOmics.
We have been quiet for a while, but we have some good news! A CRAN update, a manuscript in bioRxiv, a 3-year postdoc position open to be part of the mixOmics core team, and three workshops planned for the French autumn!
The 6.1.3 update is now on CRAN. We fixed a few bugs (see list below) and added a new plotIndiv argument ‘background’ to visualise the prediction area of a PLS-DA or sPLS-DA model (maximum 2 components). This is a powerful plot to visualise the effect of the different prediction distances. Why does the prediction distance matter for the performance of discriminant analysis models? See the details below.
Example of prediction area plot for the SRBCT data with a PLS-DA model, see ?srbct
All you need is the background.predict function, whose output you then overlay with plotIndiv. For example:
library(mixOmics)  # provides liver.toxicity, plsda, background.predict and plotIndiv
data(liver.toxicity)
X = liver.toxicity$gene
Y = as.factor(liver.toxicity$treatment[, 4])  # 4th treatment column used as the class label
plsda.liver = plsda(X, Y, ncomp = 2)
# calculating the background for the first two components, with the Mahalanobis distance
background = background.predict(plsda.liver, comp.predicted = 2, dist = "mahalanobis.dist")
# overlay the prediction area on the sample plot
plotIndiv(plsda.liver, background = background, legend = TRUE)
We also added the new functions get.confusion_matrix and get.BER to calculate a confusion matrix based on class prediction of test samples and their real class, and calculate their Balanced Error Rate, see ?get.BER. Example of outputs (for a DIABLO analysis on the breast cancer TCGA multi omics study):
Example from our DIABLO pipeline available at https://mixomics.org/wp-content/uploads/2012/03/mixOmicsRscripts.zip
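To make the Balanced Error Rate concrete, here is a minimal base-R sketch of the calculation. It does not call get.confusion_matrix or get.BER, and the class labels below are made up for illustration. It assumes the usual definition of the BER as the average of the per-class error rates, so that small classes weigh as much as large ones.

```r
# Made-up true and predicted classes for 10 test samples (illustration only)
class_levels <- c("A", "B", "C")
truth     <- factor(c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
                    levels = class_levels)
predicted <- factor(c("A", "A", "B", "A", "B", "B", "C", "C", "C", "A"),
                    levels = class_levels)

# Confusion matrix: rows are the real classes, columns the predicted classes
confusion <- table(truth = truth, predicted = predicted)

# Error rate within each real class, then averaged over classes
per_class_error <- 1 - diag(confusion) / rowSums(confusion)
ber <- mean(per_class_error)

# Overall error rate, for comparison (not balanced across classes)
overall_error <- mean(truth != predicted)

print(confusion)
print(round(c(BER = ber, overall = overall_error), 3))
```

When the classes have different sizes, as here, the BER and the overall error rate differ slightly.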
We have submitted a new version of our mixOmics manuscript to bioRxiv! The manuscript is available at this link and has been a top tweeted story in #bioinformatics. It mostly summarises the latest mixOmics frameworks for Discriminant Analysis (sPLS-DA, DIABLO and MINT), with extensive R and Sweave codes here. Give it a go! The supplemental material details these methods thoroughly.
It almost sounds like the end of a first mixOmics era: Florian, our very talented and dedicated core developer, debugger and developer of MINT, has moved on to another postdoctoral position at the University of Queensland, and Kim-Anh is starting her new group as a Senior Lecturer at the University of Melbourne (UoM), at the Centre for Systems Genomics. Do not fear, this means there will be a new round of developments, notably in the microbiome and metagenomics field, as we are opening a new 3-year senior postdoctoral position in Computational Biostatistics at UoM (with the opportunity to teach at the School of Mathematics and Statistics). More details at this link.
Seventeen multivariate methods currently implemented in mixOmics! Can you recognise your favourite?
Three workshops are coming up in France between September and November 2017. The first edition of MAW’17 is the advanced mixOmics workshop, which introduces our new frameworks (published and in development: DIABLO, MINT, SNPOmics, timeOmics, mixMC and extension of integration) to our advanced users. The workshop is free, but you will need to cover your own travel and accommodation costs. Toulouse, 23-24 Oct 2017. Send us an email and we can send you the details. The two other workshops will be our normal beginner mixOmics workshops, in September (Lille) and in early November (Toulouse). More details on our website soon.
Other enhancements and bug fixes:
Enhancements:
————-
1 – perf.sgccda (for DIABLO) now implements a constraint model (see details in ?perf)
2 – legend = TRUE option in circosPlot and plotDiablo
Bug fixes:
———-
– tune.splsda had a bug when assessing ‘choice.ncomp’ based on one-sided t-tests of the error rate when the error rate was constant.
– sparse PCA deflation algorithm fixed
– added the mixOmics:: prefix to pls function calls to avoid clashes with other packages
Why does a prediction distance matter? (full story in our manuscript)
The supervised multivariate methods in mixOmics can be applied to an external test set to predict the outcome of new samples (predict), or to assess the performance of the statistical model (perf). The predict function calculates prediction scores for each new sample, or predicted coordinates, which are equivalent to the latent component scores in the training set.
Prediction distances. Our supervised models work with dummy indicator matrices Y to indicate the class membership of each sample, and result in a prediction score for each outcome category k, k = 1, . . . , K. The scores across all K classes therefore need to be combined, using a prediction distance, to obtain the final prediction for a given test sample. We propose distances such as ‘maximum distance’, ‘Mahalanobis distance’ and ‘Centroids distance’, as detailed in our supplemental information and in ?predict. These distances can give different predictions, and the differences show up when the performance of the model is assessed.
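The idea behind two of these distances can be sketched in base R. This is only an illustration, not the code used inside predict, and all scores and coordinates below are made up. It assumes that the ‘maximum distance’ assigns the class with the largest prediction score, and that the ‘Centroids distance’ assigns the class whose training centroid is closest (Euclidean distance) to the predicted coordinates of the test sample.

```r
# Made-up prediction scores for 3 test samples and K = 3 classes
scores <- matrix(c(0.7, 0.2, 0.1,
                   0.3, 0.4, 0.3,
                   0.1, 0.3, 0.6),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("test", 1:3), c("A", "B", "C")))

# Maximum distance: pick the class with the largest score
max_dist_class <- colnames(scores)[apply(scores, 1, which.max)]

# Made-up latent component scores (2 components) for 6 training samples
train_comp <- matrix(c(-2.0, -1.0,
                       -1.5, -0.5,
                        2.0,  1.0,
                        1.5,  1.5,
                        0.0, -3.0,
                        0.5, -2.5),
                     ncol = 2, byrow = TRUE)
train_class <- factor(c("A", "A", "B", "B", "C", "C"))

# One centroid per class: the mean coordinates of its training samples
centroids <- t(sapply(levels(train_class), function(k) {
  colMeans(train_comp[train_class == k, , drop = FALSE])
}))

# Made-up predicted coordinates for 2 test samples
test_comp <- matrix(c(-1.8, -0.9,
                       0.2, -2.0),
                    ncol = 2, byrow = TRUE)

# Centroids distance: pick the class with the nearest centroid
centroid_class <- apply(test_comp, 1, function(x) {
  d <- sqrt(rowSums(sweep(centroids, 2, x)^2))
  names(which.min(d))
})

print(max_dist_class)
print(centroid_class)
```

The Mahalanobis variant follows the same nearest-centroid logic but replaces the Euclidean distance with a Mahalanobis distance, which is one reason the three distances can disagree on samples that lie between classes.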
The new patch version of mixOmics is on CRAN. It includes a few bug fixes raised by our users (thank you!) and a few improvements. Florian Rohart has been fiddling really hard with ggplot2 to make a new plotIndiv version that can beautifully handle two legends!
Here is a list of the major bug fixes and improvements for 6.1.2:
New features:
————-
1 – tune.splsda now returns a ‘choice.ncomp’ which indicates the number of components to choose (only if nrepeat > 2, criterion based on t-tests)
2 – plotIndiv now enables two legends based on color, as well as pch, when pch is a factor different from what is indicated in group (use arguments pch and pch.levels, see ?plotIndiv)
Enhancements:
————-
1 – argument ‘cutoff’ now replaces ‘threshold’ in network for consistency with plotVar and circosPlot
2 – new argument ‘sd’ in plot.perf for block.splsda method
3 – new arguments “color.Y” and “color.blocks” in cimDiablo
4 – new argument ‘xlim’ in plotLoadings
Bug fixes:
———-
– directionality is now enforced in AUROC (results lower than 0.5 can be obtained, which would indicate a very poor model performance)
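To see why a fixed direction can give values below 0.5, here is a small base-R sketch using the rank-based (Mann-Whitney) formulation of the AUC. This is an illustration with made-up scores, not how the package computes AUROC, and the function name auc_fixed_direction is hypothetical.

```r
# AUC with a fixed direction: higher scores are assumed to indicate the
# positive class. Rank-based (Mann-Whitney) formulation, illustration only.
auc_fixed_direction <- function(score, is_positive) {
  r <- rank(score)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up scores and class membership for 6 samples
score       <- c(0.9, 0.8, 0.4, 0.35, 0.3, 0.1)
is_positive <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

auc_good <- auc_fixed_direction(score, is_positive)

# Same samples with the scores reversed: positives now score lowest
auc_poor <- auc_fixed_direction(-score, is_positive)

print(round(c(good = auc_good, poor = auc_poor), 3))
```

With the direction fixed, the reversed scores are not silently flipped back above 0.5, so a model that ranks the classes the wrong way round is reported as such.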
Manuscripts:
The MINT paper is out:
Rohart F., Matigian N., Eslami A., Bougeard S. and Lê Cao, K. A. MINT: A multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. Available on bioRxiv, and now in press in BMC Bioinformatics 18:128.
The mixOmics manuscript (first draft) is on bioRxiv, with sweave codes:
Rohart F., Gautier, B, Singh, A and Lê Cao, K. A. mixOmics: an R package for ‘omics feature selection and multiple data integration. On bioRxiv. Sweave and R scripts available here.
We have a new patch version 6.1.1 available from CRAN. It fixes a few bugs reported by our team or by mixOmics users (thank you!) and includes a few enhancements, as well as updates to follow ggplot2 changes.
For those using DIABLO, please note points 8 & 9: we changed the default parameters in the methods to scheme = ‘horst’ instead of ‘centroid’ and init = ‘svd.single’ instead of ‘svd’, as we feel these are more appropriate. This may change your results compared to the last version, and you may want to set the old parameters explicitly instead.
New features:
1 – mint.pca function to perform unsupervised integration of independent data sets
2 – new weighted prediction for block approaches for both unsupervised and supervised analyses, see ?predict.spls and ?predict.splsda.
3 – ‘cpus’ parameter for sPLS-DA perf/tune and block.splsda perf/tune added to run the code in parallel
Enhancements:
4 – ‘constraint’ parameter for sPLS-DA perf and tune functions added.
5 – plotLoadings for PCA object
6 – color argument in plot.tune and plot.perf added
Bug fixes:
7- predict with logratio (the logratio transform is now performed inside the predict function)
8- in block methods, scheme = ‘horst’ set by default instead of centroid
9- in block methods, initialisation set to svd.single by default
We list below some installation requirements to ensure the mixOmics workshop will run smoothly for everyone.
Important reminders. We expect the trainees to have a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) to be able to fully enjoy the workshop. Attendees are requested to bring their own laptop as this is a hands-on workshop (we will alternate theory and practice).
Software installation and updates. To run the R scripts in this workshop, you will need to install or update to the latest version of R available from CRAN (currently > 3.4, see also Installation guide for R and RStudio), followed by the update or installation of the following R packages:
mixOmics version 6.3.1 (the version number is important)
mvtnorm
corrplot
igraph
The mixOmics package should directly import the following packages: igraph, rgl, ellipse, corpcor, RColorBrewer, plyr, parallel, dplyr, tidyr, reshape2, methods, matrixStats, rARPACK, gridExtra.
Check after install that the following does not throw any error*:
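One possible check is sketched below, under the assumption that it is enough to confirm that each package listed above can be loaded and that the installed mixOmics is at least version 6.3.1. Only base R functions are used.

```r
# Packages listed above; mixOmics 6.3.1 is the version the workshop asks for
required <- c("mixOmics", "mvtnorm", "corrplot", "igraph")

# requireNamespace() returns FALSE instead of stopping when a package is missing
available <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

if (!all(available)) {
  message("Missing packages: ", paste(required[!available], collapse = ", "))
} else if (packageVersion("mixOmics") < "6.3.1") {
  message("mixOmics ", as.character(packageVersion("mixOmics")),
          " is older than 6.3.1, please update")
} else {
  message("All required packages are installed")
}
```

Any package reported as missing can then be installed with install.packages() before the workshop.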
Participants, organisers and tutors
Looking very studious! We were hosted by the LIPM lab, INRA Auzeville Toulouse
Some feedback from our participants:
Overall I did enjoy the workshop, it was one of the most interesting and well put together that I have attended. Thank you very much.
The tutorials on the website are excellent for training.
It was a very good mixture of theory and practice to directly try out the methods. Also there were many experts who were available for questions. The presentations were quite clear to me as well as the course material and the provided scripts.
‘[Day 3] was useful, because it allows to check if we have well understood the use of each analysis, and bring our own data allows to make these analysis more concrete.’
[…] I could discuss with some other participants with similar experimental design and see how they think [they can] apply mixOmics
Some useful references discussed during the workshop:
Liu et al 2015: we used Principal Component Curves (a variant of PCA, but where you fit a curve, and where you need a ‘reference’ group) to quantify pathway regulation of Homologous Recombination in breast cancer.
Singh et al. 2016 (bioRxiv): the asthma study (#2) summarised some of the omics data sets into gene modules to quantify pathways before the integration step. This is the DIABLO paper.
Straube et al 2015: the linear mixed model framework to reduce the dimension of time course data from (n x p x T) to (T x p), lmms is available on CRAN.
Straube et al 2016: Dynomics to detect delay between time course data. Submitted.
We are proud to announce our new update 6.1.0 available on CRAN. It was supposed to be a small patch but we got slightly ahead of ourselves. Special thanks to the mixOmics French’Oz developers, Dr Florian Rohart (University of Queensland, Brisbane) and Mr François Bartolo (Université de Toulouse, France), as well as several users who have been using our latest methods and reported bugs or suggested improvements on our bitbucket issue website.
Manuscripts and publication update
Rohart F., Matigian N., Eslami A., Bougeard S. and Lê Cao, K. A. MINT: A multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. Now available on bioRxiv!
Singh A, Gautier B, Shannon C, Vacher M, Rohart F, Tebbutt S, K-A. Lê Cao. DIABLO – multi-omics data integration for biomarker discovery. Manuscript available in bioRxiv.
K-A. Lê Cao*, ME Costello*, VA Lakis, F Bartolo, XY Chua, R Brazeilles, P Rondeau. (2016) MixMC: Multivariate insights into Microbial Communities. PLoS ONE 11(8): e0160169
List of changes in mixOmics 6.1.0 (in NEWS file)
In short,
– cimDiablo argument ‘corThreshold’ replaced by ‘cutoff’
– new plots of tune and perf results now available
– tune function for block.splsda/DIABLO method
– auroc for supervised methods
New features:
1- auroc function applicable for (mint).(block).(s)plsda objects. AUC values also included in perf and tune functions (except mixDIABLO module)
2- tune.block.splsda function to choose the keepX parameters of block.splsda (a.k.a. mixDIABLO)
3- plot for perf objects displays the classification error rate w.r.t components
4- plot for tune objects displays the classification error rate w.r.t keepX values (not implemented for tune.block.splsda)
5- multilevel function has been removed (as planned) as it is now included as an argument in other functions (see pca, pls, splsda, etc)
Enhancements:
1 – All tune functions (except for mixDIABLO/block.splsda module) include a ‘constraint’ argument to either build the model based on user input specific parameters (object$keepX.constraint) or based on the optimal parameter keepX determined by the tune function, see examples in help files.
2 – All perf functions (except for mixDIABLO/block.splsda module) now have a ‘constraint’ argument that allows the performance to be calculated either based on the number of parameters (object$keepX) defined in object or based on the variables selected on each component, see examples in help files.
3 – max.iter has been set to 100 to speed up computational time for all multivariate methods except pca/spca.
4 – cimDiablo: new arguments include transpose, row.names and col.names
5 – circosPlot: new arguments include var.names and comp. Argument ‘corThreshold’ has been replaced by ‘cutoff’.
6 – plotIndiv: new argument legend.title
7 – network function now available for block.spls(da) models, and allows plotting for more than 2 blocks
8 – PCA: new argument ilr.offset to be used only for ILR log transform in PCA (mixMC module)
9 – Legend added in plotDiablo, new argument legend.ncol
Bug fixes:
1 – plotIndiv and ellipse: plot ellipse for all groups with more than 1 sample
2 – predict function: argument multilevel added, log transform included
3 – Call to plsda.vip() from the RVAideMemoire package
4 – other small bugs as listed in our bitbucket issues, matching rgl package changes.
We are preparing a patch to fix some small bugs we (and other users) noticed since we released version 6.0.0. The .zip (windows) and .tar.gz (linux / mac) can be downloaded from this page. We plan to push a completed patch to CRAN at the end of August 2016.