[Update: 5 spots left, contact us] Following last year’s success of our COST workshop, the second edition will be run by Dr Sébastien Déjean and his crew in Toulouse. The event is organised by the local committee at UGSF (Drs Estelle Goulas, Anne-Sophie Blervacq, Anne Creach, Brigitte Huss and Prof Simon Hawkins).
Dates: 12-14 September (3 days)
Venue: Toulouse, France, TBA
Fees: 300 EUR (academics) and 600 EUR (private), which include tuition, course material, coffee breaks, lunches and one dinner in town. Bursaries for 12 PhD students and early career researchers are funded by COST ACTION FA1306, apply!
Some feedback from our participants to the question: ‘What did you like most about that workshop?’ (Survey Monkey results)
Theoretical + practical courses, course materials are really great
Regular oral review of the take-home messages
The slides and Kim-Anh presentations: very pedagogical
The workshop provides the exact combination of theory and practical exercises I liked to have. The examples with R scripts are so organized that you can understand the thinking process behind the analysis.
Open atmosphere and good pace, with enough of theory to understand the core principles
Didactic speakers, not much mathematics and formulas, alternance of theory and practice, well prepared R scripts and documents
The use of these tools is straightforward
Both the lecturer created a very nice exchange with the group, making everyone comfortable in making questions and express doubts.
Clear- Concise- Adaptable- Very complete R scripts and pdf documents
——–
We will be running a classic 2-day mixOmics workshop in November, taught by Dr Kim-Anh Lê Cao, Sébastien Déjean and other mixOmics team contributors. The event and Dr Kim-Anh Lê Cao’s visit are sponsored by the visiting scientist program INP Toulouse.
The objective of the workshop is to introduce the fundamental concepts of multivariate dimension reduction methodologies. Those methods are particularly useful for data exploration and integration of large data sets, especially in the context of systems biology, or in research areas where statistical data integration is required. Each methodology presented during the course (analysis of one ‘omics data set, integration of two, and integration of multiple ‘omics data sets) will be applied to biological ‘omics studies including transcriptomics, metabolomics, proteomics and microbiome data sets using the R package mixOmics.
Prerequisites: We expect the participants to have a good working knowledge of R (e.g. handling data frames and performing basic calculations). Participants are requested to bring their own laptops, with the software RStudio and the R package mixOmics installed (instructions provided prior to the training).
Practical information: The workshop is free of charge for all participants as it is fully sponsored by INP. Priority will be given to INP students, external postgraduate students and early career researchers. The workshop includes tuition and course material. It does not include tea/coffee or lunch during the breaks.
[Update: the workshop is fully subscribed and registrations have closed!] This is the first edition of our advanced workshop, run by Dr Kim-Anh Lê Cao and Sébastien Déjean. The event and Dr Kim-Anh Lê Cao’s visit are sponsored by the visiting scientist program INP Toulouse and by the company Methodomics.
The mixOmics package has undergone substantial improvements and methodological developments in the last 18 months to address the strong demand from the computational and biological community to integrate multiple (>2) ‘omics data sets, including microbiome, genotype and longitudinal data. The aim of this advanced workshop is to introduce our new frameworks and encourage discussions, collaborations and suggested improvements on themes including:
N-integration with DIABLO
P-integration with MINT
Longitudinal ‘omics analysis with timeOmics (not yet in mixOmics!)
Exploratory multivariate analysis with SNPOmics (not yet in mixOmics!)
mixMC: mixOmics for Microbial communities, with N-integration extensions
Prerequisites: Since this is an advanced course, we expect the participants to be experts in the R programming language and familiar with multivariate projection-based methods and mixOmics.
We have been quiet for a while, but we have some good news! A CRAN update, a manuscript in bioRxiv, a 3-year postdoc position open to be part of the mixOmics core team, and three workshops planned for the French autumn!
The 6.1.3 update is now on CRAN. We fixed a few bugs (see list below), and we also have a new plotIndiv argument ‘background’ to visualise the prediction area for a PLS-DA or sPLS-DA model (max 2 components). This is a powerful plot to visualise the effect of the different prediction methods. Why does a prediction method matter for the performance of the discriminant analysis models? See elements of information below.
All you need is to call the background.predict function, then overlay the results with plotIndiv. For example:
library(mixOmics)
data(liver.toxicity)
X = liver.toxicity$gene
Y = as.factor(liver.toxicity$treatment[, 4])
plsda.liver = plsda(X, Y, ncomp = 2)
# calculate the prediction background for the first two components, using the Mahalanobis distance
background = background.predict(plsda.liver, comp.predicted = 2, dist = "mahalanobis.dist")
# overlay the prediction areas on the sample plot
plotIndiv(plsda.liver, background = background, legend = TRUE)
We also added the new functions get.confusion_matrix and get.BER to calculate a confusion matrix based on class prediction of test samples and their real class, and calculate their Balanced Error Rate, see ?get.BER. Example of outputs (for a DIABLO analysis on the breast cancer TCGA multi omics study):
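The DIABLO output itself is not reproduced in this text. As a rough illustration of the two quantities involved, here is a minimal base R sketch on made-up labels. The helper names confusion_sketch and ber_sketch are hypothetical and are not the mixOmics functions; we assume the Balanced Error Rate is the average of the per-class error rates.

```r
# Hypothetical helpers, base R only (not the mixOmics implementation).
confusion_sketch <- function(truth, predicted, all.levels) {
  # rows = true class, columns = predicted class
  table(truth     = factor(truth, levels = all.levels),
        predicted = factor(predicted, levels = all.levels))
}

ber_sketch <- function(confusion) {
  # per-class error rate = misclassified samples / samples truly in that class
  per.class.error <- 1 - diag(confusion) / rowSums(confusion)
  mean(per.class.error)
}

# Toy example: 4 samples of class A, 2 samples of class B
truth     <- c("A", "A", "A", "A", "B", "B")
predicted <- c("A", "A", "A", "B", "B", "A")

conf.mat <- confusion_sketch(truth, predicted, all.levels = c("A", "B"))
print(conf.mat)
print(ber_sketch(conf.mat))  # (1/4 + 1/2) / 2 = 0.375
```

Because each class contributes equally to the average, the small class B weighs as much as class A here, which is the point of using a balanced rate when class sizes differ. A class with no true samples would give a NaN in this sketch.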
We have submitted a new version of our mixOmics manuscript to bioRxiv! The manuscript is available at this link and has been a top tweeted story in #bioinformatics. The manuscript mostly summarises the latest mixOmics frameworks for Discriminant Analysis (sPLS-DA, DIABLO and MINT) with extensive R and Sweave codes here, give it a go! The supplemental material thoroughly details these methods.
It almost sounds like the end of a first mixOmics era, as Florian, our very talented and dedicated core developer, debugger and developer of MINT, has moved on to another postdoctoral position at the University of Queensland, and Kim-Anh is starting her new group as a Senior Lecturer at the University of Melbourne (UoM), at the Centre for Systems Genomics. Do not fear, this means there will be a new round of developments, notably in the microbiome and metagenomics field, as we are opening a new 3-year senior postdoctoral position in Computational Biostatistics at UoM (with opportunity to teach at the School of Mathematics and Statistics). More details at this link.
Three workshops are coming up, between Sept – Nov 2017 in France. The first edition of MAW’17 is the advanced mixOmics workshop to introduce our new frameworks (published and in development: DIABLO, MINT, SNPOmics, timeOmics, mixMC and extension of integration) to our advanced users. The workshop is free, but you will need to cover your own travel and accommodation costs. Toulouse, 23-24 Oct 2017. Send us an email and we can send you the details. The two other workshops will be our normal beginner mixOmics workshops, in September (Lille) and in early November (Toulouse). More details on our website soon.
Other enhancements and bug fixes:
Enhancements:
————-
1 – perf.sgccda (for DIABLO) now implements a constraint model (see details in ?perf)
2 – legend = TRUE option in circosPlot and plotDiablo
Bug fixes:
———-
– tune.splsda had a bug when assessing the ‘choice.ncomp’ based on one-sided t-tests of the error rate when the error rate was constant.
– sparse PCA deflation algorithm fixed
– added the mixOmics:: prefix to pls functions to avoid clashes with other packages
Why does a prediction distance matter? (full story in our manuscript)
The supervised multivariate methods in mixOmics can be applied on an external test set to predict the outcome of new samples (with the predict function), or to assess the performance of the statistical model (with the perf function). The predict function calculates prediction scores for each new sample, or predicted coordinates, which are equivalent to the latent component scores in the training set.
Prediction distances. Our supervised models work with dummy indicator matrices Y to indicate the class membership of each sample, and result in a prediction score for each outcome category k, k = 1, . . . , K. Therefore, the scores across all K classes need to be combined to obtain the final prediction of a given test sample using a prediction distance. We propose distances such as ‘maximum distance’, ‘Mahalanobis distance’ and ‘Centroids distance’, as detailed in our supplemental information and in ?predict. Those distances can give different predictions, which will be reflected in the assessed performance of the model.
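To make the idea concrete, here is a small base R sketch on toy numbers showing how the choice of distance can change the predicted class. It is only an illustration under stated assumptions, not the code used in mixOmics: the helper names max_dist and nearest_centroid are hypothetical, and the covariance matrix used for the Mahalanobis variant is simply assumed (see the supplemental information for the exact definitions).

```r
# Toy numbers, base R only; helper names are hypothetical.
classes <- c("ctrl", "low", "high")

# Predicted dummy scores for 2 test samples (rows) and K = 3 classes (columns)
scores <- rbind(c(0.70, 0.20, 0.10),
                c(0.30, 0.36, 0.34))
colnames(scores) <- classes

# 'maximum distance': pick the class with the largest predicted score
max_dist <- function(scores) colnames(scores)[apply(scores, 1, which.max)]

# Latent component scores of 6 training samples (2 per class, 2 components)
train.variates <- rbind(c(-2.0, 0.0), c(-1.8,  0.4),
                        c( 0.0, 2.0), c( 0.2,  1.6),
                        c( 2.0, 0.0), c( 1.8, -0.4))
train.class <- factor(rep(classes, each = 2), levels = classes)

# Class centroids in the latent space (one row per class)
centroids <- apply(train.variates, 2, function(v) tapply(v, train.class, mean))

# Predicted coordinates of the 2 test samples on the same components
test.variates <- rbind(c(-1.5, 0.1),
                       c( 1.1, 0.9))

# Assign each test sample to its nearest centroid.
# Identity covariance gives the (squared) Euclidean distance;
# any other covariance gives a (squared) Mahalanobis distance.
nearest_centroid <- function(test.variates, centroids,
                             cov.mat = diag(ncol(centroids))) {
  d <- apply(centroids, 1, function(ctr)
    mahalanobis(test.variates, center = ctr, cov = cov.mat))
  d <- matrix(d, nrow = nrow(test.variates))   # rows = test samples, cols = classes
  rownames(centroids)[apply(d, 1, which.min)]
}

assumed.cov <- diag(c(0.25, 4))  # assumption: component 2 is much more variable

print(max_dist(scores))
print(nearest_centroid(test.variates, centroids))
print(nearest_centroid(test.variates, centroids, cov.mat = assumed.cov))
```

In this toy example the first test sample is assigned to ‘ctrl’ by all three rules, while the second sample is assigned to ‘low’ by the maximum and centroid rules but to ‘high’ once distances are rescaled by the assumed covariance. This is the kind of disagreement that shows up as different error rates in perf.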
The new patch version of mixOmics is on CRAN. It includes a few bug fixes raised by our users (thank you!) and a few improvements. Florian Rohart has been fiddling really hard with ggplot2 to make a new plotIndiv version that can beautifully handle two legends!
Here is a list of the major bug fixes and improvements for 6.1.2:
New features:
————-
1 – tune.splsda now returns a ‘choice.ncomp’ which indicates the number of components to choose (only if nrepeat > 2, criterion based on t-tests)
2 – plotIndiv now enables two legends based on color, as well as pch, when pch is a factor different from what is indicated in group (use arguments pch and pch.levels, see ?plotIndiv)
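As a rough idea of how a t-test based criterion for the number of components can work, here is a base R sketch. It is not the rule implemented in tune.splsda: the function name choose_ncomp_sketch, the significance level and the handling of constant error rates are all assumptions made for illustration. The input is a matrix of error rates with one row per repeat and one column per component.

```r
# Hypothetical sketch, base R only (not the tune.splsda implementation).
choose_ncomp_sketch <- function(error.rate, alpha = 0.01) {
  best <- 1
  for (j in seq_len(ncol(error.rate))[-1]) {
    a <- error.rate[, best]  # error rates with the current choice
    b <- error.rate[, j]     # error rates with j components
    if (diff(range(a)) == 0 && diff(range(b)) == 0) {
      # Both columns are constant: t.test() would stop with an error,
      # so we assume a plain comparison of the two values is acceptable.
      improved <- a[1] > b[1]
    } else {
      # One-sided test: is the error with 'best' components greater
      # than the error with j components?
      improved <- t.test(a, b, alternative = "greater")$p.value < alpha
    }
    if (improved) best <- j
  }
  best
}

# Error rates over 5 repeats (rows) for 1, 2 and 3 components (columns)
err <- cbind(comp1 = c(0.40, 0.42, 0.38, 0.41, 0.39),
             comp2 = c(0.20, 0.22, 0.19, 0.21, 0.18),
             comp3 = c(0.20, 0.21, 0.19, 0.22, 0.18))
print(choose_ncomp_sketch(err))

# Constant error rate across repeats and components
err.const <- matrix(0.25, nrow = 5, ncol = 3)
print(choose_ncomp_sketch(err.const))
```

With these toy values the second component clearly lowers the error rate while the third does not, so the sketch keeps 2 components. The constant case returns 1 component; without the guard, t.test would fail because the data are essentially constant, which is why a test-based criterion needs several repeats with some variation between them.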
Enhancements:
————-
1 – argument ‘cutoff’ now replaces ‘threshold’ in network for consistency with plotVar and circosPlot
2 – new argument ‘sd’ in plot.perf for block.splsda method
3 – new arguments “color.Y” and “color.blocks” in cimDiablo
4 – new argument ‘xlim’ in plotLoadings
Bug fixes:
———-
– directionality is now enforced in AUROC (results lower than 0.5 can be obtained, which would indicate a very poor model performance)
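To illustrate why a value below 0.5 becomes possible, here is a base R sketch of an AUC computed with a fixed direction. We assume that ‘enforced directionality’ means that higher scores are always read as evidence for the class of interest; the function name auc_fixed_direction is hypothetical and this is not the code used by the auroc function.

```r
# Hypothetical sketch, base R only: AUC from the rank-sum (Mann-Whitney) statistic,
# always assuming that higher scores point to the positive class.
auc_fixed_direction <- function(score, is.positive) {
  r  <- rank(score)        # average ranks are used for ties
  n1 <- sum(is.positive)   # number of positive samples
  n0 <- sum(!is.positive)  # number of negative samples
  (sum(r[is.positive]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

is.positive <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
good.scores <- c(0.9, 0.8, 0.4, 0.5, 0.2, 0.1)  # positives mostly score high
poor.scores <- c(0.1, 0.2, 0.5, 0.4, 0.8, 0.9)  # positives mostly score low

print(auc_fixed_direction(good.scores, is.positive))  # 8/9
print(auc_fixed_direction(poor.scores, is.positive))  # 1/9
```

If the direction were instead chosen after looking at the data, the second model would also be reported at 8/9, which would hide the fact that it ranks the samples the wrong way round.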
Manuscripts:
The MINT paper is out:
Rohart F., Matigian N., Eslami A., Bougeard S and Lê Cao, K. A. MINT: A multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. Available on bioRxiv, in press in BMC Bioinformatics 18:128.
The mixOmics manuscript (first draft) is on bioRxiv, with sweave codes:
Rohart F., Gautier, B, Singh, A and Lê Cao, K. A. mixOmics: an R package for ‘omics feature selection and multiple data integration. On bioRxiv. Sweave and R scripts available here.
We have a new patch version 6.1.1 available from CRAN. It fixes a few bugs found by our team or by mixOmics users (thank you!) and adds a few enhancements and updates to follow ggplot2 updates.
For those using DIABLO, please note points 8 & 9: we changed the default parameters in the block methods to scheme = ‘horst’ instead of ‘centroid’ and init = ‘svd.single’ instead of ‘svd’, as we feel these are more appropriate. This may change your results compared to the last version, and you may want to set the old parameters explicitly instead.
New features:
1 – mint.pca function to perform unsupervised integration of independent data sets
2 – new weighted prediction for block approaches for both unsupervised and supervised analyses, see ?predict.spls and ?predict.splsda.
3 – ‘cpus’ parameter for sPLS-DA perf/tune and block.splsda perf/tune added to run the code in parallel
Enhancements:
4 – ‘constraint’ parameter for sPLS-DA perf and tune functions added.
5 – plotLoading for PCA object
6 – color argument in plot.tune and plot.perf added
Bug fixes:
7- predict with logratio (the logratio transform is now performed inside the predict function)
8- in block methods, scheme = ‘horst’ set by default instead of centroid
9- in block methods, initialisation set to svd.single by default
We list below some installation requirements to ensure the mixOmics workshop will run smoothly for everyone.
Important reminders. We expect the trainees to have a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) to be able to fully enjoy the workshop. Attendees are requested to bring their own laptop as this is a hands-on workshop (we will alternate theory and practice).
Software installation and updates. To run the R scripts in this workshop, you will need to install or update to the latest version of R available from CRAN (currently > 3.4, see also Installation guide for R and RStudio), followed by the update or installation of the following R packages:
mixOmics version 6.3.1 (the version number is important)
mvtnorm
corrplot
igraph
The mixOmics package should directly import the following packages: igraph, rgl, ellipse, corpcor, RColorBrewer, plyr, parallel, dplyr, tidyr, reshape2, methods, matrixStats, rARPACK, gridExtra.
Check after install that the following does not throw any error*:
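The original check is not reproduced in this text. One possible check, written as a small base R sketch, is to confirm that each required package can be loaded and to print the installed mixOmics version. The helper name check_installed is hypothetical.

```r
# Hypothetical helper, base R only: TRUE for each package that can be loaded.
check_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

required <- c("mixOmics", "mvtnorm", "corrplot", "igraph")
status <- check_installed(required)
print(status)

if (all(status)) {
  # the workshop material asks for mixOmics 6.3.1
  print(packageVersion("mixOmics"))
} else {
  message("Please install: ", paste(names(status)[!status], collapse = ", "))
}
```

Any package reported as FALSE needs to be installed before the workshop. Once all four are TRUE, loading the package with library(mixOmics) should run without error.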
Overall I did enjoy the workshop, it was one of the most interesting and well put together that I have attended. Thank you very much.
The tutorials on the website are excellent for training.
It was a very good mixture of theory and practice to directly try out the methods. Also there were many experts who were available for questions. The presentations were quite clear to me as well as the course material and the provided scripts.
‘[Day 3] was useful, because it allows to check if we have well understood the use of each analysis, and bring our own data allows to make these analysis more concrete.’
[…] I could discuss with some other participants with similar experimental design and see how they think [they can] apply mixOmics
Some useful references discussed during the workshop:
Liu et al 2015: we used Principal Component Curves (a variant of PCA, but where you fit a curve, and where you need a ‘reference’ group) to quantify pathway regulation of Homologous Recombination in breast cancer.
Singh et al. 2016 (bioRxiv): the asthma study (#2) summarised some of the omics data sets into gene modules to quantify pathways before the integration step. This is the DIABLO paper.
Straube et al 2015: the linear mixed model framework to reduce the dimension of time course data from (n x p x T) to (T x p), lmms is available on CRAN.
Straube et al 2016: Dynomics to detect delay between time course data. Submitted.
We are proud to announce our new update 6.1.0 available on CRAN. It was supposed to be a small patch but we got slightly ahead of ourselves. Special thanks to the mixOmics French’Oz developers, Dr Florian Rohart (University of Queensland, Brisbane) and Mr François Bartolo (Université de Toulouse, France), as well as several users who have been using our latest methods and reported bugs or suggested improvements on our bitbucket issue website.
Manuscripts and publication update
Rohart F., Matigian N., Eslami A., Bougeard S and Lê Cao, K. A. MINT: A multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. Now available on bioRxiv!
Singh A, Gautier B, Shannon C, Vacher M, Rohart F, Tebbutt S, K-A. Lê Cao. DIABLO – multi-omics data integration for biomarker discovery. Manuscript available in bioRxiv.
K-A. Lê Cao*, ME Costello*, VA Lakis, F Bartolo, XY Chua, R Brazeilles, P Rondeau. (2016) MixMC: Multivariate insights into Microbial Communities. PLoS ONE 11(8): e0160169
List of changes in mixOmics 6.1.0 (in NEWS file)
In short,
– cimDIABLO argument ‘corThreshold’ replaced by ‘cutoff’
– new plots of tune and perf results now available
– tune function for block.splsda/DIABLO method
– auroc for supervised methods
New features:
1- auroc function applicable for (mint).(block).(s)plsda objects. AUC values also included in perf and tune functions (except mixDIABLO module)
2- tune.block.splsda function to choose the keepX parameters of block.splsda (a.k.a mixDIABLO)
3- plot for perf objects displays the classification error rate w.r.t components
4- plot for tune objects displays the classification error rate w.r.t keepX values (not implemented for tune.block.splsda)
5- multilevel function has been removed (as planned) as it is now included as an argument in other functions (see pca, pls, splsda, etc)
Enhancements:
1 – All tune functions (except for mixDIABLO/block.splsda module) include a ‘constraint’ argument to either build the model based on user input specific parameters (object$keepX.constraint) or based on the optimal parameter keepX determined by the tune function, see examples in help files.
2 – All perf functions (except for mixDIABLO/block.splsda module) now have a ‘constraint’ argument that allows the performance to be calculated either based on the number of parameters (object$keepX) defined in object or based on the variables selected on each component, see examples in help files.
3 – max.iter has been set to 100 to speed up computational time for all multivariate methods except pca/spca.
4 – cimDiablo: new arguments include transpose, row.names and col.names
5 – circosPlot: new arguments include var.names and comp. Argument ‘corThreshold’ has been replaced by ‘cutoff’.
6 – plotIndiv: new argument legend.title
7 – network function now supports block.spls(da) models and can plot more than 2 blocks
8 – PCA: new argument ilr.offset to be used only for ILR log transform in PCA (mixMC module)
9 – Legend added in plotDiablo, new argument legend.ncol
Bug fixes:
1 – plotIndiv and ellipse: plot ellipse for all groups with more than 1 sample
2 – predict function: argument multilevel added, log transform included
3 – Call to plsda.vip() from the RVAideMemoire package
4 – other small bugs as listed in our bitbucket issues, matching rgl package changes.