Forum

Nov 22-24 2017, Toulouse, FR

[Update: 5 spots left, contact us] Following last year’s success of our COST workshop, the second edition will be run by Dr Sébastien Déjean and his crew in Toulouse. The event is organised by the local committee at UGSF (Drs Estelle Goulas, Anne-Sophie Blervacq, Anne Creach, Brigitte Huss and Prof Simon Hawkins).

Dates: 12-14 September (3 days)

Venue: Toulouse, France, TBA

Fees: 300 EUR (academics) and 600 EUR (private), which include tuition, course material, coffee breaks, lunches and one dinner in town. Bursaries for 12 PhD students and early career researchers are funded by COST Action FA1306, apply!

Application:  see details here.

Send your CV to: Estelle.goulas [at] univ-lille1.fr and mention whether you are applying for a travel bursary.

Deadline for application: 15 October 2017 

More details: at this link.

Nov 9-10 2017, Toulouse, FR

Some feedback from our participants to the question:  ‘What did you like most about that workshop?’ (Survey Monkey results)

  • Theoretical + practical courses, course materials are really great
  • Regular oral review of the take-home messages
  • The slides and Kim-Anh presentations: very pedagogical
  • The workshop provides the exact combination of theory and practical exercises I liked to have. The examples with R scripts are so organized that you can understand the thinking process behind the analysis.
  • Open atmosphere and good pace, with enough of theory to understand the core principles
  • Didactic speakers, not much mathematics and formulas, alternance of theory and practice, well prepared R scripts and documents
  • The use of these tools is straightforward
  • Both the lecturer created a very nice exchange with the group, making everyone comfortable in making questions and express doubts.
  • Clear- Concise- Adaptable- Very complete R scripts and pdf documents

——–

We will be running a classic 2-day mixOmics workshop in November, taught by Dr Kim-Anh Lê Cao, Sébastien Déjean and other mixOmics team contributors. The event and Dr Kim-Anh Lê Cao’s visit are sponsored by the visiting scientist program of INP Toulouse.

The objective of the workshop is to introduce the fundamental concepts of multivariate dimension reduction methodologies. These methods are particularly useful for the exploration and integration of large data sets, especially in systems biology or in research areas where statistical data integration is required. Each methodology presented during the course (single ‘omics analysis, and integration of two or multiple ‘omics data sets) will be applied to biological ‘omics studies, including transcriptomics, metabolomics, proteomics and microbiome data sets, using the R package mixOmics.

Date: 9-10 November 2017

Venue: salle E111, Batiment E, UMR-GenPhySE, INRA Toulouse Castanet Tolosan (map access coming soon)

Prerequisites: We expect participants to have a good working knowledge of R (e.g. handling data frames and performing basic calculations). Participants are requested to bring their own laptops, with RStudio and the R package mixOmics installed (instructions will be provided prior to the training).

Practical information: The workshop is free of charge for all participants as it is fully sponsored by INP. Priority will be given to INP students, external postgraduate students and early career researchers. The workshop includes tuition and course material. It does not include tea/coffee or lunch during the breaks.

More details: see this flyer.

Register: registrations have now closed. 

Want to know more? contact us at mixomics [at] math.univ-toulouse.fr

Oct 23-24 2017, Toulouse, FR, Advanced Workshop

[Update: the workshop is fully subscribed and registrations have closed!] This is the first edition of our advanced workshop, run by Dr Kim-Anh Lê Cao and Sébastien Déjean. The event and Dr Kim-Anh Lê Cao’s visit are sponsored by the visiting scientist program of INP Toulouse and by the company Methodomics.

The mixOmics package has undergone substantial improvements and methodological developments in the last 18 months to address the strong demand from the computational and biological community to integrate multiple (>2) ‘omics data sets, including microbiome, genotype and longitudinal data. The aim of this advanced workshop is to introduce our new frameworks and to encourage discussions, collaborations and suggestions for improvement on themes including:

  1. N-integration with DIABLO
  2. P-integration with MINT
  3. Longitudinal ‘omics analysis with timeOmics (not yet in mixOmics!)
  4. Exploratory multivariate analysis with SNPOmics (not yet in mixOmics!)
  5. mixMC: mixOmics for Microbial communities, with N-integration extensions

Date: 23-24 October 2017

Venue: Ground level, building ‘SDAR’, INRA Toulouse, Castanet Tolosan (map access coming soon)

Prerequisites: Since this is an advanced course, we expect participants to be experts in the R programming language and familiar with multivariate projection-based methods and mixOmics.

More details: at this link

Interested? contact us at mixomics [at] math.univ-toulouse.fr

Update 6.1.3 on CRAN, postdoc position, manuscript and upcoming workshops

Dear mixOmics users,

We have been quiet for a while, but we have some good news! A CRAN update, a manuscript in bioRxiv, a 3-year postdoc position open to be part of the mixOmics core team, and three workshops planned for the French autumn!

The 6.1.3 update is now on CRAN. We fixed a few bugs (see list below), and we also have a new plotIndiv argument ‘background’ to visualise the prediction area for a PLS-DA or sPLS-DA model (max 2 components). This is a powerful plot to visualise the effect of the different prediction distances. Why does the prediction distance matter for the performance of discriminant analysis models? See the elements of information below.

Example of prediction area plot for the SRBCT data with a PLS-DA model, see ?srbct

All you need is to call the background.predict function and overlay the result with plotIndiv. For example:

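library(mixOmics)

# load the liver toxicity study shipped with mixOmics
data(liver.toxicity)
X = liver.toxicity$gene
# outcome: fourth column of the treatment table, coded as a factor
Y = as.factor(liver.toxicity$treatment[, 4])
plsda.liver = plsda(X, Y, ncomp = 2)

# calculate the background for the first two components, using the Mahalanobis distance
background = background.predict(plsda.liver, comp.predicted = 2, dist = "mahalanobis.dist")

# overlay the prediction area on the sample plot
plotIndiv(plsda.liver, background = background, legend = TRUE)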

We also added the new functions get.confusion_matrix and get.BER to calculate a confusion matrix based on class prediction of test samples and their real class, and calculate their Balanced Error Rate, see ?get.BER. Example of outputs (for a DIABLO analysis on the breast cancer TCGA multi omics study):

Example from our DIABLO pipeline available at https://mixomics.org/wp-content/uploads/2012/03/mixOmicsRscripts.zip
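To make these two quantities concrete, here is a minimal base R sketch of a confusion matrix and the Balanced Error Rate computed from it. It does not call get.confusion_matrix or get.BER and is not the package implementation: the class labels and predictions are made up for illustration, and the BER is taken here as the average of the per-class error rates.

```r
# Hypothetical true classes and predicted classes for six test samples
truth     <- factor(c("A", "A", "A", "B", "B", "C"), levels = c("A", "B", "C"))
predicted <- factor(c("A", "A", "B", "B", "B", "A"), levels = levels(truth))

# Confusion matrix: rows are the real classes, columns the predicted classes
confusion <- table(truth = truth, predicted = predicted)
print(confusion)

# Error rate within each real class (proportion of its samples misclassified)
per.class.error <- 1 - diag(confusion) / rowSums(confusion)

# Balanced Error Rate: average of the per-class error rates, so that a small
# class (here C, with a single sample) weighs as much as a large one
ber <- mean(per.class.error)
print(round(ber, 3))
```

With unbalanced classes the BER can differ a lot from the overall error rate: here 2 samples out of 6 are misclassified (overall error of about 0.33), but the BER is about 0.44 because the only sample of class C is wrong.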

 

We have submitted a new version of our mixOmics manuscript to bioRxiv! The manuscript is available at this link and has been a top tweeted story in #bioinformatics. It mostly summarises the latest mixOmics frameworks for Discriminant Analysis (sPLS-DA, DIABLO and MINT), with extensive R and Sweave codes here, give it a go! The supplemental material details these methods thoroughly.

It almost sounds like the end of a first mixOmics era: Florian, our very talented and dedicated core developer, debugger and developer of MINT, has moved on to another postdoctoral position at the University of Queensland, and Kim-Anh is starting her new group as a Senior Lecturer at the University of Melbourne (UoM), at the Centre for Systems Genomics. Do not fear, this means there will be a new round of developments, notably in the microbiome and metagenomics field, as we are opening a new 3-year senior postdoctoral position in Computational Biostatistics at UoM (with the opportunity to teach at the School of Mathematics and Statistics). More details at this link.

Seventeen multivariate methods currently implemented in mixOmics! Can you recognise your favourite?

Three workshops are coming up between September and November 2017 in France. The first edition of MAW’17 is the advanced mixOmics workshop, which introduces our new frameworks (published and in development: DIABLO, MINT, SNPOmics, timeOmics, mixMC and extensions of integration) to our advanced users. It takes place in Toulouse on 23-24 Oct 2017. The workshop is free, but you will need to cover your own travel and accommodation costs. Send us an email and we can send you the details. The two other workshops will be our normal beginner mixOmics workshops, in September (Lille) and in early November (Toulouse). More details on our website soon.

 

Other enhancements and bug fixes:

Enhancements:
————-
1 – perf.sgccda (for DIABLO) now implements a constraint model (see details in ?perf)
2 – legend = TRUE option in circosPlot and plotDiablo
Bug fixes:
———-
– tune.splsda had a bug when assessing ‘choice.ncomp’ based on one-sided t-tests of the error rate when the error rate was constant.
– sparse PCA deflation algorithm fixed
– added the mixOmics:: prefix to pls functions to avoid clashes with other packages

 

Why does a prediction distance matter? (full story in our manuscript)

The supervised multivariate methods in mixOmics can be applied to an external test set to predict the outcome of new samples (predict function), or to assess the performance of the statistical model (perf function). The predict function calculates prediction scores for each new sample, or predicted coordinates, which are equivalent to the latent component scores in the training set.

Prediction distances. Our supervised models work with dummy indicator matrices Y to indicate the class membership of each sample, and result in a prediction score for each outcome category k, k = 1, . . . , K. Therefore, the scores across all K classes need to be combined to obtain the final prediction of a given test sample using a prediction distance. We propose distances such as ‘maximum distance’, ‘Mahalanobis distance’ and ‘Centroids distance’, as detailed in our supplemental information and in ?predict. These distances can give different predictions, which will be reflected in the assessed performance of the model.
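As a rough illustration of why two distances can disagree, the base R sketch below applies a maximum-type rule and a centroid-type rule to made-up numbers. It is a simplified reading of the description above, not the code behind predict: the score matrix, the training and test coordinates and all object names are hypothetical, and the centroid rule is assumed to use the Euclidean distance to the class centroids of the training component scores.

```r
# Hypothetical prediction scores for 3 test samples and K = 3 classes
scores <- matrix(c(0.7, 0.2, 0.1,
                   0.2, 0.3, 0.5,
                   0.1, 0.3, 0.6),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("test", 1:3), c("A", "B", "C")))

# Maximum-type rule: assign each sample to the class with the largest score
max.class <- colnames(scores)[apply(scores, 1, which.max)]

# Hypothetical latent component scores (2 components) of 6 training samples
train.variates <- matrix(c(-2.0,  0.0,
                           -1.5,  0.5,
                            2.0,  0.0,
                            1.5, -0.5,
                            0.0,  3.0,
                            0.5,  2.5),
                         ncol = 2, byrow = TRUE)
train.class <- factor(c("A", "A", "B", "B", "C", "C"))

# Class centroids in the component space (one row per class)
centroids <- apply(train.variates, 2, function(v) tapply(v, train.class, mean))

# Hypothetical predicted coordinates of the same 3 test samples
test.variates <- matrix(c(-1.8, 0.1,
                           0.2, 2.0,
                           0.4, 0.6),
                        ncol = 2, byrow = TRUE)

# Centroid-type rule: assign each sample to the closest class centroid
centroid.class <- apply(test.variates, 1, function(coord) {
  squared.dist <- rowSums(sweep(centroids, 2, coord)^2)
  rownames(centroids)[which.min(squared.dist)]
})

print(data.frame(max.rule = max.class, centroid.rule = centroid.class))
```

In this toy example the two rules agree on the first two samples but not on the third one, which sits between the class centroids. This is the kind of disagreement that shows up as different error rates depending on the distance chosen.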

 

Patch 6.1.2 and some updates

R CRAN update

The new patch version of mixOmics is on CRAN. It includes a few bug fixes raised by our users (thank you!) and a few improvements. Florian Rohart has been fiddling really hard with ggplot2 to make a new plotIndiv version that can beautifully handle two legends!

plotIndiv example with two legends in 6.1.2

 

# indicate the group, treatment and pch for each sample
my.group
 [1] "group 1" "group 1" "group 2" "group 2" "group 3" "group 3" "group 4" "group 4" "group 1" "group 1" "group 2" "group 2" "group 3" "group 3" "group 4" "group 4"
[17] "group 1" "group 1" "group 2" "group 3" "group 3" "group 4" ....
my.treatment
 [1] "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" ....
my.pch.trt
 [1] 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 16 16 ....

plotIndiv(pca.res, ind.names = FALSE, title = 'PCA', legend = TRUE, 
 # legend 1 colors setting:
 group = my.group, col.per.group = color.per.group, legend.title = 'Groups', 
 # pch setting:
 legend.title.pch = 'Treatment', pch = my.pch.trt, pch.levels = my.treatment) 

Here is a list of the major bug fixes and improvements for 6.1.2:

New features:
————-
1 – tune.splsda now returns a ‘choice.ncomp’ which indicates the number of components to choose (only if nrepeat > 2, criterion based on t-tests)
2 – plotIndiv now enables two legends based on color, as well as pch, when pch is a factor different from what is indicated in group (use arguments pch and pch.levels, see ?plotIndiv)

Enhancements:
————-
1 – argument ‘cutoff’ now replaces ‘threshold’ in network for consistency with plotVar and circosPlot
2 – new argument ‘sd’ in plot.perf for block.splsda method
3 – new arguments “color.Y” and “color.blocks” in cimDiablo
4 – new argument ‘xlim’ in plotLoadings

Bug fixes:
———-
– directionality is now enforced in AUROC (results lower than 0.5 can be obtained, which would indicate a very poor model performance)
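A small base R illustration of what a fixed direction implies, assuming the AUC is computed with the rank-based (Mann-Whitney) formula and that higher scores are always taken to indicate the positive class. The helper name auc.fixed.direction and the scores are hypothetical, and this is not the code used in mixOmics.

```r
# AUC with a fixed direction: higher scores are assumed to indicate the
# positive class, and the direction is never flipped to "help" the model
auc.fixed.direction <- function(score, is.positive) {
  ranks <- rank(score)
  n.pos <- sum(is.positive)
  n.neg <- sum(!is.positive)
  (sum(ranks[is.positive]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

score <- c(0.9, 0.8, 0.7, 0.3, 0.2, 0.1)

# Positives have the highest scores: perfect separation
auc.good <- auc.fixed.direction(score, c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))

# Positives tend to have the lowest scores: the AUC drops below 0.5
auc.poor <- auc.fixed.direction(score, c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE))

print(c(good = auc.good, poor = auc.poor))
```

Without a fixed direction, the second model could be reported with an AUC above 0.5 simply by reversing the comparison, which would hide the fact that it ranks the samples the wrong way round.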

Manuscripts:

The MINT paper is out:

  • Rohart F., Matigian N., Eslami A., Bougeard S. and Lê Cao, K. A. MINT: A multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. Available on bioRxiv, in press in BMC Bioinformatics 18:128.

The mixOmics manuscript (first draft) is on bioRxiv, with sweave codes:

Patch 6.1.1

Dear mixOmics users,

We have a new patch version 6.1.1 available from CRAN, which fixes a few bugs reported by our team or by mixOmics users (thank you!) and adds a few enhancements and updates to follow the ggplot2 updates.

For those using DIABLO, please note points 8 & 9: we changed the default parameters in the block methods to scheme = ‘horst’ (instead of ‘centroid’) and init = ‘svd.single’ (instead of ‘svd’), as we feel these are more appropriate. This may change your results compared to the previous version, and you may want to set the old parameters explicitly instead.

New features:
1 – mint.pca function to perform unsupervised integration of independent data sets
2 – new weighted prediction for block approaches for both unsupervised and supervised analyses, see ?predict.spls and ?predict.splsda.
3 – ‘cpus’ parameter for sPLS-DA perf/tune and block.splsda perf/tune added to run the code in parallel

Enhancements:
4 – ‘constraint’ parameter for sPLS-DA perf and tune functions added.
5 – plotLoading for PCA object
6 – color argument in plot.tune and plot.perf added

Bug fixes:
7 – predict with logratio (the logratio transform is now performed inside the predict function)
8 – in block methods, scheme = ‘horst’ set by default instead of ‘centroid’
9 – in block methods, initialisation set to ‘svd.single’ by default

Thank you again for using mixOmics.

Software requirements for mixOmics workshops

We list below some installation requirements to ensure the mixOmics workshop will run smoothly for everyone.

Important reminders. We expect the trainees to have a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) to be able to fully enjoy the workshop. Attendees are requested to bring their own laptop as this is a hands-on workshop (we will alternate theory and practice).

Software installation and updates. To run the R scripts in this workshop, you will need to install or update to the latest version of R available from CRAN (currently > 3.4, see also Installation guide for R and RStudio), and then install or update the following R packages:

  • mixOmics version 6.3.1 (the version number is important)
  • mvtnorm
  • corrplot
  • igraph

The mixOmics package should directly import the following packages: igraph, rgl, ellipse, corpcor, RColorBrewer, plyr, parallel, dplyr, tidyr, reshape2, methods, matrixStats, rARPACK, gridExtra.

Check after install that the following does not throw any error*:

library(mixOmics)
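If you would like a slightly more detailed check before the workshop, the sketch below reports which of the packages listed above are still missing and which version of mixOmics is installed. It is only a convenience script under our own naming (the objects required, installed and missing.pkgs are illustrative), and it does not install anything.

```r
# Packages listed in the requirements above
required <- c("mixOmics", "mvtnorm", "corrplot", "igraph")

# TRUE for each package that can be loaded on this machine
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing.pkgs <- required[!installed]
if (length(missing.pkgs) > 0) {
  message("Still to install: ", paste(missing.pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}

# Report the installed mixOmics version, to compare with the one requested above
if (installed[["mixOmics"]]) {
  message("mixOmics version: ", as.character(packageVersion("mixOmics")))
}
```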

We also advise using RStudio.

*For Apple Mac users: if you are unable to install the rgl package imported by mixOmics, you will need to install the XQuartz software first.

Wifi will be available on site, but it is preferable that you make those installations before the workshop to avoid delays for the analyses.

Any questions regarding the requirements and software installation? Email us at mixomics[at]math.univ-toulouse.fr

12-14 Sept 2016, Toulouse, FR (COST)

Our 3-day workshop in Toulouse was sponsored by the EU COST Action “The quest for tolerant varieties: phenotyping at plant and cellular level” (FA1306) and organised by the GenoToul Biostat platform, the Laboratory of Plant-Microbe Interactions (LIPM) and the Plant Science Research Laboratory (LRSV). We trained and coached 26 participants and had a great time during the third day (‘byo’ data) and the ice-breaking gala dinner!

 

[Photo: Participants, organisers and tutors]
[Photo: Looking very studious! We were hosted by the LIPM lab, INRA Auzeville Toulouse]

Some feedback from our participants:

Overall I did enjoy the workshop, it was one of the most interesting and well put together that I have attended. Thank you very much.

The tutorials on the website are excellent for training.

It was a very good mixture of theory and practice to directly try out the methods. Also there were many experts who where available for questions. The presentations were quite clear to me as well as the course material and the provided scripts.

‘[Day 3] was useful, because it allows to check if we have well understood the use of each analysis, and bring our own data allows to make these analysis more concrete.’

[…] I could discuss with some other participants with similar experimental design and see how they think [they can] apply mixOmics

 

Data for Day 3 available:

Drought response in sunflower data with Get_started script (knitr format, open the .Rmd file with RStudio), with slides from David.

Some useful references discussed during the workshop:

Liu et al 2015: we used Principal Component Curves (a variant of PCA, but where you fit a curve, and where you need a ‘reference’ group) to quantify pathway regulation of Homologous Recombination in breast cancer.

Singh et al. 2016 (bioRxiv): the asthma study (#2) summarised some of the omics data sets into gene modules to quantify pathways before the integration step. This is the DIABLO paper.

Straube et al 2015: the linear mixed model framework to reduce the dimension of time course data from (n x p x T) to (T x p), lmms is available on CRAN.

Straube et al 2016: Dynomics to detect delay between time course data. Submitted.

Rengel et al. 2012: paper for the drought response in sunflower.

Wickham 2014: tidy data


 

Version 6.1.0 and latest publications

We are proud to announce our new update 6.1.0 available on CRAN. It was supposed to be a small patch but we got slightly ahead of ourselves. Special thanks to the mixOmics French’Oz developers, Dr Florian Rohart (University of Queensland, Brisbane) and Mr François Bartolo (Université de Toulouse, France), as well as several users who have been using our latest methods and reported bugs or suggested improvements on our bitbucket issue website.

Manuscripts and publication update

  • Rohart F., Matigian N., Eslami A., Bougeard S. and Lê Cao, K. A. MINT: A multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. Now available on bioRxiv!

  • Singh A, Gautier B, Shannon C, Vacher M, Rohart F, Tebbutt S, K-A. Lê Cao. DIABLO – multi-omics data integration for biomarker discovery. Manuscript available in bioRxiv.

  • K-A. Lê Cao*, ME Costello*, VA Lakis, F Bartolo, XY Chua, R Brazeilles, P Rondeau. (2016) MixMC: Multivariate insights into Microbial Communities. PLoS ONE 11(8): e0160169 [link]

List of changes in mixOmics 6.1.0 (in NEWS file)

In short,
– cimDIABLO argument ‘corThreshold’ replaced by ‘cutoff’
– new plots of tune and perf results now available
– tune function for block.splsda/DIABLO method
– auroc for supervised methods

New features:

1- auroc function applicable for (mint).(block).(s)plsda objects. AUC values also included in perf and tune functions (except mixDIABLO module)
2- tune.block.splsda function to choose the keepX parameters of block.splsda (a.k.a mixDIABLO)
3- plot for perf objects displays the classification error rate w.r.t components
4- plot for tune objects displays the classification error rate w.r.t keepX values (not implemented for tune.block.splsda)
5- multilevel function has been removed (as planned) as it is now included as an argument in other functions (see pca, pls, splsda, etc)

Enhancements:
1 – All tune functions (except for mixDIABLO/block.splsda module) include a ‘constraint’ argument to either build the model based on user input specific parameters (object$keepX.constraint) or based on the optimal parameter keepX determined by the tune function, see examples in help files.
2 – All perf functions (except for mixDIABLO/block.splsda module) now have a ‘constraint’ argument that allows the performance to be calculated either based on the number of parameters (object$keepX) defined in object or based on the variables selected on each component, see examples in help files.
3 – max.iter has been set to 100 to speed up computational time for all multivariate methods except pca/spca.
4 – cimDiablo: new arguments include transpose, row.names and col.names
5 – circosPlot: new arguments include var.names and comp. Argument ‘corThreshold’ has been replaced by ‘cutoff’.
6 – plotIndiv: new argument legend.title
7 – network function now available for block.spls(da) models, and can plot more than 2 blocks
8 – PCA: new argument ilr.offset to be used only for ILR log transform in PCA (mixMC module)
9 – Legend added in plotDiablo, new argument legend.ncol

Bug fixes:
1 – plotIndiv and ellipse: plot ellipse for all groups with more than 1 sample
2 – predict function: argument multilevel added, log transform included
3 – Call to plsda.vip() from the RVAideMemoire package
4 – other small bugs as listed in our bitbucket issues, matching rgl package changes.