Update 6.1.3 on CRAN, postdoc position, manuscript and upcoming workshops

Dear mixOmics users,

We have been quiet for a while, but we have some good news! A CRAN update, a manuscript on bioRxiv, a 3-year postdoc position to join the mixOmics core team, and three workshops planned for the French autumn!

The 6.1.3 update is now on CRAN. We fixed a few bugs (see the list below) and added a new plotIndiv argument, ‘background’, to visualise the prediction area of a PLS-DA or sPLS-DA model (2 components maximum). This is a powerful plot to visualise the effect of the different prediction distances. Why does the prediction distance matter for the performance of discriminant analysis models? See the explanation below.

Example of prediction area plot for the SRBCT data with a PLS-DA model, see ?srbct

All you need is to compute the prediction area with the background.predict function, then overlay the result with plotIndiv. For example:

library(mixOmics)
data(liver.toxicity)
X = liver.toxicity$gene
Y = as.factor(liver.toxicity$treatment[, 4])
plsda.liver = plsda(X, Y, ncomp = 2)

# calculate the prediction area for the first two components, using the Mahalanobis distance
background = background.predict(plsda.liver, comp.predicted = 2, dist = "mahalanobis.dist")

# overlay the prediction area on the sample plot
plotIndiv(plsda.liver, background = background, legend = TRUE)

We also added two new functions: get.confusion_matrix, which builds a confusion matrix from the predicted classes of test samples and their real classes, and get.BER, which calculates the corresponding Balanced Error Rate (see ?get.BER). Example of outputs (for a DIABLO analysis of the breast cancer TCGA multi-omics study):

Example from our DIABLO pipeline available at https://mixomics.org/wp-content/uploads/2012/03/mixOmicsRscripts.zip
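For readers who want to check what a balanced error rate measures, here is a base-R sketch on made-up class labels. It does not call get.confusion_matrix or get.BER (see ?get.BER for their actual arguments); it only reproduces the idea: tabulate real versus predicted classes, compute the error rate within each class, then average those rates with equal weight.

```r
# Hypothetical real and predicted classes for ten test samples
truth     <- factor(c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"))
predicted <- factor(c("A", "A", "B", "A", "B", "B", "C", "C", "C", "A"),
                    levels = levels(truth))

# Confusion matrix: rows = real class, columns = predicted class
conf <- table(real = truth, predicted = predicted)

# Per-class error rate: proportion of each real class that is misclassified
per.class.error <- 1 - diag(conf) / rowSums(conf)

# Balanced Error Rate: unweighted mean of the per-class error rates,
# so small classes weigh as much as large ones
ber <- mean(per.class.error)

# Overall error rate, for comparison
overall.error <- mean(truth != predicted)

print(conf)
round(c(BER = ber, overall = overall.error), 3)   # BER 0.306, overall 0.3
```

Here class A (4 samples) has an error rate of 1/4 and classes B and C (3 samples each) 1/3, so the BER is slightly higher than the overall error rate: misclassifications in the smaller classes count for more.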

 

We have submitted a new version of our mixOmics manuscript to bioRxiv! The manuscript is available at this link and has been a top tweeted story in #bioinformatics. It mostly summarises the latest mixOmics frameworks for Discriminant Analysis (sPLS-DA, DIABLO and MINT), with extensive R and Sweave code here: give it a go! The supplemental material details these methods thoroughly.

It almost feels like the end of the first mixOmics era: Florian, our very talented and dedicated core developer, debugger and developer of MINT, has moved on to another postdoctoral position at the University of Queensland, and Kim-Anh is starting her new group as a Senior Lecturer at the University of Melbourne (UoM), at the Centre for Systems Genomics. Do not fear: this means a new round of developments, notably in the microbiome and metagenomics field, as we are opening a new 3-year senior postdoctoral position in Computational Biostatistics at UoM (with the opportunity to teach at the School of Mathematics and Statistics). More details at this link.

Seventeen multivariate methods currently implemented in mixOmics! Can you recognise your favourite?

Three workshops are coming up in France between September and November 2017. MAW’17, the first edition of our advanced mixOmics workshop, will introduce our new frameworks (published and in development: DIABLO, MINT, SNPOmics, timeOmics, mixMC and extensions of integration) to our advanced users. It will take place in Toulouse on 23-24 Oct 2017. The workshop is free, but you will need to cover your own travel and accommodation costs; send us an email and we will send you the details. The two other workshops will be our usual beginner mixOmics workshops, in September (Lille) and in early November (Toulouse). More details on our website soon.

 

Other enhancements and bug fixes:

Enhancements:
————-
1 – perf.sgccda (for DIABLO) now implements a constraint model (see details in ?perf)
2 – new legend = TRUE option in circosPlot and plotDiablo
Bug fixes:
———-
– tune.splsda had a bug when choosing ‘choice.ncomp’ with a one-sided t-test on the error rates, when the error rate was constant.
– sparse PCA deflation algorithm fixed
– added the mixOmics:: prefix to pls function calls to avoid clashes with other packages
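The tune.splsda fix concerns the choice of the number of components. As an illustration only, assume a component is kept when a one-sided t-test shows that its cross-validated error rates (over repeats) are significantly lower than those of the current best model. The sketch below is not the tune.splsda code: the helper choose.ncomp and the alpha = 0.01 threshold are hypothetical. It shows why a constant error rate needs special handling: t.test() stops with "data are essentially constant" when both samples have zero variance.

```r
# Hypothetical helper (not part of mixOmics): pick the number of components
# from a matrix of error rates (rows = CV repeats, columns = components)
choose.ncomp <- function(err, alpha = 0.01) {
  opt <- 1
  for (k in seq_len(ncol(err))[-1]) {
    current <- err[, opt]
    candidate <- err[, k]
    if (sd(current) == 0 && sd(candidate) == 0) {
      # t.test() cannot run on constant data: keep the extra component
      # only if its (constant) error rate is strictly lower
      if (candidate[1] < current[1]) opt <- k
      next
    }
    # One-sided test: is the current error rate greater than the candidate's?
    p <- t.test(current, candidate, alternative = "greater")$p.value
    if (p < alpha) opt <- k
  }
  opt
}

# Error rates that drop with the second component, then plateau
err.varying <- cbind(c(0.30, 0.32, 0.29, 0.31, 0.33),
                     c(0.20, 0.22, 0.19, 0.21, 0.20),
                     c(0.20, 0.21, 0.20, 0.22, 0.19))
# Error rates that are identical for every component and repeat
err.constant <- matrix(0.25, nrow = 5, ncol = 3)

choose.ncomp(err.varying)   # 2
choose.ncomp(err.constant)  # 1, without a t.test() error
```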

 

Why does a prediction distance matter? (full story in our manuscript)

The supervised multivariate methods in mixOmics can be applied to an external test set, either to predict the outcome of new samples (predict function) or to assess the performance of the statistical model (perf function). For each new sample, the predict function calculates prediction scores as well as predicted coordinates, which are the equivalent of the latent component scores of the training samples.
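To make "predicted coordinates" concrete, here is a simplified base-R sketch restricted to the first component of a PLS-DA-like model on random data. The weight vector is taken as the leading singular vector of X'Y (the first PLS direction), and new samples are centred and scaled with the training parameters before being projected on it. It ignores deflation, further components and sparsity, so it is not the mixOmics implementation; with mixOmics you would call predict() on the fitted model instead.

```r
set.seed(42)
# Hypothetical training data: 20 samples, 5 variables, two classes
X <- matrix(rnorm(100), nrow = 20)
y <- factor(rep(c("A", "B"), each = 10))
# Dummy indicator matrix: one column per class, 1 = sample belongs to the class
Y <- sapply(levels(y), function(l) as.numeric(y == l))

# Centre and scale X, keeping the training means and standard deviations
Xs <- scale(X)
# First weight vector: leading left singular vector of X'Y
w <- svd(crossprod(Xs, Y))$u[, 1]
# Latent component scores of the training samples
t.train <- Xs %*% w

# New samples are centred and scaled with the *training* parameters,
# then projected on the same weight vector
X.new <- matrix(rnorm(10), nrow = 2)
X.new.s <- scale(X.new, center = attr(Xs, "scaled:center"),
                 scale = attr(Xs, "scaled:scale"))
t.new <- X.new.s %*% w   # predicted coordinates on component 1
```

Because training and test samples go through exactly the same transformation, predicted coordinates live in the same space as the training components, which is what makes the distances below meaningful.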

Prediction distances. Our supervised models use a dummy indicator matrix Y to encode the class membership of each sample, and produce a prediction score for each outcome category k, k = 1, …, K. The scores across all K classes therefore need to be combined, using a prediction distance, to obtain the final prediction for a given test sample. We propose distances such as the ‘maximum distance’, the ‘Mahalanobis distance’ and the ‘centroids distance’, as detailed in our supplemental information and in ?predict. These distances can give different predictions, which is why their effect is assessed when evaluating the performance of the model.
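The three distances can be illustrated in base R with made-up numbers for one test sample and two classes. As we understand it, the maximum distance picks the class with the largest predicted score, while the centroids and Mahalanobis distances compare the sample's predicted coordinates with the class centroids of the training components. The covariance matrix S used for the Mahalanobis distance here is an illustrative assumption; see ?predict for the exact definitions used in mixOmics.

```r
# Max distance: hypothetical predicted scores of one test sample for each class
y.scores <- c(A = 0.55, B = 0.45)
max.dist.class <- names(which.max(y.scores))

# Hypothetical class centroids of the training samples in component space
centroids <- rbind(A = c(0, 0), B = c(2, 4))
# Assumed within-class covariance of the two components (illustrative values)
S <- diag(c(1, 9))
# Hypothetical predicted coordinates of the same test sample
x.new <- c(1.5, 1)

# Centroids distance: Euclidean distance to each class centroid
euclid <- apply(centroids, 1, function(m) sqrt(sum((x.new - m)^2)))
# Mahalanobis distance: rescales each direction by the spread given by S
mahal <- apply(centroids, 1,
               function(m) sqrt(mahalanobis(x.new, center = m, cov = S)))

centroids.class   <- names(which.min(euclid))   # "A"
mahalanobis.class <- names(which.min(mahal))    # "B"
```

The same sample is allocated to class A by the centroids distance but to class B by the Mahalanobis distance, because the large variance along the second component makes the vertical gap to centroid B count for less.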

 

Patch 6.1.1

Dear mixOmics users,

A new patch, version 6.1.1, is available from CRAN. It fixes a few bugs reported by our team and by mixOmics users (thank you!), and brings a few enhancements as well as updates to follow recent ggplot2 changes.

For those using DIABLO, please note points 8 and 9: in the block methods we changed the default parameters to scheme = ‘horst’ (instead of ‘centroid’) and init = ‘svd.single’ (instead of ‘svd’), as we feel these are more appropriate. This may change your results compared to the previous version; if needed, you can still set the old parameters explicitly.
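Why can the scheme change results? The block methods build on the generalised canonical correlation framework (RGCCA/SGCCA), in which the scheme is the function g applied to the covariance between the components of connected blocks. The base-R sketch below, with made-up covariances, only evaluates that criterion for a fixed set of covariances; it does not fit a model.

```r
# Hypothetical covariances between the components of three blocks,
# for the block pairs (1,2), (1,3) and (2,3)
cov.pairs <- c(0.8, -0.5, 0.3)
# Full design: every pair of blocks is connected with weight 1
design <- c(1, 1, 1)

# Scheme functions g() applied to each covariance
schemes <- list(
  horst     = function(x) x,       # signed covariance
  centroid  = function(x) abs(x),  # absolute covariance
  factorial = function(x) x^2      # squared covariance
)

# Criterion that the block method maximises, one value per scheme
criterion <- sapply(schemes, function(g) sum(design * g(cov.pairs)))
criterion   # horst about 0.6, centroid 1.6, factorial 0.98
```

With ‘horst’ a negative covariance lowers the criterion, whereas ‘centroid’ rewards strong links whatever their sign, so the optimisation can settle on different components.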

New features:
1 – mint.pca function to perform unsupervised integration of independent data sets
2 – new weighted prediction for block approaches for both unsupervised and supervised analyses, see ?predict.spls and ?predict.splsda.
3 – ‘cpus’ parameter for sPLS-DA perf/tune and block.splsda perf/tune added to run the code in parallel

Enhancements:
4 – ‘constraint’ parameter for sPLS-DA perf and tune functions added.
5 – plotLoading for PCA object
6 – color argument in plot.tune and plot.perf added

Bug fixes:
7 – predict with logratio (the logratio transform is now performed inside the predict function)
8 – in block methods, scheme = ‘horst’ set by default instead of ‘centroid’
9 – in block methods, initialisation set to ‘svd.single’ by default

Thank you again for using mixOmics.

Version 6.1.0 and latest publications

We are proud to announce that our new update, 6.1.0, is available on CRAN. It was supposed to be a small patch, but we got slightly ahead of ourselves. Special thanks to the mixOmics French’Oz developers, Dr Florian Rohart (University of Queensland, Brisbane) and Mr François Bartolo (Université de Toulouse, France), as well as to several users who have been using our latest methods and reported bugs or suggested improvements on our Bitbucket issue tracker.

Manuscripts and publication update

  • Rohart F, Matigian N, Eslami A, Bougeard S, Lê Cao K-A. MINT: A multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. Now available on bioRxiv!

  • Singh A, Gautier B, Shannon C, Vacher M, Rohart F, Tebbutt S, Lê Cao K-A. DIABLO – multi-omics data integration for biomarker discovery. Manuscript available on bioRxiv.

  • Lê Cao K-A*, Costello ME*, Lakis VA, Bartolo F, Chua XY, Brazeilles R, Rondeau P. (2016) MixMC: Multivariate insights into Microbial Communities. PLoS ONE 11(8): e0160169 [link]

List of changes in mixOmics 6.1.0 (in NEWS file)

In short,
– cimDIABLO argument ‘corThreshold’ replaced by ‘cutoff’
– new plots of tune and perf results now available
– tune function for block.splsda/DIABLO method
– auroc for supervised methods

New features:

1- auroc function applicable to (mint).(block).(s)plsda objects. AUC values are also included in the perf and tune functions (except for the mixDIABLO module)
2- tune.block.splsda function to choose the keepX parameters of block.splsda (a.k.a. mixDIABLO)
3- plot for perf objects displays the classification error rate w.r.t components
4- plot for tune objects displays the classification error rate w.r.t keepX values (not implemented for tune.block.splsda)
5- multilevel function has been removed (as planned) as it is now included as an argument in other functions (see pca, pls, splsda, etc)
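The AUC reported by auroc summarises a ROC curve. As a rough illustration of what the number means (not how auroc computes it), here is a base-R sketch of a one-versus-rest AUC for one class, using the rank (Mann-Whitney) formula on made-up scores; the helper auc.rank is hypothetical, and we assume a one-versus-rest comparison when there are more than two classes.

```r
# Hypothetical predicted scores for one class against all others:
# higher score = more likely to belong to the class
score <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2)
is.class <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# AUC via the rank formula: probability that a random sample of the class
# scores higher than a random sample of the other classes
auc.rank <- function(score, is.class) {
  n1 <- sum(is.class)
  n0 <- sum(!is.class)
  r <- rank(score)   # ties get average ranks
  (sum(r[is.class]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc.rank(score, is.class)   # 8/9, about 0.889
```

Of the 9 (class, other) pairs, only the pair (0.4, 0.7) is ordered the wrong way, hence an AUC of 8/9; an AUC of 1 means perfect separation and 0.5 means no better than chance.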

Enhancements:
1 – All tune functions (except for the mixDIABLO/block.splsda module) include a ‘constraint’ argument to build the model either on user-specified parameters (object$keepX.constraint) or on the optimal keepX parameter determined by the tune function, see examples in help files.
2 – All perf functions (except for the mixDIABLO/block.splsda module) now have a ‘constraint’ argument that allows the performance to be calculated either based on the number of variables to select (object$keepX) defined in the object, or based on the specific variables selected on each component, see examples in help files.
3 – max.iter has been set to 100 to reduce computation time for all multivariate methods except pca/spca.
4 – cimDiablo: new arguments include transpose, row.names and col.names
5 – circosPlot: new arguments include var.names and comp. Argument ‘corThreshold’ has been replaced by ‘cutoff’.
6 – plotIndiv: new argument legend.title
7 – network function now handles block.spls(da) models and can plot more than 2 blocks
8 – PCA: new argument ilr.offset to be used only for ILR log transform in PCA (mixMC module)
9 – Legend added in plotDiablo, new argument legend.ncol

Bug fixes:
1 – plotIndiv and ellipse: plot ellipse for all groups with more than 1 sample
2 – predict function: argument multilevel added, log transform included
3 – Call to plsda.vip() from the RVAideMemoire package
4 – other small bugs as listed in our Bitbucket issues, and updates to match rgl package changes.

Patch 6.0.1

We are preparing a patch to fix some small bugs that we (and other users) have noticed since the release of version 6.0.0. The .zip (Windows) and .tar.gz (Linux / Mac) files can be downloaded from this page. We plan to push the completed patch to CRAN at the end of August 2016.

Latest patch update: 18 August

Package to download: mixOmics_6.0.1.zip (Windows) or mixOmics_6.0.1.tar.gz (Linux, Mac)

You can install the .tar.gz file via RStudio (Mac environment) or, alternatively, type in a terminal (Linux environment):

R CMD INSTALL mixOmics_6.0.1.tar.gz

Then load the patch version in R:

# Load the patched version and check its version number
library(mixOmics)
packageVersion("mixOmics")

Bugs fixed:

Date: 18/08/2016

  • Offset value of 1 added for CLR log transform for mixMC
  • circosPlot variable name fixed for mixDIABLO, new argument size.variables
  • cimDiablo and circosPlot match name to legend color for mixDIABLO, new arguments transpose, row.names and col.names
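For readers new to mixMC, here is a base-R sketch of a centred log-ratio (CLR) transform with an offset of 1, applied to made-up counts. The helper clr.offset is hypothetical and only illustrates the idea; in mixOmics the transform is requested through the logratio argument of functions such as pca (see ?pca).

```r
# Hypothetical counts for three samples (rows) and four taxa (columns),
# including zeros, as is common in microbiome data
counts <- rbind(c(10, 0, 5, 85),
                c(0, 3, 7, 90),
                c(20, 20, 30, 30))

# Centred log-ratio: log of each value minus the mean log of its sample.
# The offset is added first because log(0) is undefined.
clr.offset <- function(x, offset = 1) {
  logx <- log(x + offset)
  sweep(logx, 1, rowMeans(logx))   # subtract each row's mean log
}

Z <- clr.offset(counts)
rowSums(Z)   # each row sums to zero, up to rounding
```

Without the offset, the zeros would produce -Inf values; with it, every value is finite and differences within a sample are log-ratios of the offset counts.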

Date: 03/08/2016

  • Call to plsda.vip() from the RVAideMemoire package
  • Speed up computations for PCA with logratio transformation
  • perf / tune for sPLS-DA with log ratio transformation (that will improve the performance of the model)
  • network function for block.spls models (still in development!)