Update – mixOmics

We are moving… to bioC!

Dear all,

After 9 years hosted on CRAN, we are migrating to Bioconductor! It's been a great first journey and we are grateful to CRAN for hosting our package. We are now ready for the next adventure.

Why are we moving?

It is our aspiration to empower computational and molecular biologists, which aligns with the Bioconductor vision.
We will be able to link with the new experimentClass S4 objects and with the existing Bioconductor data packages that use them, ranging from multi-omics to microbiome and single-cell data.
We will be able to provide better vignettes and examples that will complement our website.

What has changed? What should I do? Should I panic?

So far we have kept disruption to a minimum: function calls and objects are unchanged. We will gradually add more capabilities, which will greatly improve usability (see above for the S4 classes).

We are almost on Bioconductor, but full acceptance is pending the removal of mixOmics from CRAN. We have also fixed a few bugs. If you would like to install this new version:

The development version is now available on GitHub (feel free to fork / help* / comment on GitHub):

R> devtools::install_github("mixOmicsTeam/mixOmics")

Or, once we are on Bioconductor:

R> if (!requireNamespace("BiocManager", quietly = TRUE))  install.packages("BiocManager")
R> BiocManager::install("mixOmics", version = "3.8")

Then, business as usual!

* We would like to formally acknowledge Lluís Revilla (Centre Esther Koplowitz, Barcelona) for helping us set up testthat checks for our Bioconductor version.

As we enter this new journey, we also thank you for this.
And also for this!

PS: a one-day microbiome workshop is scheduled in chilly Vancouver on November 6.

6.2.0, 2 postdoc positions and workshops

Dear mixOmics users,

Our new update 6.2.0 is now available on CRAN, alongside the new version of our manuscript.

manuscript & package update:

The mixOmics manuscript introducing the supervised and integrative frameworks (PLS-DA, DIABLO block.plsda and MINT) has been updated, along with all the R / Sweave case studies; the manuscript and code are available at this link. The case studies are also published on our website (sPLSDA:SRBCT, Case study: TCGA and Case study: MINT).

The manuscript describes in more detail the different prediction distances (see also the supplemental material) and the interpretation of the AUROC for our supervised methods.

The constraint argument was removed from all our methods, due to a risk of overfitting.

New features:

– The constraint argument (versions 6.1.0 – 6.1.3) was removed from the functions perf and tune for all supervised objects because of a risk of overfitting

Enhancements:

– AUROC added for the MINT objects mint.plsda and mint.splsda, where the study name needs to be specified, e.g. auroc(..., roc.study = "study4"). See ?auroc

– choice.ncomp output added on all perf and tune functions for all supervised methods.

– mat.c output for pls and plsda objects (matrix of coefficients from the regression of X / residual matrices X on the X-variates).
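For readers curious about what an AUROC summarises, here is a minimal base-R sketch of a one-vs-rest AUC for a single class, computed from prediction scores with the Mann-Whitney rank formula. It is not the mixOmics auroc() implementation: the function name auc_one_vs_rest and the toy scores are hypothetical. Under our assumption, restricting the AUROC to one MINT study (as roc.study does) corresponds to keeping only that study's samples before computing it.

```r
# Sketch only (not the mixOmics code): one-vs-rest AUC for class k
# using the Mann-Whitney rank formula.
# scores: predicted score for class k, one value per sample
# truth:  real class of each sample
auc_one_vs_rest <- function(scores, truth, k) {
  is_k  <- truth == k
  n_pos <- sum(is_k)
  n_neg <- sum(!is_k)
  r <- rank(scores)  # tied scores get average ranks, i.e. count as 1/2
  (sum(r[is_k]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

scores <- c(0.9, 0.4, 0.4, 0.1)
truth  <- c("A", "B", "A", "B")
auc_one_vs_rest(scores, truth, "A")  # 0.875
```

An AUC of 1 means every sample of class k scores higher than every other sample; 0.5 is no better than chance.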

Bug fixes (thank you to the users who notified us on bitbucket):

– fixed a bug when using predict, perf or tune, which gave the error message: "Error in predict.spls(spls.res, X.test[, nzv]) : 'newdata' must include all the variables of 'object$X'"

Workshops:

We advertised two workshops at this link. The advanced workshop 23-24 Oct 2017 is fully subscribed. This is our first MAW (mixOmics advanced workshop), but there will be more planned in 2018. We still have a few spots left for the classic workshop on the 9-10 Nov 2017 in Toulouse, contact us for more information (priority will be given to students and early career researchers).

Two senior postdoc positions (2 year and 3 year) still open!

The Australian mixOmics team, now based at the University of Melbourne, is recruiting two senior postdocs in the fields of computational biology or statistics: one full-time 2-year position to work with the Stemformatics team on exciting omics integration problems ('omics and single-cell omics) to improve stem cell classification, and one full-time 3-year position for the development of innovative multivariate methods for 'omics time course, microbiome and P-integration. Contact us for more information.

Website update:

With the invaluable help of bioinformatics master's students Danielle Davenport and Zoe Welham, we are currently revamping the website to ensure all code runs correctly. Thank you to those who sent us feedback!

Update 6.1.3 on CRAN, postdoc position, manuscript and upcoming workshops

Dear mixOmics users,

We have been quiet for a while, but we have some good news! A CRAN update, a manuscript in bioRxiv, a 3-year postdoc position open to be part of the mixOmics core team, and three workshops planned for the French autumn!

The 6.1.3 update is now on CRAN. We fixed a few bugs (see list below), and we also have a new plotIndiv argument 'background' to visualise the prediction area for a PLS-DA and sPLS-DA model (max 2 components). This is a powerful plot to visualise the effect of the different prediction methods. Why does the prediction method matter for the performance of discriminant analysis models? See below for more details.

Example of prediction area plot for the SRBCT data with a PLS-DA model, see ?srbct

All you need is the background.predict function, whose output you then overlay with plotIndiv. For example:

library(mixOmics)
data(liver.toxicity)
X = liver.toxicity$gene
Y = as.factor(liver.toxicity$treatment[, 4])
plsda.liver = plsda(X, Y, ncomp = 2)

# compute the background (prediction area) on the first two components, using the Mahalanobis distance
background = background.predict(plsda.liver, comp.predicted = 2, dist = "mahalanobis.dist")

plotIndiv(plsda.liver, background = background, legend = TRUE)

We also added the new functions get.confusion_matrix and get.BER, which calculate a confusion matrix from the predicted and real classes of test samples, and the corresponding Balanced Error Rate (see ?get.BER). Example of outputs (for a DIABLO analysis of the TCGA breast cancer multi-omics study):

Example from our DIABLO pipeline available at https://mixomics.org/wp-content/uploads/2012/03/mixOmicsRscripts.zip
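To make the Balanced Error Rate concrete: it is the average of the per-class error rates, so a small class weighs as much as a large one. The base-R sketch below is only an illustration, not the code behind get.confusion_matrix or get.BER; the function balanced_error_rate and the toy labels are hypothetical.

```r
# Sketch only: confusion matrix and Balanced Error Rate (BER)
balanced_error_rate <- function(truth, predicted) {
  truth     <- factor(truth)
  predicted <- as.character(predicted)
  # rows = real classes, columns = predicted classes
  confusion <- table(truth = truth, predicted = predicted)
  # per-class error: proportion of samples of class k not predicted as k
  per_class_error <- sapply(levels(truth),
                            function(k) mean(predicted[truth == k] != k))
  list(confusion = confusion,
       per_class_error = per_class_error,
       BER = mean(per_class_error))
}

truth     <- c("A", "A", "A", "A", "B", "B")
predicted <- c("A", "A", "A", "B", "B", "A")
res <- balanced_error_rate(truth, predicted)
res$confusion
res$BER  # 0.375
```

Here the overall error rate is 2/6 (about 0.33), whereas the BER is (0.25 + 0.5) / 2 = 0.375, because the single error in the smaller class B counts for more.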

We have submitted a new version of our mixOmics manuscript to bioRxiv! The manuscript is available at this link and has been a top tweeted story in #bioinformatics. The manuscript mostly summarises the latest mixOmics frameworks for Discriminant Analysis (sPLS-DA, DIABLO and MINT) with extensive R and Sweave code here, give it a go! The supplemental material details these methods thoroughly. It almost feels like the end of a first mixOmics era, as Florian, our very talented and dedicated core developer, debugger and developer of MINT, has moved on to another postdoctoral position at the University of Queensland, and Kim-Anh is starting her new group as a Senior Lecturer at the University of Melbourne (UoM), at the Centre for Systems Genomics. Do not fear: this means there will be a new round of developments, notably in the microbiome and metagenomics field, as we are opening a new 3-year senior postdoctoral position in Computational Biostatistics at UoM (with the opportunity to teach at the School of Mathematics and Statistics). More details at this link.

Seventeen multivariate methods currently implemented in mixOmics! Can you recognise your favourite?

Three workshops are coming up, between Sept – Nov 2017 in France. The first edition of MAW’17 is the advanced mixOmics workshop to introduce our new frameworks (published and in development: DIABLO, MINT, SNPOmics, timeOmics, mixMC and extension of integration) to our advanced users. The workshop is free, but you will need to cover your own travel and accommodation costs. Toulouse, 23-24 Oct 2017. Send us an email and we can send you the details. The two other workshops will be our normal beginner mixOmics workshops, in September (Lille) and in early November (Toulouse). More details on our website soon.

Other enhancements and bug fixes:

Enhancements:
————-
1 – perf.sgccda (for DIABLO) now implements a constraint model (see details in ?perf)
2 – legend = TRUE option in circosPlot and plotDiablo
Bug fixes:
———-
– tune.splsda had a bug when assessing 'choice.ncomp' with the one-sided t-test on the error rate when the error rate was constant.
– sparse PCA deflation algorithm fixed
– added the mixOmics:: prefix to pls function calls to avoid clashes with other packages

Why does a prediction distance matter? (full story in our manuscript)

The supervised multivariate methods in mixOmics can be applied to an external test set to predict the outcome of new samples (predict), or to assess the performance of the statistical model (perf). The predict function calculates a prediction score for each new sample, as well as predicted coordinates, which are equivalent to the latent component scores in the training set.

Prediction distances. Our supervised models work with dummy indicator matrices Y to indicate the class membership of each sample, and result in a prediction score for each outcome category k, k = 1, …, K. Therefore, the scores across all K classes need to be combined, using a prediction distance, to obtain the final prediction of a given test sample. We propose distances such as 'maximum distance', 'Mahalanobis distance' and 'centroids distance', as detailed in our supplemental information and in ?predict. These distances can give different predictions, which in turn affect the estimated performance of the model.
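A minimal base-R sketch of two of these distances, under simplifying assumptions: Y_hat holds the predicted dummy scores of the test samples (one named column per class), and T_train / T_test hold the training and predicted component scores. The names max_dist_predict and centroids_dist_predict are hypothetical; this is not the code of mixOmics' predict(), and the Mahalanobis variant is not shown.

```r
# Maximum distance: each test sample gets the class with the largest
# predicted dummy score (Y_hat: test samples x classes, named columns)
max_dist_predict <- function(Y_hat) {
  colnames(Y_hat)[max.col(Y_hat, ties.method = "first")]
}

# Centroids distance: each test sample gets the class whose centroid,
# computed on the training component scores, is closest (Euclidean)
centroids_dist_predict <- function(T_train, y_train, T_test) {
  y_train <- factor(y_train)
  classes <- levels(y_train)
  centroids <- do.call(rbind, lapply(classes, function(k)
    colMeans(T_train[y_train == k, , drop = FALSE])))
  d2 <- matrix(NA_real_, nrow(T_test), length(classes))
  for (j in seq_along(classes)) {
    # squared distance of every test sample to centroid j
    d2[, j] <- rowSums(sweep(T_test, 2, centroids[j, ])^2)
  }
  classes[max.col(-d2, ties.method = "first")]  # smallest distance wins
}

# Toy example with 2 classes and 2 components
Y_hat   <- rbind(c(A = 0.7, B = 0.3), c(A = 0.4, B = 0.6))
T_train <- rbind(c(-2, 0.1), c(-2.2, -0.1), c(2, 0.2), c(1.8, -0.2))
y_train <- c("A", "A", "B", "B")
T_test  <- rbind(c(-1, 0), c(1.5, 0.3))

max_dist_predict(Y_hat)                           # "A" "B"
centroids_dist_predict(T_train, y_train, T_test)  # "A" "B"
```

The maximum distance only looks at the predicted scores, while the centroids distance works in the component space; on real data the two can disagree for samples lying between classes.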

Version 6.0.0

Dear mixOmics users,

It is with a huge relief and pride (and maybe some slight anticipatory anxiety of that very moment) that we announce the release of mixOmics_6_0_0 on CRAN. We are introducing three novel frameworks, mixMC, mixMINT and mixDIABLO, which are described (as best as we can, given the free remaining time we have on our hands not debugging) on the website. All manuscripts are in submission / revision so feel free to ask.

A special thanks to those who made this update possible, in particular Florian Rohart and Benoit Gautier, and the whole Lê Cao lab troop for the numerous layers of testing. We tested as much as we could but of course all data are different. Do not hesitate to report bugs or comments at mixomics[at]math.univ-toulouse.fr or on our Bitbucket issue list.

Members of the mixOmics team will be present at the following summer conferences in the northern hemisphere, feel free to say hello!
Rencontres R 2016: Toulouse, France, June 22-24, presentation on mixMINT
JOBIM 2016: Lyon, France, June 28-30, presentation on mixDIABLO
ISMB / ISCB 2016: Orlando, Florida, July 8 – July 12, attendance and presentation to the SBV crowd verification challenge, using our cousin package bootPLS
JSM 2016: Chicago, Illinois, July 30 – Aug 4, presentation on mixMC
INPPO 2016: Bratislava, Slovakia, Sept 4-8, keynote

Below is the list of changes in the package. Please note the few changes in argument names for some of the plots.

Changes in 6.0.0 (major, implementation improvements and new methods)

In short,

– argument names which changed in all plots for homogeneous call are: ‘main’ changed to ‘title’, ‘add.legend’ -> ‘legend’, ‘cex.xxx’ -> ‘size.xxx’, ‘plot.ellipse’ -> ‘ellipse’

– ncomp is now a single value in all wrapper.* and block.* functions (multiple data integration)

Please refer to our help files for the functions listed below.

New features:

1 – log.ratio transformation (log.ratio = c('CLR', 'ILR')) in PCA and PLS-like methods to deal with compositional microbiome data (see website www.mixOmics.org/mixMC for details)

2 – plotLoadings is a novel graphical way of showing the regression coefficients of the selected variables (replaces the deprecated plotContrib)

3 – mixMINT module to analyse independent data sets on the same type of variable. See www.mixOmics.org/mixMINT for details.

Added methods: mint.pls, mint.plsda, mint.spls, mint.splsda;

S3 visualisations: plotIndiv, plotLoadings, plotVar;

Performance evaluation: perf (new, uses leave-one-group-out), tune (new, uses leave-one-group-out)

4 – mixDIABLO module to integrate different omics data sets performed on the same samples. See www.mixOmics.org/mixDIABLO for details.

Added methods: block.pls, block.plsda, block.spls, block.splsda;

S3 visualisations: circosPlot (new), cimDiablo (new), plotDiablo (new), plotIndiv, plotLoadings, plotVar;

Performance evaluation: perf, tune, predict (new with majority vote for DIABLO, $vote)

5 – new data sets: stemcell (for MINT), TCGA.breast.cancer (for DIABLO)
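The CLR (centred log-ratio) transformation behind log.ratio = 'CLR' takes each sample's log values minus their mean across that sample's variables. A base-R sketch is below; clr_transform is a hypothetical name, and the offset argument is our assumption for handling zero counts (CLR is undefined for zeros), not a description of the mixOmics internals.

```r
# Sketch only: centred log-ratio transform, one sample per row
clr_transform <- function(X, offset = 0) {
  logX <- log(X + offset)
  # rowMeans(logX) has one value per sample; R recycles it down the
  # columns, so each row is centred on its own mean
  logX - rowMeans(logX)
}

counts <- rbind(sample1 = c(1, 2, 4), sample2 = c(10, 10, 10))
clr_transform(counts)
# sample1: -log(2), 0, log(2); sample2: 0, 0, 0
```

Multiplying a sample by a constant (e.g. a different sequencing depth) leaves its CLR values unchanged, which is why the transformation suits compositional data.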

Enhancements:

1 – plotIndiv: displays explained variance for sPLS objects

2 – multilevel option is now included in PLS and PCA objects (argument multilevel = design or sample information)

3 – WARNING: in all plots, homogeneous arguments call: ‘main’ changed to ‘title’, ‘add.legend’ -> ‘legend’, ‘cex.xxx’ -> ‘size.xxx’, ‘plot.ellipse’ -> ‘ellipse’

4 – print.method functions updated to show the range of graphics / other functions to use with the object

5 – predict function now outputs class names in $class

6 – data set vac18: number of genes reduced to 100

7 – plotContrib has been deprecated in favour of plotLoadings

8 – ncomp input is now a single value in wrapper.rgcca, wrapper.sgcca, block.pls, block.spls, block.plsda, block.splsda

Bug fixes:

1 – explained variance for NIPALS/PCA fixed

2 – plot3d mismatched legend colours, double titles for plotIndiv with ggplot2 and lattice, order of groups for ggplot2 and lattice

3 – retired: data set prostate

mixOmics for 2016

Well well, 2016 is well under way and we thought we could give you a heads-up on what is happening next for mixOmics.

2015 has been great for us:

We ran a total of five 2-day mixOmics workshops (see list below: Auckland NZ, Brisbane AUS, and Paris, Montpellier and Toulouse FR),
We launched our first shiny web interface for sPLS-DA, which was developed for our published paper (PCT patent 'Blood Test for Throat Cancer' PCT/AU2015/050723 on the biological findings of those interesting biomarkers). The shiny web interface is still in its infancy, as it can only serve one user at a time (shiny requirements!), so if the interface goes grey, it means that someone else is using it!
Francois Bartolo, one mixOmics key developer from Toulouse, came to Brisbane for a 3-month visit and gave a good stab to most of the graphical functions (plotIndiv, network, CIM…)
Benoit Gautier, our key mixOmics developer based in Brisbane developed the shiny web-interface and set up the new sGCCA functions (integration of multiple data sets)
Benoit and Florian Rohart (also key mixOmics developer based in Brisbane) also worked together to push mixOmics V6.
We also made a good stab at our multivariate analysis pipeline for 16S microbial data, with a first unpublished workflow available here and a preprint available soon.

What’s planned for 2016?

More workshops! So far 3 are planned (those will be announced on our website)
We will clone a few shiny web interfaces on our virtual machine to enhance this tool.
mixOmics V6 is in the backlogs, with a planned update for end of April 2016 (stay tuned!) and we will (finally) push a proper mixOmics software manuscript.
We are in the process of reorganising a few workflows in mixOmics with:
- mixMC: mixOmics for Microbial Communities (16S data)
- mixDIABLO: a framework for Data Integration Analysis for Biomarker discovery using Latent variable approaches for multi-Omics studies (check out that acronym!)
- mixMINT: mixOmics for Multi-group INTegrative studies to combine independent single ‘omics studies.

In short, there will be more functionalities for mixOmics users but it should not change the calls of the main functions and we are wrapping up the statistical developments that kept us busy in the last couple of years.

To keep up with our latest developments, please sign up to our mailing list.

List of 2015 workshops:

Oct 24-25 2015 (2 days) AgroParisTech, Paris, France. #attendees: 22
Sept 15-16 2015 (2 days) National Institute for Agricultural Research, Toulouse, France. #attendees: 16
Sept 10-11 2015 (2 days) CIRAD, Agriculture research for development, Montpellier, France. #attendees: 25 (full)
13-14 August 2015 (2 days) Translational Research Institute, Brisbane Australia. #attendees: 32 (full)
April 9-10 2015 (2 days) University of Auckland, New Zealand. #attendees: 40 (full)

mixOmics 5.2.0 (graphical improvements)

The reality of R packages development. From http://www.r-bloggers.com/introducing-the-reproducible-r-toolkit-and-the-checkpoint-package/

We are proud to introduce a new mixOmics update dedicated mainly to improvements in graphical outputs. The changes are listed below; please note the changes in argument names (promise, we'll try not to do that again). More posts to come about the new functionalities.

We are particularly grateful to our key contributors Mr Francois Bartolo (Université de Toulouse, who is doing a short stay down here in Brisbane) and Dr Florian Rohart (University of Queensland) for doing such a great job with the development, debugging and testing. If we have missed something please let us know!

New features:
————-
1 – plotArrow for PLS, sPLS, rCC, rGCCA, sGCCA, sGCCDA is an improved version from our old s.match function (which is still available but will be soon deprecated)
2 – network function has been enhanced with various options to represent the nodes (e.g. lty.edge = 'dotted', row.names = FALSE), see our website for more examples
3 – rcc has a new argument method = c("ridge", "shrinkage"), with "shrinkage" estimating the shrinkage coefficients directly
4 – plotIndiv directly implements 3d plots (style = '3d'), including ellipses, % of variance explained for PCA, centroids and star plots (see example(plotIndiv))
5 – plotVar directly implements 3d plots (style = '3d'); a legend can also be added with add.legend = TRUE
6 – cim and network have new arguments: save = c('jpeg', 'tiff', 'png', 'pdf') to save plots directly, and name.save. Argument threshold has been added/updated for both displays. Some arguments underwent name changes, see ?network

Enhancements:
————-
1 – network: a single function for all objects.
2 – pheatmap.multilevel has been deprecated with the new enhancements of CIM
3 – plot3dIndiv and plot3dVar have been deprecated (see new features in plotIndiv and plotVar)
4 – plotContrib now also available for sgccda, plsda and splsda objects. Added arguments complete.name.var and col.ties (see ?plotContrib), changed argument name ties to show.ties
5 – imageMap has been deprecated (now included in cim directly)
6 – pca also outputs ‘loadings’ and ‘variates’ to remain in the mixOmics spirit
7 – tau.estimate help file removed as now directly called as internal function from rcc and srgcca
8 – imgCor: added argument ‘main’ and changed argument names x.sideColors and y.sideColors to sideColors
9 – cim: changed argument names labRow and labCol to row.sideColors and col.sideColors

Bug fixes:
———-
1 – plotContrib now fixed (showed wrong contribution colors)
2 – cim has been fixed to show the ordered variable names after users reports (thanks!)
3 – resolved blank page in network when saving image as a pdf

Patch version 5.1.2

Following our recent Brisbane workshop, and to prepare the upcoming workshops, we have submitted a patch version 5.1.2 to the CRAN to add the argument ‘col‘ to the function plotIndiv.

See also the help file for plotVar, which has a new ggplot2 layout!

Changes in 5.1.2 (patches)
================

Enhancements:
————-
1 – plotIndiv: the argument col is back! see our help file.

2 – plotVar has been dramatically improved with more efficient coding (not a S3method anymore) and availability of different plotting styles with ‘ggplot2’, ‘lattice’ or ‘graphics’.
Bug fixes:
———-
– plotIndiv: X and Y.label fixed, par() bug fixed

– rgcca tau parameter output enhanced.

Update on CRAN 5.1.1 Major changes

In the last few months we have been busy with our major update. This is quite a major release with additional new features.

One major change that will impact all of us is the function plotIndiv. While we have new (sexy) functionalities, the argument ‘col‘ was swapped to ‘group‘. We will see if we can patch it back in the next release (in a month). In the meantime, give it a try, because it is worth the trouble!

We also fixed a convergence issue in the main sparse PLS algorithm. This may slightly affect your end feature selections as the algorithm is now converging properly.

We list the changes below, enjoy!

New features:
1 – plotContrib for objects of class PLSDA and sPLSDA has been added and is of particular interest for those analysing microbial communities / metagenomics data.

2 – wrapper.sgccda was added to enable multiple data sets integration with one or several factor outcomes. Note: the prediction function for this new add-on has not been fully tested yet and is not available.

3 – wrapper.sgcca and wrapper.sgccda now have an argument called 'keep' that you can use as an alternative to the old 'penalty' argument. keep is the equivalent of keepX in the PLS method and specifies the number of variables to select on each component and each block. Refer to the help file: keep should be input as a list whose length is the number of blocks, where each element of the list (corresponding to a block) indicates the number of variables to select on each component (yes, it does become complicated).

4 – All wrapper methods for the multiblock module, i.e. wrapper.rgcca, wrapper.sgcca and wrapper.sgccda take the input argument ‘blocks‘ (instead of previously ‘data‘) – this is to enable a smoother transition to the next update!

5 – plotIndiv has been improved dramatically. A single function can now be used for the objects PLS, sPLS, PLS-DA, SPLS-DA, rCC, PCA, sPCA, IPCA, sIPCA, rGCCA, sGCCA, sGCCDA (not an S3 function anymore). In addition, we now provide the new arguments (and more to come!):
– ellipse plots are now available; a group argument is required for the unsupervised methods (PCA, IPCA, PLS)
– three types of graphical plot: graphics (version < 5.1-0), ggplot2 and lattice
– legend and title can be added
– NOTE: if you want to colour each sample with respect to a factor (i.e. a factor of length n), then the argument to use is 'group'. If you use a supervised approach, then col.per.group is a vector whose length is the number of groups. These arguments may change in upcoming updates.

6 – cim has been implemented for PLS, sPLS, PLS-DA, SPLS-DA, rCC, PCA, sPCA, IPCA, sIPCA and includes a wide range of options to plot a single data set in the form of a heatmap (new!), or the cross correlation between two matching data sets via the methods rCC or (s)PLS using the cross product between latent variables and loading vectors (improved with legends and color bars). We will give more examples on our website.

7 – added package dependencies: ggplot2 and ellipse
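The shape of keep can be easier to see in code. Below, the three block names, the numbers of variables and the helper check_keep are all hypothetical, and we assume the same number of components on every block; check_keep only verifies the structure described in item 3 above and is not a mixOmics function.

```r
# Simulated blocks measured on the same 20 samples (hypothetical data)
set.seed(1)
blocks <- list(gene    = matrix(rnorm(20 * 100), nrow = 20),
               protein = matrix(rnorm(20 * 30),  nrow = 20),
               metab   = matrix(rnorm(20 * 15),  nrow = 20))

# keep: one element per block, one number of variables per component
keep <- list(gene = c(20, 10), protein = c(5, 5), metab = c(3, 3))

# Hypothetical helper: checks the structure of keep against the blocks
check_keep <- function(keep, blocks, ncomp) {
  stopifnot(is.list(keep), length(keep) == length(blocks))
  for (b in seq_along(keep)) {
    if (length(keep[[b]]) != ncomp)
      stop("block ", b, ": keep needs one value per component")
    if (any(keep[[b]] > ncol(blocks[[b]])))
      stop("block ", b, ": cannot keep more variables than the block has")
  }
  invisible(TRUE)
}

check_keep(keep, blocks, ncomp = 2)
```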

Enhancements:
1 – All wrappers for multiple data integration have been improved and re-implemented. Consequently, the dependency to RGCCA has been removed, and three wrapper functions are now available: wrapper.sgcca, wrapper.rgcca and wrapper.sgccda (see New Feature #2 above).

2 – selectVar has been extended to the non-sparse versions PCA, PLS and PLS-DA and outputs the features by decreasing absolute weight in the loading vectors. It is used in particular for plotContrib (see New feature #1 above)

Bug fixes:
1 – The sPLS algorithm was rewritten to ensure convergence. This implies that spls results might be slightly different from version < 5.1-0!

mixOmics 5.0-4 on CRAN

Dear mixOmics users,

We have submitted an updated version to CRAN. The changes are listed below. A few points in particular to keep in mind:

select.var() was renamed selectVar() (clash with our dependency on the package MASS)
we borrowed the function tau.estim() from the RGCCA package in order to estimate the regularisation parameters of rCCA, a way to bypass tune.rcc() with large matrices
the multilevel module has been updated, with some changes in the call of the function and a new function called withinVariation() (see details on the website https://mixomics.org/methods/multilevel/)

We thank you all for your interest in the package. There are important upcoming developments so please keep in touch via the website.

Changes in 5.0-4
================

New features:
————-
1- new set of palettes have been added: color.jet, color.spectral, color.GreenRed and color.mixo
2- the multilevel module has been updated. A new function called withinVariation() calculates the within matrix. Our new website www.mixOmics.org will be updated shortly
3- the function tau.estim was borrowed from the RGCCA package and included in mixOmics in order to estimate the regularisation parameters of rcc more efficiently than tune.rcc(). We noted differences in those parameter estimates between tune.rcc() and tau.estim(), as the methods use either cross-validation or the formula from Schäfer and Strimmer (2005). When using tau.estim() we also advise centering and scaling the input data in rcc(). See help tau.estim().
4- because of an S3 method clash with the MASS package in the current R version, we had to rename select.var to selectVar
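For a single-factor repeated-measures design, the within-subject matrix is the data minus each subject's mean profile, which removes between-subject variation before the multivariate analysis. The base-R sketch below shows only that idea; within_matrix is a hypothetical name, and withinVariation() itself may handle more complex designs and details differently.

```r
# Sketch only: within-subject variation for a one-factor design
within_matrix <- function(X, subject) {
  # ave() replaces each value by the mean of its subject, column by column
  subject_means <- apply(X, 2, function(col) ave(col, subject))
  X - subject_means
}

# 2 subjects, each measured twice, on 2 variables
X <- rbind(c(1, 10), c(3, 14), c(5, 20), c(7, 22))
subject <- c("s1", "s1", "s2", "s2")
within_matrix(X, subject)
# rows: (-1, -2), (1, 2), (-1, -1), (1, 1)
```

By construction each subject's rows in the within matrix sum to zero, so only the variation around each subject's own mean remains.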

Bug fixes:
———-
1- select.var.sgcca has been fixed (the outputs were messy)
2- minor bug in plotVar.sgcca and plotVar.rgcca fixed
3- the algorithm in perf.pls and perf.spls has been almost entirely changed. We now use a different algorithm to estimate the Q2, as presented in the help Rd file (unfortunately the reference is in French, so contact us for more details if needed). plot.perf() has been updated

Enhancements:
———-
1- network default color set to color.GreenRed
2- output feature.final in the perf S3 function has been removed. Better to use selectVar() to obtain the list of selected variables
3- the multilevel module has been updated. The argument names were changed to ‘design’ instead of ‘cond’. The pheatmap.multilevel() function has been improved.
4- the nearZeroVar function that was borrowed from the caret package has been enhanced to improve computational time as this is costly in the pls/spls functions

mixOmics 5.0-2 update

The major change in this new update is the perf() function, which supersedes valid() and offers a variable stability measure across the different folds.

The pls() and spls() functions have been modified and now follow the same coding framework.

See the CRAN page here.

The mixOmics website will be updated shortly for the major changes of these functions. Remember that you can subscribe to our newsletter (mixOmics updates, workshops) as indicated here.

Changes in 5.0-2

New features:
————-
– The valid function has been superseded by the perf function. Although the two are similar in essence, a few bugs have been fixed so that the performance of the sPLS and sPLS-DA models is estimated without selection bias. A variable stability frequency has been added to the output. Functions spls.model and pls.model have been removed.

Bug fixes:
———-
– pls and spls functions have been modified and 'harmonised' with respect to scaling. Loading vectors a and b are now scaled to 1. Latent variables t and u are not scaled (following Table 21 of the Tenenhaus book, which is in French, sorry!).

– the argument abline.line has been set to FALSE by default in all plotIndiv functions.

– the warning messages in the plot functions have been fixed

– tune.multilevel for one factor has been fixed.
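To illustrate the scaling convention in the pls/spls fix above (unit-norm loading vectors a and b, unscaled latent variables t and u), here is a base-R sketch of the first PLS component only, obtained from the leading singular vectors of X'Y. This is a stand-in for, not a copy of, the iterative algorithm in pls(); assuming X and Y are centred and scaled, we expect the first component of both to agree up to sign. first_pls_component and the simulated data are hypothetical.

```r
# Sketch only: first PLS component via the SVD of X'Y
first_pls_component <- function(X, Y) {
  X <- scale(X)  # centre and scale each variable
  Y <- scale(Y)
  s <- svd(crossprod(X, Y), nu = 1, nv = 1)  # SVD of X'Y
  a <- s$u[, 1]  # loading vector for X, norm 1
  b <- s$v[, 1]  # loading vector for Y, norm 1
  list(a = a, b = b,
       t = drop(X %*% a),  # latent variable for X, not rescaled
       u = drop(Y %*% b),  # latent variable for Y, not rescaled
       d = s$d[1])         # equals t'u, the maximised cross-product
}

set.seed(42)
X <- matrix(rnorm(30 * 5), nrow = 30)
Y <- X[, 1:2] + matrix(rnorm(30 * 2, sd = 0.5), nrow = 30)
res <- first_pls_component(X, Y)
c(norm_a = sqrt(sum(res$a^2)), norm_t = sqrt(sum(res$t^2)))
```

The norm of a is 1 by construction, while the norm of t depends on the data, which is exactly the "loadings scaled, latent variables not scaled" convention.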