Updates – Page 2

6.3.1 on CRAN: bug fixes and latest news

We pushed 6.3.1 following a major bug in 6.3.0 when dealing with missing values (especially with DIABLO). Another bug was related to the one-sided t-test in the tune functions. Both are now fixed. NIPALS is also faster to run.

A big thank you to the users who give us feedback via our bitbucket issue list. This is very useful to us as we continue improving the package.

The 3 workshops we ran in October and November 2017 were a success. The first Advanced workshop resulted in many stimulating discussions that will help the development team to move forward. The two beginner workshops were also a lot of fun. We are particularly pleased to see how the small mixOmics community is growing!

Our paper has finally been published in PLoS Computational Biology as a software article. The main methods are described in the poster below. We are now working on the long-awaited DIABLO manuscript so that it leaves bioRxiv and has a life of its own!

These are the changes we are planning for the next few months:

– a conversion to Bioconductor. Ain’t no fear, it should not affect the function calls. We think it is now the right time to reach the Bioconductor community, but that implies a fair amount of implementation work on our side. Consequently, methods development will slow down in the coming few months.
– a mixOmics forum to encourage discussions around the 19 methods we currently have available.

Summary of the mixOmics article in PLoS Comp Biol

Version 6.3.0 and workshop

A new CRAN version is now available. We have considerably improved the computational time for the tune and perf functions (see the example below). We also fixed some reproducibility issues when using parallel computing with a set seed.

The update of the package will require new dependencies: ‘matrixStats’, ‘rARPACK’, ‘gridExtra’

There are still some spots left for the beginner mixOmics workshop in Toulouse, 9-10 Nov. Details here.

Enhancements:
————-
– huge gain in computation time for the tune functions tune.splsda and tune.block.splsda. The larger the data, the bigger the gain. Requires new dependencies: ‘matrixStats’, ‘rARPACK’, ‘gridExtra’
– a plot for ‘tune.block.splsda’ objects
– the tune.multilevel function was deprecated a while ago and has now been removed.

Bug fixes:
———-
– fixed reproducibility problem when using parallel computing in tune.block.splsda (via the ‘cpus’ argument)
– network: correlation with missing values fixed, label names fixed
– fixed perf for block.splsda objects with prediction distances
– some NA issues reported in 6.2.0 fixed (hopefully)

The gain in computational time is reported below for our different supervised frameworks. The exact meaning of each timing depends on your operating system, but generally the user time is the time spent executing the code, the system time is the time spent on system processes (e.g. opening and closing files), and the elapsed time is the total time that has passed since we started the stopwatch.
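As a rough illustration of how these three timings can be read in R, the sketch below times an arbitrary computation with base R's system.time. The workload (slow_sum, a hypothetical helper) only stands in for a call to a tune or perf function; it is not code from the package.

```r
# Hypothetical workload standing in for a tune/perf call (not mixOmics code)
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

# system.time() returns a 'proc_time' object with named entries
timing <- system.time(result <- slow_sum(1e6))

timing[["user.self"]]  # CPU time spent executing the R code itself
timing[["sys.self"]]   # CPU time spent by the system on behalf of the process
timing[["elapsed"]]    # wall-clock time between the start and end of the call
```

When code runs in parallel, the elapsed time is usually the figure to compare between versions, since the user time of the main process may not include the work done by the workers.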

6.2.0, 2 postdoc positions and workshops

Dear mixOmics users,

Our new update 6.2.0 is now available on CRAN and accompanies the new version of our manuscript.

manuscript & package update:

The mixOmics manuscript introducing the supervised and integrative frameworks (PLS-DA, DIABLO block.plsda and MINT) has been updated, along with all the R / Sweave case studies. The manuscript and code are available at this link. The case studies are also published on our website (sPLSDA:SRBCT, Case study: TCGA and Case study: MINT).

The manuscript describes in more detail the different prediction distances (see also the supplemental material) and the interpretation of the AUROC for our supervised methods.

The constraint argument was removed from all our methods, due to a risk of overfitting.

New features:

– The constraint argument (version 6.1.0 – 6.1.3) was removed in the functions perf and tune for all supervised objects because of a risk of overfitting

Enhancements:

– AUROC added for MINT objects mint.plsda and mint.splsda, where the study name needs to be specified, e.g. auroc(.., roc.study = "study4"). See ?auroc

– choice.ncomp output added on all perf and tune functions for all supervised methods.

– mat.c output for pls and plsda objects (matrix of coefficients from the regression of X / residual matrices X on the X-variates).

Bug fixes (thank you to the users who notified us on bitbucket):

– fixed bug when using predict, perf or tune with the error msg: "Error in predict.spls(spls.res, X.test[, nzv]) : 'newdata' must include all the variables of 'object$X'"

Workshops:

We advertised two workshops at this link. The advanced workshop 23-24 Oct 2017 is fully subscribed. This is our first MAW (mixOmics advanced workshop), but there will be more planned in 2018. We still have a few spots left for the classic workshop on the 9-10 Nov 2017 in Toulouse, contact us for more information (priority will be given to students and early career researchers).

Two senior postdoc positions (2 year and 3 year) still open!

The Australian mixOmics team, now based at the University of Melbourne, is recruiting two senior postdocs in the fields of computational biology or statistics: one full-time 2-year position to work with the Stemformatics team on exciting omics integration problems (‘omics and single cell ‘omics) to improve stem cell classification, and one full-time 3-year position for innovative multivariate methods development for ‘omics time course, microbiome and P-integration. Contact us for more information.

Website update:

With the invaluable help of the bioinformatics masters students Danielle Davenport and Zoe Welham, we are currently revamping the website to ensure all the code runs correctly. Thank you to those who sent us feedback!

Update 6.1.3 on CRAN, postdoc position, manuscript and upcoming workshops

Dear mixOmics users,

We have been quiet for a while, but we have some good news! A CRAN update, a manuscript in bioRxiv, a 3-year postdoc position open to be part of the mixOmics core team, and three workshops planned for the French autumn!

The 6.1.3 update is now on CRAN. We fixed a few bugs (see the list below) and added a new plotIndiv argument ‘background’ to visualise the prediction area for a PLS-DA or sPLS-DA model (max 2 components). This is a powerful plot to visualise the effect of the different prediction methods. Why does a prediction method matter for the performance of the discriminant analysis models? See the explanation below.

Example of prediction area plot for the SRBCT data with a PLS-DA model, see ?srbct

All you need is the background.predict function, and overlay the results with plotIndiv. For example:

library(mixOmics)

# liver toxicity data: gene expression (X), outcome (Y) from the 4th column of the treatment table
data(liver.toxicity)
X = liver.toxicity$gene
Y = as.factor(liver.toxicity$treatment[, 4])
plsda.liver = plsda(X, Y, ncomp = 2)

# calculating the background for the first two components, with the Mahalanobis distance
background = background.predict(plsda.liver, comp.predicted = 2, dist = "mahalanobis.dist")

# overlay the prediction area on the sample plot
plotIndiv(plsda.liver, background = background, legend = TRUE)

We also added the new functions get.confusion_matrix and get.BER to calculate a confusion matrix based on class prediction of test samples and their real class, and calculate their Balanced Error Rate, see ?get.BER. Example of outputs (for a DIABLO analysis on the breast cancer TCGA multi omics study):

Example from our DIABLO pipeline available at https://mixomics.org/wp-content/uploads/2012/03/mixOmicsRscripts.zip
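To make the two quantities concrete, here is a minimal base R sketch of a confusion matrix and the Balanced Error Rate computed from it. It does not call get.confusion_matrix or get.BER and is not their implementation; the class labels and predictions are made up for illustration.

```r
# Toy data (made up): true and predicted classes of 10 test samples
truth     <- factor(c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
                    levels = c("A", "B", "C"))
predicted <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C", "A"),
                    levels = c("A", "B", "C"))

# Confusion matrix: rows = true class, columns = predicted class
confusion <- table(truth = truth, predicted = predicted)

# Per-class error rate = proportion of samples of each true class
# that are assigned to another class
class_error <- 1 - diag(confusion) / rowSums(confusion)

# Balanced Error Rate = average of the per-class error rates,
# so that small classes weigh as much as large ones
ber <- mean(class_error)

# Overall error rate, for comparison (dominated by the larger classes)
overall_error <- mean(truth != predicted)
```

With unbalanced classes the two error rates can differ noticeably, which is why the balanced version is often preferred for assessing classification performance.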

We have submitted a new version of our mixOmics manuscript to bioRxiv! The manuscript is available at this link and has been a top tweeted story in #bioinformatics. It mostly summarises the latest mixOmics frameworks for Discriminant Analysis (sPLS-DA, DIABLO and MINT) with extensive R and Sweave code here, give it a go! The supplemental material thoroughly details these methods.

It almost sounds like the end of a first mixOmics era, as Florian, our very talented and dedicated core developer, debugger and developer of MINT, has moved on to another postdoctoral position at the University of Queensland, and Kim-Anh is starting her new group as a Senior Lecturer at the University of Melbourne (UoM), at the Centre for Systems Genomics. Do not fear, this means there will be a new round of developments, notably in the microbiome and metagenomics field, as we are opening a new 3-year senior postdoctoral position in Computational Biostatistics at UoM (with the opportunity to teach at the School of Mathematics and Statistics). More details at this link.

Seventeen multivariate methods currently implemented in mixOmics! Can you recognise your favourite?

Three workshops are coming up, between Sept – Nov 2017 in France. The first edition of MAW’17 is the advanced mixOmics workshop to introduce our new frameworks (published and in development: DIABLO, MINT, SNPOmics, timeOmics, mixMC and extension of integration) to our advanced users. The workshop is free, but you will need to cover your own travel and accommodation costs. Toulouse, 23-24 Oct 2017. Send us an email and we can send you the details. The two other workshops will be our normal beginner mixOmics workshops, in September (Lille) and in early November (Toulouse). More details on our website soon.

Other enhancements and bug fixes:

Enhancements:
————-
1 – perf.sgccda (for DIABLO) now implements a constraint model (see details in ?perf)
2 – legend = TRUE option in circosPlot and plotDiablo

Bug fixes:
———-
– tune.splsda had a bug when assessing the ‘choice.ncomp’ based on a one-sided t-test of the error rate when the error rate was constant.
– sparse PCA deflation algorithm fixed
– added the mixOmics:: prefix for pls functions to avoid clashes with other packages
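The constant error rate case is easy to reproduce in base R. The sketch below is not the package's internal code: improves is a hypothetical helper, and the error rates are made up. It assumes the number of components is chosen by testing whether an extra component significantly lowers the error rate across repeated cross-validation runs.

```r
# Hypothetical helper (not the package's internal code): does adding a component
# significantly decrease the error rate across repeated cross-validation runs?
improves <- function(err_fewer, err_more, alpha = 0.05) {
  p <- tryCatch(
    # H1: the mean error with fewer components is greater than with more components
    t.test(err_fewer, err_more, alternative = "greater")$p.value,
    # t.test() stops with an error when both samples are constant
    error = function(e) NA_real_
  )
  # A missing p-value is treated as no evidence of improvement
  !is.na(p) && p < alpha
}

# Made-up error rates over 5 repeats, for 1 and 2 components
err_comp1 <- c(0.30, 0.32, 0.29, 0.31, 0.30)
err_comp2 <- c(0.15, 0.17, 0.14, 0.16, 0.15)
gain <- improves(err_comp1, err_comp2)

# Constant error rates: the t statistic is undefined, handled by tryCatch
no_gain <- improves(rep(0.2, 5), rep(0.2, 5))
```

Treating an undefined test as "no improvement" is one possible design choice; it keeps the smaller number of components when the extra component changes nothing.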

Why does a prediction distance matter? (full story in our manuscript)

The supervised multivariate methods in mixOmics can be applied to an external test set to predict the outcome of new samples (predict), or to assess the performance of the statistical model (perf). The predict function calculates prediction scores for each new sample, or predicted coordinates, which are equivalent to the latent component scores in the training set.

Prediction distances. Our supervised models work with dummy indicator matrices Y to indicate the class membership of each sample, and result in a prediction score for each outcome category k, k = 1, . . . , K. Therefore, the scores across all K classes need to be combined to obtain the final prediction of a given test sample using a prediction distance. We propose distances such as ‘maximum distance’, ‘Mahalanobis distance’ and ‘Centroids distance’, as detailed in our supplemental information and in ?predict. These distances can give different predictions, which will be reflected in the assessed performance of the model.
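The toy sketch below illustrates, in base R, how two such rules can disagree on the same test sample. It is a simplified illustration under stated assumptions, not the predict function itself: all scores, centroids and coordinates are made up, and the centroid rule uses the plain Euclidean distance (a Mahalanobis version would replace that metric).

```r
# Made-up predicted scores for 3 test samples and K = 3 classes
# (one column per class of the dummy indicator matrix Y)
scores <- matrix(c(0.70, 0.20, 0.10,
                   0.35, 0.40, 0.25,
                   0.10, 0.30, 0.60),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("test", 1:3), c("A", "B", "C")))

# 'Maximum distance': assign each sample to the class with the largest score
pred_max <- colnames(scores)[apply(scores, 1, which.max)]

# Centroid-based prediction works on the component scores instead.
# Made-up class centroids (from training samples) and predicted coordinates
# of the test samples on 2 components
centroids <- rbind(A = c(-2, 0), B = c(2, 0), C = c(0, 3))
coords    <- rbind(test1 = c(-1.5, 0.5), test2 = c(0.4, 1.2), test3 = c(0.2, 2.5))

# Euclidean distance from each test sample to each class centroid
dist_to_centroid <- sapply(rownames(centroids), function(k) {
  sqrt(rowSums(sweep(coords, 2, centroids[k, ])^2))
})

# 'Centroids distance': assign each sample to the closest centroid
pred_centroid <- colnames(dist_to_centroid)[apply(dist_to_centroid, 1, which.min)]
```

In this made-up example the second test sample is assigned to class B by the maximum rule and to class C by the centroid rule, which is the kind of disagreement the performance assessment picks up.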

Patch 6.1.1

Dear mixOmics users,

We have a new patch version 6.1.1 available from CRAN to fix a few bugs reported by our team or by mixOmics users (thank you!), along with a few enhancements and updates to follow ggplot2 changes.

For those using DIABLO, please note points 8 & 9: we changed the default parameters in the block methods to scheme = ‘horst’ instead of ‘centroid’ and init = ‘svd.single’ instead of ‘svd’, as we feel these are more appropriate. This may change your results compared to the last version, and you may want to use the old parameters instead.

New features:
1 – mint.pca function to perform unsupervised integration of independent data sets
2 – new weighted prediction for block approaches for both unsupervised and supervised analyses, see ?predict.spls and ?predict.splsda.
3 – ‘cpus’ parameter for sPLS-DA perf/tune and block.splsda perf/tune added to run the code in parallel

Enhancements:
4 – ‘constraint’ parameter for sPLS-DA perf and tune functions added.
5 – plotLoading for PCA object
6 – color argument in plot.tune and plot.perf added

Bug fixes:
7- predict with logratio (the logratio transform is now performed inside the predict function)
8- in block methods, scheme = ‘horst’ set by default instead of centroid
9- in block methods, initialisation set to svd.single by default

Thank you again for using mixOmics.

Version 6.1.0 and latest publications

We are proud to announce our new update 6.1.0 available on CRAN. It was supposed to be a small patch but we got slightly ahead of ourselves. Special thanks to the mixOmics French’Oz developers, Dr Florian Rohart (University of Queensland, Brisbane) and Mr François Bartolo (Université de Toulouse, France), as well as several users who have been using our latest methods and reported bugs or suggested improvements on our bitbucket issue website.

Manuscripts and publication update

Rohart F, Matigian N, Eslami A, Bougeard S, Lê Cao K-A. MINT: A multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. Now available on bioRxiv!
Singh A, Gautier B, Shannon C, Vacher M, Rohart F, Tebbutt S, Lê Cao K-A. DIABLO – multi-omics data integration for biomarker discovery. Manuscript available on bioRxiv.
Lê Cao K-A*, Costello ME*, Lakis VA, Bartolo F, Chua XY, Brazeilles R, Rondeau P. (2016) MixMC: Multivariate insights into Microbial Communities. PLoS ONE 11(8): e0160169

List of changes in mixOmics 6.1.0 (in NEWS file)

In short,
– cimDIABLO argument ‘corThreshold’ replaced by ‘cutoff’
– new plots of tune and perf results now available
– tune function for block.splsda/DIABLO method
– auroc for supervised methods

New features:

1- auroc function applicable for (mint).(block).(s)plsda objects. AUC values also included in perf and tune functions (except mixDIABLO module)
2- tune.block.splsda function to choose the keepX parameters of block.splsda (a.k.a. mixDIABLO)
3- plot for perf objects displays the classification error rate w.r.t components
4- plot for tune objects displays the classification error rate w.r.t keepX values (not implemented for tune.block.splsda)
5- multilevel function has been removed (as planned) as it is now included as an argument in other functions (see pca, pls, splsda, etc)

Enhancements:
1 – All tune functions (except for mixDIABLO/block.splsda module) include a ‘constraint’ argument to either build the model based on user input specific parameters (object$keepX.constraint) or based on the optimal parameter keepX determined by the tune function, see examples in help files.
2 – All perf functions (except for mixDIABLO/block.splsda module) now have a ‘constraint’ argument that allows the performance to be calculated either based on the number of parameters (object$keepX) defined in object or based on the variables selected on each component, see examples in help files.
3 – max.iter has been set to 100 to speed up computational time for all multivariate methods except pca/spca.
4 – cimDiablo: new arguments include transpose, row.names and col.names
5 – circosPlot: new arguments include var.names and comp. Argument ‘corThreshold’ has been replaced by ‘cutoff’.
6 – plotIndiv: new argument legend.title
7 – network function now works for block.spls(da) models and can plot more than 2 blocks
8 – PCA: new argument ilr.offset to be used only for ILR log transform in PCA (mixMC module)
9 – Legend added in plotDiablo, new argument legend.ncol

Bug fixes:
1 – plotIndiv and ellipse: plot ellipse for all groups with more than 1 sample
2 – predict function: argument multilevel added, log transform included
3 – Call to plsda.vip() from the RVAideMemoire package
4 – other small bugs as listed in our bitbucket issues, matching rgl package changes.

Patch 6.0.1

We are preparing a patch to fix some small bugs we (and other users) noticed since we released version 6.0.0. The .zip (Windows) and .tar.gz (Linux / Mac) files can be downloaded from this page. We plan to push a completed patch to CRAN at the end of August 2016.

Latest patch update: 18 August

Package to download: mixOmics_6.0.1.zip (windows) or mixOmics_6.0.1.tar.gz (linux, mac)

You can install the .tar.gz via RStudio (Mac environment). Alternatively, type in a terminal (Linux environment):

R CMD INSTALL mixOmicsPatch_6.0.1.tar.gz

Then load the patch version in R:

#Load the patch 
library(mixOmics)

Bugs fixed:

Date: 18/08/2016

Offset value of 1 added for CLR log transform for mixMC
circosPlot variable name fixed for mixDIABLO, new argument size.variables
cimDiablo and circosPlot match name to legend color for mixDIABLO, new arguments transpose, row.names and col.names
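
To show why an offset matters for the CLR transform, here is a minimal base R sketch. It is not the package's implementation: clr_offset is a hypothetical helper, the counts are made up, and it assumes the offset is simply added to every count before taking logs.

```r
# Hypothetical helper (not the package's implementation): centred log-ratio
# transform applied to each row (sample), with an offset to avoid log(0)
clr_offset <- function(counts, offset = 1) {
  logged <- log(counts + offset)
  # subtract each sample's mean log value, i.e. divide by its geometric mean
  logged - rowMeans(logged)
}

# Made-up count table: 2 samples x 4 taxa, including zero counts
counts <- matrix(c(10,  0,  5, 85,
                    0, 20, 30, 50),
                 nrow = 2, byrow = TRUE)

clr <- clr_offset(counts)
rowSums(clr)  # each row sums to zero, up to floating point error
```

Without the offset, the zero counts would give log(0) = -Inf and the transformed values would no longer be finite.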

Date: 03/08/2016

Call to plsda.vip() from the RVAideMemoire package
Speed up computations for PCA with logratio transformation
perf / tune for sPLS-DA with log ratio transformation (that will improve the performance of the model)
network function for block.spls models (still in development!)