Forum – Page 5 – mixOmics

mixOmics article is out!

Finally, after many years of hard work developing and implementing the methods, we summarised them into a nice software paper in PLoS Computational Biology, primarily focusing on the supervised analyses.

Note that DIABLO is not yet published (we are working on it!) but a preprint is available on bioRxiv. For more questions on this framework, contact us!

Version 6.3.0 and workshop

A new CRAN version is now available. We have considerably improved the computational time for the tune and perf functions (see example below)! We also fixed some reproducibility issues when using parallel computing with a set seed.

The update of the package will require new dependencies: ‘matrixStats’, ‘rARPACK’, ‘gridExtra’

There are still some spots left for the beginner mixOmics workshop in Toulouse, 9-10 Nov. Details here.

Enhancements:
————-
– huge gain in computation time for the tune functions tune.splsda and tune.block.splsda. The larger the data, the bigger the gain. Requires new dependencies: ‘matrixStats’, ‘rARPACK’, ‘gridExtra’
– a plot for ‘tune.block.splsda’ objects
– the tune.multilevel function, deprecated a while ago, has now been removed.

Bug fixes:
———-
– fixed reproducibility problem when using parallel computing in tune.block.splsda (via the ‘cpus’ argument)
– network: correlation with missing values fixed, label names fixed
– fixed perf for block.splsda objects with prediction distances
– some NA issues reported in 6.2.0 fixed (hopefully)

The gain in computational time is reported below for our different supervised frameworks. The exact figures depend on your operating system, but generally the user time is the CPU time spent executing the code, the system time is the CPU time spent on system processes (e.g. opening and closing files), and the elapsed time is the wall-clock time since we started the stopwatch.
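The three timings described above can be inspected with base R's system.time. The snippet below is only a minimal sketch: it times a toy computation (not a mixOmics call), and the numbers it reports will differ from one machine to another.

```r
# Time a toy computation with base R; the workload is purely illustrative.
timing <- system.time({
  m <- matrix(rnorm(250000), nrow = 500)  # 500 x 500 random matrix
  s <- svd(m)                             # singular value decomposition
})

# 'user.self' = CPU time spent running the R code,
# 'sys.self'  = CPU time spent by the operating system on behalf of the process,
# 'elapsed'   = wall-clock time between the start and the end of the call.
print(timing[c("user.self", "sys.self", "elapsed")])
```

The same wrapper can be placed around a call to a tune or perf function to compare timings between package versions on your own data.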

Two postdoc positions available, University of Melbourne, Australia

The Lê Cao lab is opening two research fellow positions based at the University of Melbourne, Australia.

Research Fellow in Computational Genomics and Statistics

Position number: 0042986. Level B University of Melbourne.
Three years fixed term.
Applications open until 14th November. Apply here.

The School of Mathematics and Statistics, and its partner the Centre for Systems Genomics (CSG), are seeking an enthusiastic research fellow to work on our pioneering projects in statistical integration of large biological data sets, and their implementation in the mixOmics multivariate R toolkit.

The Research Fellow will be responsible for leading cutting-edge statistical developments to address some of the data analysis challenges arising from the latest advances in high-throughput sequencing technologies, including the analysis of microbiome data (amplicon, shotgun sequencing and longitudinal experiments), genetic or single cell sequencing data. The successful applicant will thrive in a unique multi-disciplinary environment amongst statisticians, bioinformaticians and biologists in this initiative, with an opportunity to contribute to teaching in the classroom (within the incumbent’s areas of expertise) and for hands-on multiple day workshops.

Research Fellow in Computational Biology or Statistics

Position number: 0043228.
Level A or B, subject to qualifications and experience, University of Melbourne.
Two years fixed term.
Applications have now closed

The Centre for Stem Cell Systems is seeking a skilled research fellow to work on our exciting large-scale data integration projects conducted at the Centre. As one of its flagship programs, the Centre has reviewed, collated and curated hundreds of datasets from various stem cells sources, to investigate cell growth, differentiation capacity and associated donor properties. This is the largest international collection of curated stem cells data, which are available through our repository www.stemformatics.org.

The Research Fellow in Computational Biology and Statistics will be responsible for contributing to novel and innovative statistical developments to integrate different sources of biological data available on matched biological samples (transcripts, miRNA, proteomics, metabolites, etc.) to identify molecular signatures, as well as to further refine or characterise subtypes of stem cells, in particular human mesenchymal stromal cells.

6.2.0, 2 postdoc positions and workshops

Dear mixOmics users,

Our new update 6.2.0 is now available on CRAN as part of the new version of our manuscript.

manuscript & package update:

The mixOmics manuscript introducing the supervised and integrative frameworks (PLS-DA, DIABLO block.plsda and MINT) has been updated, along with all the R / Sweave case studies. The manuscript and codes are available at this link. The case studies are also published on our website (sPLSDA:SRBCT, Case study: TCGA and Case study: MINT).

The manuscript describes in more detail the different prediction distances (see also the supplemental material) and the interpretation of the AUROC for our supervised methods.

The constraint argument was removed from all our methods, due to a risk of overfitting.

New features:

– The constraint argument (version 6.1.0 – 6.1.3) was removed in the functions perf and tune for all supervised objects because of a risk of overfitting

Enhancements:

– AUROC added for MINT objects mint.plsda and mint.splsda where the study name needs to be specified, e.g. auroc( .., roc.study = “study4”). See ?auroc

– choice.ncomp output added on all perf and tune functions for all supervised methods.

– mat.c output for pls and plsda objects (matrix of coefficients from the regression of X / residual matrices X on the X-variates).

Bug fixes (thank you to the users who notified us on bitbucket):

– fixed bug when using predict, perf or tune with the error msg: ‘Error in predict.spls(spls.res, X.test[, nzv]) : ‘newdata’ must include all the variables of ‘object$X”

Workshops:

We advertised two workshops at this link. The advanced workshop 23-24 Oct 2017 is fully subscribed. This is our first MAW (mixOmics advanced workshop), but there will be more planned in 2018. We still have a few spots left for the classic workshop on the 9-10 Nov 2017 in Toulouse, contact us for more information (priority will be given to students and early career researchers).

Two senior postdoc positions (2 year and 3 year) still open!

The Australian mixOmics team, now based at the University of Melbourne, is recruiting two senior postdocs in the fields of computational biology or statistics: one full-time 2-year position to work with the Stemformatics team on exciting ‘omics integration problems (‘omics and single cell ‘omics) to improve stem cell classification, and one full-time 3-year position for innovative multivariate method developments for ‘omics time course, microbiome and P-integration. Contact us for more information.

Website update:

With the invaluable help of the bioinformatics masters students Danielle Davenport and Zoe Welham, we are currently revamping the website to ensure all codes are running correctly. Thank you to those who sent us feedback!

Nov 22-24 2017, Toulouse, FR

[Update: 5 spots left, contact us] Following last year’s success of our COST workshop, the second edition will be run by Dr Sébastien Déjean and his crew in Toulouse. The event is organised by the local committee at UGSF (Drs Estelle Goulas, Anne-Sophie Blervacq, Anne Creach, Brigitte Huss and Prof Simon Hawkins).

Dates: 12-14 September (3 days)

Venue: Toulouse, France, TBA

Fees: 300 EUR (academics) and 600 EUR (private), which include tuition, course material, coffee breaks, lunches and one dinner in town. Bursaries for 12 PhD students and early career researchers are funded by COST ACTION FA1306, apply!

Application: see details here.

Send your CV to: Estelle.goulas [at] univ-lille1.fr and mention whether you are applying for a travel bursary.

Deadline for application: 15 October 2017

More details: at this link.

Nov 9-10 2017, Toulouse, FR

Some feedback from our participants to the question: ‘What did you like most about that workshop?’ (Survey Monkey results)

Theoretical + practical courses, course materials are really great
Regular oral review of the take-home messages
The slides and Kim-Anh presentations: very pedagogical
The workshop provides the exact combination of theory and practical exercises I liked to have. The examples with R scripts are so organized that you can understand the thinking process behind the analysis.
Open atmosphere and good pace, with enough of theory to understand the core principles
Didactic speakers, not much mathematics and formulas, alternance of theory and practice, well prepared R scripts and documents
The use of these tools is straightforward
Both the lecturer created a very nice exchange with the group, making everyone comfortable in making questions and express doubts.
Clear- Concise- Adaptable- Very complete R scripts and pdf documents

——–

We will be running a classic 2-day mixOmics workshop in November, taught by Dr Kim-Anh Lê Cao, Sébastien Déjean and other mixOmics team contributors. The event and Dr Kim-Anh Lê Cao’s visit are sponsored by the visiting scientist program INP Toulouse.

The objective of the workshop is to introduce the fundamental concepts of multivariate dimension reduction methodologies. Those methods are particularly useful for data exploration and integration of large data sets, especially in the context of systems biology, or in research areas where statistical data integration is required. Each methodology presented during the course (single ‘omics analysis, and integration of two or multiple ‘omics) will be applied to biological ‘omics studies including transcriptomics, metabolomics, proteomics and microbiome data sets using the R package mixOmics.

Date: 9-10 November 2017

Venue: salle E111, Batiment E, UMR-GenPhySE, INRA Toulouse Castanet Tolosan (map access coming soon)

Prerequisites: We expect the participants to have a good working knowledge of R (e.g. handling data frames and performing basic calculations). Participants are requested to bring their own laptops, with the software RStudio and the R package mixOmics installed (instructions provided prior to the training).

Practical information: The workshop is free of charge for all participants as it is fully sponsored by INP. Priority will be given to INP students, external postgraduate students and early career researchers. The workshop includes tuition and course material. It does not include tea/coffee or lunch during the breaks.

More details: see this flyer.

Register: registrations have now closed.

Want to know more? contact us at mixomics [at] math.univ-toulouse.fr

Oct 23-24 2017, Toulouse, FR, Advanced Workshop

[Update: the workshop is fully subscribed and registrations have closed!] This is the first edition of our advanced workshop, run by Dr Kim-Anh Lê Cao and Sébastien Déjean. The event and Dr Kim-Anh Lê Cao’s visit are sponsored by the visiting scientist program INP Toulouse and by the company Methodomics.

The mixOmics package has undergone substantial improvements and methodological developments in the last 18 months to address the strong demand from the computational and biological community to integrate multiple (>2) ‘omics data sets, including microbiome, genotype and longitudinal data. The aim of this advanced workshop is to introduce our new frameworks and to encourage discussions, collaborations and suggestions for improvement on themes including:

N-integration with DIABLO
P-integration with MINT
Longitudinal ‘omics analysis with timeOmics (not yet in mixOmics!)
Exploratory multivariate analysis with SNPOmics (not yet in mixOmics!)
mixMC: mixOmics for Microbial communities, with N-integration extensions

Date: 23-24 October 2017

Venue: Ground level, building ‘SDAR’, INRA Toulouse, Castanet Tolosan (map access coming soon)

Prerequisites: Since this is an advanced course, we expect the participants to be expert in the R programming language and familiar with multivariate projection-based methods and mixOmics.

More details: at this link

Interested? contact us at mixomics [at] math.univ-toulouse.fr

Update 6.1.3 on CRAN, postdoc position, manuscript and upcoming workshops

Dear mixOmics users,

We have been quiet for a while, but we have some good news! A CRAN update, a manuscript in bioRxiv, a 3-year postdoc position open to be part of the mixOmics core team, and three workshops planned for the French autumn!

The 6.1.3 update is now on CRAN. We fixed a few bugs (see list below), and we also have a new plotIndiv argument ‘background’ to visualise the prediction area for a PLS-DA and sPLS-DA model (max 2 components). This is a powerful plot to visualise the effect of the different prediction distances. Why does a prediction distance matter for the performance of the discriminant analysis models? See the explanation below.

Example of prediction area plot for the SRBCT data with a PLS-DA model, see ?srbct

All you need is to call the background.predict function and overlay the results with plotIndiv. For example:

library(mixOmics)

# PLS-DA on the liver toxicity data, with the 4th treatment column as outcome
data(liver.toxicity)
X = liver.toxicity$gene
Y = as.factor(liver.toxicity$treatment[, 4])
plsda.liver = plsda(X, Y, ncomp = 2)

# calculating background for the two first components, and the mahalanobis distance
background = background.predict(plsda.liver, comp.predicted = 2, dist = "mahalanobis.dist")

# sample plot with the prediction area overlaid
plotIndiv(plsda.liver, background = background, legend = TRUE)

We also added the new functions get.confusion_matrix and get.BER to calculate a confusion matrix based on class prediction of test samples and their real class, and calculate their Balanced Error Rate, see ?get.BER. Example of outputs (for a DIABLO analysis on the breast cancer TCGA multi omics study):

Example from our DIABLO pipeline available at https://mixomics.org/wp-content/uploads/2012/03/mixOmicsRscripts.zip
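To make the two quantities concrete, here is a minimal base R sketch of a confusion matrix and the Balanced Error Rate computed from it. It illustrates the definitions only and is not the package's implementation (use get.confusion_matrix and get.BER for that); the class labels and predictions are made up for the example.

```r
# Hypothetical true and predicted classes for 10 test samples (made-up data).
truth     <- factor(c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"))
predicted <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C", "A"),
                    levels = levels(truth))

# Confusion matrix: rows = true class, columns = predicted class.
confusion <- table(truth = truth, predicted = predicted)

# Per-class error rate = misclassified samples in a class / size of that class.
class.error <- 1 - diag(confusion) / rowSums(confusion)

# Balanced Error Rate = average of the per-class error rates, so that
# large classes do not dominate the overall error.
BER <- mean(class.error)

print(confusion)
print(BER)
```

Averaging the per-class error rates, rather than counting all misclassified samples together, is what makes the measure suitable for unbalanced class sizes.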

We have submitted a new version of our mixOmics manuscript to bioRxiv! The manuscript is available at this link and has been a top tweeted story in #bioinformatics. It mostly summarises the latest mixOmics frameworks for Discriminant Analysis (sPLS-DA, DIABLO and MINT), with extensive R and Sweave codes here; give it a go! The supplemental material thoroughly details these methods.

It almost sounds like the end of a first mixOmics era, as Florian, our very talented and dedicated core developer, debugger and developer of MINT, has moved on to another postdoctoral position at the University of Queensland, and Kim-Anh is starting her new group as a Senior Lecturer at the University of Melbourne (UoM), at the Centre for Systems Genomics. Do not fear: this means there will be a new round of developments, notably in the microbiome and metagenomics field, as we are opening a new 3-year senior postdoctoral position in Computational Biostatistics at UoM (with the opportunity to teach at the School of Mathematics and Statistics). More details at this link.

Seventeen multivariate methods currently implemented in mixOmics! Can you recognise your favourite?

Three workshops are coming up, between Sept – Nov 2017 in France. The first edition of MAW’17 is the advanced mixOmics workshop to introduce our new frameworks (published and in development: DIABLO, MINT, SNPOmics, timeOmics, mixMC and extension of integration) to our advanced users. The workshop is free, but you will need to cover your own travel and accommodation costs. Toulouse, 23-24 Oct 2017. Send us an email and we can send you the details. The two other workshops will be our normal beginner mixOmics workshops, in September (Lille) and in early November (Toulouse). More details on our website soon.

Other enhancements and bug fixes:

Enhancements:
————-
1 – perf.sgccda (for DIABLO) now implements a constraint model (see details in ?perf)
2 – legend = TRUE option in circosPlot and plotDiablo
Bug fixes:
———-
– tune.splsda had a bug when assessing the ‘choice.ncomp’ based on one-sided t-tests of the error rate when the error rate was constant.
– sparse PCA deflation algorithm fixed
– added the mixOmics:: prefix for pls functions to avoid clashes with other packages

Why does a prediction distance matter? (full story in our manuscript)

The supervised multivariate methods in mixOmics can be applied to an external test set to predict the outcome of new samples (predict), or to assess the performance of the statistical model (perf). The predict function calculates prediction scores for each new sample, or predicted coordinates, which are equivalent to the latent component scores in the training set.

Prediction distances. Our supervised models work with dummy indicator matrices Y to indicate the class membership of each sample, and result in a prediction score for each outcome category k, k = 1, ..., K. Therefore, the scores across all K classes need to be combined to obtain the final prediction of a given test sample using a prediction distance. We propose distances such as ‘maximum distance’, ‘Mahalanobis distance’ and ‘Centroids distance’, as detailed in our supplemental information and in ?predict. Those distances can give different predictions, which will be reflected in the assessed performance of the model.
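The idea can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code used by predict: all scores and coordinates below are made-up numbers (they do not come from a fitted model), and the Mahalanobis distance, which follows the same logic as the centroids distance with a different metric, is left out for brevity.

```r
classes <- c("A", "B", "C")

# (1) Maximum distance: each test sample gets one predicted score per class
# (the predicted dummy matrix); the class with the largest score wins.
Y.hat <- matrix(c(0.70, 0.20, 0.10,
                  0.30, 0.45, 0.25),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("test1", "test2"), classes))
pred.max <- classes[apply(Y.hat, 1, which.max)]

# (2) Centroids distance: compare the predicted coordinates of each test
# sample with the class centroids of the training samples on the components.
train.scores <- matrix(c(-2.0,  0.1,
                         -1.8, -0.2,
                          0.1,  1.9,
                          0.3,  2.2,
                          2.1, -0.1,
                          1.9,  0.2),
                       ncol = 2, byrow = TRUE,
                       dimnames = list(NULL, c("comp1", "comp2")))
train.class <- factor(rep(classes, each = 2))

# One centroid per class (rows) on each component (columns).
centroids <- apply(train.scores, 2, function(v) tapply(v, train.class, mean))

test.scores <- matrix(c(-1.5, 0.3,
                         1.2, 0.9),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(c("test1", "test2"), c("comp1", "comp2")))

# Euclidean distance from every test sample (rows) to every centroid (columns).
dist.to.centroid <- apply(centroids, 1, function(g)
  sqrt(rowSums(sweep(test.scores, 2, g)^2)))
pred.centroid <- classes[apply(dist.to.centroid, 1, which.min)]

print(data.frame(sample = rownames(Y.hat), pred.max, pred.centroid))
```

In this toy example the two rules agree on the first test sample and disagree on the second, which is why the choice of distance shows up in the estimated performance.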

Patch 6.1.2 and some updates

R CRAN update

The new patch version of mixOmics is on CRAN. It includes a few bug fixes raised by our users (thank you!) and a few improvements. Florian Rohart has been fiddling really hard with ggplot2 to make a new plotIndiv version that can beautifully handle two legends!

plotIndiv example with two legends in 6.1.2

# indicate the group, treatment and pch for each sample
my.group
 [1] "group 1" "group 1" "group 2" "group 2" "group 3" "group 3" "group 4" "group 4" "group 1" "group 1" "group 2" "group 2" "group 3" "group 3" "group 4" "group 4"
[17] "group 1" "group 1" "group 2" "group 3" "group 3" "group 4" ....
my.treatment
 [1] "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" "trt 2" ....
my.pch.trt
 [1] 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 16 16 ....

plotIndiv(pca.res, ind.names = FALSE, title = 'PCA', legend = TRUE, 
 # legend 1 colors setting:
 group = my.group, col.per.group = color.per.group, legend.title = 'Groups', 
 # pch setting:
 legend.title.pch = 'Treatment', pch = my.pch.trt, pch.levels = my.treatment)

Here is a list of the major bug fixes and improvements for 6.1.2:

New features:
————-
1 – tune.splsda now returns a ‘choice.ncomp’ which indicates the number of components to choose (only if nrepeat > 2, criterion based on t-tests)
2 – plotIndiv now enables two legends based on color, as well as pch, when pch is a factor different from what is indicated in group (use arguments pch and pch.levels, see ?plotIndiv)

Enhancements:
————-
1 – argument ‘cutoff’ now replaces ‘threshold’ in network for consistency with plotVar and circosPlot
2 – new argument ‘sd’ in plot.perf for block.splsda method
3 – new arguments “color.Y” and “color.blocks” in cimDiablo
4 – new argument ‘xlim’ in plotLoadings

Bug fixes:
———-
– directionality is now enforced in AUROC (results lower than 0.5 can be obtained, which would indicate a very poor model performance)
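As a rough illustration of why a value below 0.5 signals a poor model, the sketch below computes the area under the ROC curve with a fixed direction (higher score means the positive class) using the rank-based formula. The auc helper and the scores are hypothetical and written for this example only; this is not the implementation behind auroc.

```r
# Area under the ROC curve with a fixed direction: the probability that a
# positive sample receives a higher score than a negative one.
auc <- function(score, is.positive) {
  r <- rank(score)                 # average ranks handle ties
  n.pos <- sum(is.positive)
  n.neg <- sum(!is.positive)
  (sum(r[is.positive]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

is.positive <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Made-up scores that rank most positives above the negatives.
good.score <- c(0.9, 0.8, 0.4, 0.5, 0.2, 0.1)

# The same scores reversed: positives now tend to be ranked last.
bad.score <- 1 - good.score

auc.good <- auc(good.score, is.positive)
auc.bad  <- auc(bad.score, is.positive)
print(c(good = auc.good, bad = auc.bad))
```

Without a fixed direction, the second model could be silently flipped and reported with the same value as the first; with the direction enforced, its value falls below 0.5.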

Manuscripts:

The MINT paper is out:

Rohart F., Matigian N., Eslami A., Bougeard S and Lê Cao, K. A. MINT: A multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. Available on bioRxiv; in press in BMC Bioinformatics 18:128.

The mixOmics manuscript (first draft) is on bioRxiv, with sweave codes:

Rohart F., Gautier, B, Singh, A and Lê Cao, K. A. mixOmics: an R package for ‘omics feature selection and multiple data integration. On bioRxiv. Sweave and R scripts available here.

Patch 6.1.1

Dear mixOmics users,

We have a new patch version 6.1.1 available from CRAN to fix a few bugs reported by our team or by mixOmics users (thank you!), along with a few enhancements and updates to follow ggplot2 updates.

For those using DIABLO, please note points 8 & 9: we changed the default parameters to scheme = ‘horst’ instead of ‘centroid’ and init = ‘svd.single’ instead of ‘svd’ in the methods, as we feel these are more appropriate. This may change your results compared to the previous version, and you may want to use the old parameters instead.

New features:
1 – mint.pca function to perform unsupervised integration of independent data sets
2 – new weighted prediction for block approaches for both unsupervised and supervised analyses, see ?predict.spls and ?predict.splsda.
3 – ‘cpus’ parameter for sPLS-DA perf/tune and block.splsda perf/tune added to run the code in parallel

Enhancements:
4 – ‘constraint’ parameter for sPLS-DA perf and tune functions added.
5 – plotLoadings for PCA objects
6 – color argument in plot.tune and plot.perf added

Bug fixes:
7- predict with logratio (the logratio transform is now performed inside the predict function)
8- in block methods, scheme = ‘horst’ set by default instead of centroid
9- in block methods, initialisation set to svd.single by default
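Regarding point 7, a log-ratio transform such as the centred log-ratio (CLR) is computed sample by sample, so new samples can be transformed on their own at prediction time. The base R sketch below shows this on a made-up count table; the clr helper and the offset of 1 used to avoid log(0) are assumptions of this example, not the package's code.

```r
# Hypothetical count table: 3 samples (rows) x 4 features (columns).
counts <- matrix(c(10,  0, 5, 85,
                   20, 30, 0, 50,
                    1,  1, 1, 97),
                 nrow = 3, byrow = TRUE)

# Centred log-ratio: log of each value minus the mean log of its own sample.
# The offset of 1 is an assumption made here to avoid log(0).
clr <- function(x, offset = 1) {
  logx <- log(x + offset)
  sweep(logx, 1, rowMeans(logx))
}

# Training samples and a new test sample are transformed independently.
train.clr <- clr(counts[1:2, , drop = FALSE])
test.clr  <- clr(counts[3, , drop = FALSE])

# Each transformed sample sums to zero (up to rounding error).
print(rowSums(train.clr))
print(rowSums(test.clr))
```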

Thank you again for using mixOmics.