News – Page 4 – mixOmics

April 12-13 2018, Sydney, AUS

[This workshop is restricted to Westmead staff]

The objective of this workshop is to introduce the fundamental concepts of multivariate dimension reduction methodologies. These methods are particularly useful for data exploration and integration of large data sets, especially in the context of systems biology or in research areas where statistical data integration is required. Each methodology presented during the course will be applied to biological “omics” studies, including transcriptomics, metabolomics, proteomics and microbiome data sets, using the R package mixOmics (https://mixomics.org/).

Instructor: Dr Kim-Anh Lê Cao

Tutor: Ms Eva Yiwen Wang (Melbourne Integrative Genomics, University of Melbourne)

Organized and sponsored by Westmead Hub Bioinformatics (Dr Erdalh Teber)

Dates 12 April, 2-5pm, and 13 April, 9am-12pm

Practical information The workshop is free of charge for all participants. Priority will be given to students and early career researchers. The workshop includes tuition and course material. It does not include tea/coffee or lunch during the breaks.

Location Seminar rooms 1/2 – Children’s Medical Research Institute, 214 Hawkesbury Rd, Westmead NSW 2145

Contact mixomics[ at] math.univ-toulouse.fr

Prerequisite and requirements We require trainees to have a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) to fully benefit from the workshop*. Participants are requested to bring their own laptop, with the software RStudio (http://www.rstudio.com/) and the R package mixOmics installed (instructions will be provided 2 weeks prior to the training).

*A few online resources we highly recommend to refresh / get an introduction to R:

O’Reilly Code School TryR – this is a truly fantastic online interactive introduction to learning basic skills in R. Warning: the tutorial has a persistent pirate metaphor.
twotorials – 2 minute videos that teach you how to do simple tasks in R. “got two minutes? Learn some statistical programming in R. Its easy, free, and FUN!”
Data camp free Introduction to R to master the basics in R. ‘With the knowledge gained in this course, you will be ready to undertake your first very own data analysis’

Outline

Half day 1: multivariate analysis of one dataset

Exploration of one data set with Principal Component Analysis and visualisations

Identification of biomarkers to discriminate different treatment groups with PLS-Discriminant Analysis

Half day 2: integrative analysis of multiple datasets

Data integration with multivariate projection-based methods and identification of multi-omics signatures
Graphical visualisations for data integration analyses

July 23-25 2018, Melbourne AUS (beginner)

This was our first workshop on the Melbournian grounds. 34 participants joined the workshop, including 10 ECR and PhD students who received CBRI funding.

Some feedback from the workshop from our participants
What did you like about that workshop?
‘It was a really good balance between the statistical background and hands-on application of mixOmics software. Kim-Anh and Sebastien were both fantastic instructors and introduced challenging concepts in a very clear way. Slides, notes and R scripts will be a great resource.‘
‘Very systematic and concise delivery on these methods. Cases are also very good. Extremely helpful for me handling my 16s data.’
‘It was really well taught and both instructors were excellent teachers – I felt like I could keep up even though at some points it was really difficult. The R code was great and will be really helpful for working with my own data. I also liked to opportunity to have a whole half day working on our own data‘

Kim-Anh explaining multivariate challenges for microbiome data

A debrief session each morning with Sebastien running the show.

With the advent of high-throughput sequencing technologies, multivariate dimension reduction methods offer powerful statistical analyses to obtain a first understanding of large and complex data sets. They provide insightful visualisations, are efficient on large data sets and make few assumptions about the distribution of the data. In addition, they are highly flexible, as both unsupervised (exploratory) and supervised (classification) analyses can be performed. The latest developments in this fast-moving area of research include the integration of different types of data sets and variable selection. This hands-on course will introduce key concepts in multivariate dimension reduction, starting with Principal Component Analysis, then moving on to innovative approaches for the statistical integration of multiple data sets, with a particular focus on variable selection. Nineteen methods are currently available in the mixOmics package, thirteen of which were developed by the mixOmics team. Each methodology introduced in the workshop will be illustrated on real biological studies directly available from the package.

Instructors: Dr Kim-Anh Lê Cao and Dr Sébastien Déjean

Organized by: Melbourne Integrative Genomics, University of Melbourne

Fees for 3 days are AUD450 for RHD students, AUD600 for UoM or affiliates based on the Parkville campus, AUD900 for external non-profit organisations and AUD1200 for industry / government. The Computational Biology Research Initiative (University of Melbourne) proudly sponsors registration bursaries (50% of the registration costs) for 5 RHD students enrolled at UoM and 5 ECRs (<= 3 years post PhD, full time equivalent) from UoM or affiliates based on the Parkville campus. Indicate your eligibility at the survey link below.

Registration fees include coffee breaks, lunch and one ice-breaker dinner (Monday 23 July evening), lecture notes and electronic material (slides, R code, data).

Location: Room 101 Alan Gilbert Building, University of Melbourne

Registration A link for registration has been sent to all selected participants. Priority was given to postgraduate students and early career researchers, with a maximum of 30 participants.

Contacts mixomics[ at] math.univ-toulouse.fr (for pre-requisites)

Prerequisite and requirements We require trainees to have a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) to fully benefit from the workshop. Participants are requested to bring their own laptop, with the software RStudio (http://www.rstudio.com/) and the R package mixOmics installed (instructions will be provided prior to the training).

Outline

Day 1 & 2: methods and hands-on. The following broad topics will be covered.

A. Key methodologies in mixOmics and their variants:

Exploration of one data set and how to estimate missing values
Identification of biomarkers to discriminate different treatment groups
Integration of two data sets and identification of biomarkers
Repeated measurements design
Integration of more than two data sets to identify multi omics signatures

B. Review on the graphical outputs implemented in mixOmics

Sample plot representation
Variable plot representation for data integration
Other useful graphical outputs

C. Case studies and applications

Five case studies will be analysed using the methods presented above, with a focus on transcriptomics, proteomics and 16S metagenomics data sets.

Day 3: bring your own data. Participants will be given the opportunity to analyse their own data under the guidance and the advice of the instructors. Participants can also work in a team. Some data sets will also be provided for those unable to bring their own data.

The following statistical concepts will be introduced: covariance and correlation, multiple linear regression, classification and prediction, cross-validation, selection of diagnostic or prognostic markers, l₁ and l₂ penalties in a regression framework. Each methodology will be illustrated on a case study (theory and application will alternate). Note that mixOmics is not limited to biological data and can be applied to other types of data where integration is required.

Target group The course is intended for data analysts in the fields of bioinformatics, computational biology and applied statistics with some statistical knowledge and a good working knowledge in R. It will be particularly useful to those interested in:

Exploring large data sets.
Selecting features with methods implementing LASSO-based penalisations.
Using graphical techniques to better visualise data.
Understanding and/or applying multivariate projection methodologies to large data sets.

Anticipated learning outcomes After completion of this workshop, participants will be able to

Understand the fundamental principles of multivariate projection-based dimension reduction techniques.
Perform statistical integration and feature selection using recently developed multivariate methodologies.
Apply those methods to high throughput biological studies, including their own studies.

June 7-8 2018, Saclay, FR (advanced)

The objective of this advanced workshop is to introduce the fundamental concepts of multivariate dimension reduction methods for the integration of high-throughput biological data sets. The workshop will introduce our latest mixOmics integrative frameworks, in particular N-integration with DIABLO, where several ‘omics data sets are measured on the same biological samples or specimens but using different types of technological platforms (this excludes SNP and categorical data). The aim is to identify a correlated multi-‘omics molecular signature explaining a phenotype of interest. The workshop will also introduce another type of integration for cross-platform comparison and the combination of independent studies: P-integration with MINT considers independent data sets measured on the same P variables (e.g. genes) but in different studies, generated by different labs. The aim is to identify a robust molecular signature across those independent studies (note: mostly focused on gene expression data).

Some feedback from the workshop from our participants
What did you like about that workshop? The combination of lectures and hands on data analysis. The material was presented in a digestible manner for a variety of researchers in different fields; The balance between practice and theory. The fact that even ongoing developments are on the program; It had a good pace and it was deep enough in the methods. Right to the point; It was great. I like that it’s only two days and that it’s not too basic. Also great to have time to test our data or some example datasets at our pace; Keep up the good work :)

Studious! ‘[I liked] the way the theory and the hands on where combined, blocks of two hours were a good measure’

Instructor: Dr Kim-Anh Lê Cao

Tutor: Dr Olivier Chapleur

Organized and sponsored by: Professeurs Invites program Université d’Evry and Institute for Plant Science Saclay (IPS2).

Dates 7-8 June, 2 days, 9am-5pm

Practical information Registration fees are 200 EUR for postgraduate students, 300 EUR for academics and 600 EUR for participants from the private sector. The workshop includes tuition, course material, morning and afternoon coffee breaks and lunch.

Location: Institut de Sciences des Plantes – Paris-Saclay (IPS2), Gif-sur-Yvette (Parisian region), salle rouge.

Registration EOI is now closed. You will be contacted to register for the workshop. Priority will be given to postgraduate students and early career researchers, with a maximum of 30 participants.

Accommodation options In true French fashion, bear in mind that 7-8 June have been declared strike days (not by the mixOmics team, I must reassure you!), so public transport might be severely affected. It is best to book a hotel nearby:
1 – a few minutes' walk to the IPS2 campus where the workshop will take place: Campanile Paris Sud – Saclay (preferred option given the circumstances)
2 – between 30 – 40 min RER train + walk:
Séjours & Affaires Atlantis – MASSY
Aparthotel Adagio access Paris Massy Gare TGV
Residhome Appart Hotel Paris-Massy
Mercure Paris Massy Gare TGV

Contacts mixomics[ at] math.univ-toulouse.fr (for pre-requisites)

Prerequisite and requirements This is a semi-advanced workshop. We require trainees to have a very good working knowledge of R programming (e.g. R is used on a weekly basis to perform data mining and statistical data analyses) as well as some experience in using basic mixOmics methods (PCA and PLS-DA with parameter tuning, along with interpretation of mixOmics graphical outputs) to be able to benefit from the workshop. Participants are requested to bring their own laptop, with the software RStudio (http://www.rstudio.com/) and the R package mixOmics installed (instructions will be provided prior to the training).

Outline

Day 1 (9am – 5pm).

sPLS-DA refresher, including microbiome data analysis
Some time for data analysis

Lunch
DIABLO
Some time for the analysis of your own data
Ice breaker dinner (at your own cost; we will advise of the venue, near the workshop)

Day 2 (9am – 5pm).

Case study highlight on DIABLO (Gregory)
MINT to integrate independent studies/ protocols
Case studies highlights on MINT (Olivier: 16S data, Kim-Anh: single cell data)

Lunch
Longitudinal / time course omics study: updates and where we are going next
Case study highlight on time course omics data integration with sPLS, block.spls (Kim-Anh: metagenomics study, see slide deck)
Some time for data analysis, debrief and departure.

A quick video introduction for mixOmics, vote for us!

Dear mixOmics friends, users, and adventurers,

We are reaching out to you to get your unbiased vote ;) for the Bioinformatics PeerPrize III where we promote our latest publication in PLoS Computational Biology as a software article.

For those not familiar with the package, the short 3 min video gives a brief introduction to the following topics:

‘omics data integration in systems biology
multivariate dimension reduction techniques
mixOmics: what is it?
our main integrative methods DIABLO and MINT

This prize is a great opportunity for us to disseminate the toolkit. As you know, software development, and obtaining the resources to do it, is not a piece of cake, but we have managed over the years. In 2017 the package was downloaded 29,000 times and is still going strong, thanks to your support and your invaluable feedback.

Vote for us if you like our entry! Votes close on Feb 19. Thank you!

https://bioinformatics-peer-prize-iii.thinkable.org
(you will need to enter your organisation and a reference to a paper you co-authored. They take this seriously!)

More news about what is coming up in 2018 for mixOmics very soon. We wish you all many successful mixOmics analyses in 2018!

6.3.1 on CRAN: bug fixes and latest news

We pushed 6.3.1 following a major bug in 6.3.0 when dealing with missing values (especially with DIABLO). Another bug, related to the one-sided t-test in the tune functions, was also fixed. All good now. NIPALS is also faster to run.

A big thank you to the users who give us feedback via our Bitbucket issue list; this is very useful to us as we continue improving the package.

The 3 workshops we ran in October and November 2017 were a success. The first Advanced workshop resulted in many stimulating discussions that will help the development team to move forward. The two beginner workshops were also a lot of fun. We are particularly pleased to see how the small mixOmics community is growing!

Our paper has finally been published in PLoS Computational Biology as a software article. The main methods are described in the poster below. We are now working on the long-awaited DIABLO manuscript so that it leaves bioRxiv and has a life of its own!

In the next few months, these are the changes we are planning:

a conversion to Bioconductor. Have no fear, it should not affect the function calls. We think it is now the right time to reach the Bioconductor community, but that implies a fair amount of implementation on our side. Consequently, methods development will slow down in the coming few months.
a mixOmics forum to encourage discussions around the 19 methods we currently have available.

Summary of the mixOmics article in PLoS Comp Biol

mixOmics article is out!

Finally, after many years of hard work developing and implementing the methods, we summarised them into a nice software paper in PLoS Computational Biology, primarily focusing on the supervised analyses.

Note that DIABLO is not published yet (we are working on it!) but a preprint is available on bioRxiv. For more questions on this framework, contact us!

Version 6.3.0 and workshop

A new CRAN version is now available. We have considerably improved the computational time for the tune and perf functions! (see example below). We also fixed some reproducibility issues when using parallel computing with a set seed.

The update of the package will require new dependencies: ‘matrixStats’, ‘rARPACK’, ‘gridExtra’

There are still some spots left for the beginner mixOmics workshop in Toulouse, 9-10 Nov. Details here.

Enhancements:
————-
– huge gain in computation time for the tune functions tune.splsda and tune.block.splsda. The larger the data, the bigger the gain. Requires new dependencies: ‘matrixStats’, ‘rARPACK’, ‘gridExtra’
– a plot for an object ‘tune.block.splsda’
– the tune.multilevel function was deprecated a while ago and has now been removed.

Bug fixes:
———-
– fixed reproducibility problem when using parallel computing in tune.block.splsda (via the ‘cpus’ argument)
– network: correlation with missing values fixed, label names fixed
– fixed perf for block.splsda objects with prediction distances
– some NA issues reported in 6.2.0 fixed (hopefully)

The gain in computational time is reported below for our different supervised frameworks. Exact timings depend on your operating system, but generally the user time is the CPU time spent executing the code, the system time is the CPU time spent on system processes on behalf of the code (e.g. opening and closing files), and the elapsed time is the wall-clock time since we started the stopwatch.
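As a rough illustration of those three timings, base R's system.time() reports them for any expression. The sketch below times a small simulated PCA with prcomp(); the data and the choice of prcomp() are assumptions made purely for illustration and are not the mixOmics benchmark referred to above.

```r
# Illustrative only: simulated data and base R's prcomp(), not a mixOmics benchmark.
set.seed(42)                                   # reproducible simulated data
X <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

# system.time() evaluates the expression and returns a 'proc_time' object.
timing <- system.time({
  pca <- prcomp(X, center = TRUE, scale. = TRUE)
})

print(timing)            # prints the user, system and elapsed times

timing[["user.self"]]    # CPU time spent executing the R code
timing[["sys.self"]]     # CPU time spent by the system on behalf of the code
timing[["elapsed"]]      # wall-clock time from start to finish
```

When several cores are used, the elapsed time is usually the figure of interest, since user time is summed over the worker processes.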

Two postdoc positions available, University of Melbourne, Australia

The Lê Cao lab is opening two research fellow positions based at the University of Melbourne, Australia.

Research Fellow in Computational Genomics and Statistics

Position number: 0042986. Level B University of Melbourne.
Three years fixed term.
Applications open until 14th November. Apply here.

The School for Mathematics and Statistics, and its partner the Centre for Systems Genomics (CSG), are seeking an enthusiastic research fellow to work on our pioneering projects in statistical integration of large biological data sets, and their implementation in the mixOmics multivariate R toolkit.

The Research Fellow will be responsible for leading cutting-edge statistical developments to address some of the data analysis challenges arising from the latest advances in high-throughput sequencing technologies, including the analysis of microbiome data (amplicon, shotgun sequencing and longitudinal experiments), genetic or single cell sequencing data. The successful applicant will thrive in a unique multi-disciplinary environment amongst statisticians, bioinformaticians and biologists in this initiative, with an opportunity to contribute to teaching in the classroom (within the incumbent’s areas of expertise) and for hands-on multiple day workshops.

Research Fellow in Computational Biology or Statistics

Position number: 0043228.
Level A or B, subject to qualifications and experience, University of Melbourne.
Two years fixed term.
Applications have now closed

The Centre for Stem Cell Systems is seeking a skilled research fellow to work on our exciting large-scale data integration projects conducted at the Centre. As one of its flagship programs, the Centre has reviewed, collated and curated hundreds of datasets from various stem cells sources, to investigate cell growth, differentiation capacity and associated donor properties. This is the largest international collection of curated stem cells data, which are available through our repository www.stemformatics.org.

The Research Fellow in Computational Biology and Statistics will be responsible for contributing to novel and innovative statistical developments to integrate different sources of biological data available on matched biological samples (transcripts, miRNA, proteomics, metabolites, etc) to identify molecular signatures, as well as to further refine or characterise subtypes of stem cells, in particular human mesenchymal stromal cells.

6.2.0, 2 postdoc positions and workshops

Dear mixOmics users,

Our new update 6.2.0 is now available on CRAN as part of our new version of our manuscript.

manuscript & package update:

The mixOmics manuscript introducing the supervised and integrative frameworks (PLS-DA, DIABLO block.plsda and MINT) has been updated, along with all the R / Sweave case studies. The manuscript and codes are available at this link. The case studies are also published on our website (sPLSDA:SRBCT, Case study: TCGA and Case study: MINT).

The manuscript describes in more detail the different prediction distances (see also the supplemental material) and the interpretation of the AUROC for our supervised methods.

The constraint argument was removed from all our methods, due to a risk of overfitting.

New features:

– The constraint argument (version 6.1.0 – 6.1.3) was removed in the functions perf and tune for all supervised objects because of a risk of overfitting

Enhancements:

– AUROC added for MINT objects mint.plsda and mint.splsda where the study name needs to be specified, e.g. auroc(.., roc.study = "study4"). See ?auroc

– choice.ncomp output added on all perf and tune functions for all supervised methods.

– mat.c output for pls and plsda objects (matrix of coefficients from the regression of X / residual matrices X on the X-variates).
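To make the AUROC enhancement for MINT objects concrete, here is a minimal sketch. It assumes mixOmics is installed and that the package's stemcells example data is available with gene, celltype and study components; the ncomp and keepX values are arbitrary choices for illustration, and the study label is read from the data rather than hard-coded. Check ?auroc and ?mint.splsda for the arguments supported by your installed version.

```r
# Sketch only: assumes mixOmics and its 'stemcells' example data are available.
# ncomp and keepX are arbitrary illustrative values, not recommended settings.
if (requireNamespace("mixOmics", quietly = TRUE)) {
  library(mixOmics)
  data(stemcells)

  # Fit a MINT sPLS-DA model across the independent studies.
  fit <- mint.splsda(X = stemcells$gene,
                     Y = stemcells$celltype,
                     study = stemcells$study,
                     ncomp = 2,
                     keepX = c(10, 5))

  # Pick one study by name and request the ROC/AUC for that study only.
  study_of_interest <- as.character(unique(stemcells$study))[1]
  roc_one_study <- auroc(fit, roc.comp = 1, roc.study = study_of_interest)
}
```

Leaving roc.study unset should instead report the AUROC over all studies together, according to ?auroc.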

Bug fixes (thank you to the users who notified us on bitbucket):

– fixed bug when using predict, perf or tune with the error msg: ‘Error in predict.spls(spls.res, X.test[, nzv]) : 'newdata' must include all the variables of 'object$X'’

Workshops:

We advertised two workshops at this link. The advanced workshop 23-24 Oct 2017 is fully subscribed. This is our first MAW (mixOmics advanced workshop), but there will be more planned in 2018. We still have a few spots left for the classic workshop on the 9-10 Nov 2017 in Toulouse, contact us for more information (priority will be given to students and early career researchers).

Two senior postdoc positions (2 year and 3 year) still open!

The Australian mixOmics team, now based at the University of Melbourne, is recruiting two senior postdocs in the fields of computational biology or statistics: one full time 2-year position to work with the Stemformatics team on exciting omics integration problems (‘omics and single cell omics) to improve stem cell classification, and one full time 3-year position for innovative multivariate methods developments for ‘omics time course, microbiome and P-integration. Contact us for more information.

Website update:

With the invaluable help of the bioinformatics masters students Danielle Davenport and Zoe Welham, we are currently revamping the website to ensure all codes are running correctly. Thank you to those who sent us some feedback!

Nov 22-24 2017, Toulouse, FR

[Update: 5 spots left, contact us] Following last year’s success of our COST workshop, the second edition will be run by Dr Sébastien Déjean and his crew in Toulouse. The event is organised by the local committee at UGSF (Drs Estelle Goulas, Anne-Sophie Blervacq, Anne Creach, Brigitte Huss and Prof Simon Hawkins).

Dates: 12-14 September (3 days)

Venue: Toulouse, France, TBA

Fees: 300 EUR (academics) and 600 EUR (private), which include tuition, course material, coffee breaks, lunches and one dinner in town. Bursaries for 12 PhD students and early career researchers are funded by COST ACTION FA1306, apply!

Application: see details here.

Send your CV to: Estelle.goulas [at] univ-lille1.fr and mention whether you are applying for a travel bursary.

Deadline for application: 15 October 2017

More details: at this link.