March 2018 – mixOmics

News 2018, workshops 2018 and DIABLO

Dear all,

The first few months of the year have been busy for us. Thanks to your support, we ranked second in the Bioinformatics Peer Prize (57 votes, just behind the winner with 59 votes!). Our entry is listed at this link if you would like to watch a basic introduction to the package.

For those who are new to mixOmics, I also put together some Prezi slides introducing the broad context in which mixOmics sits. They were presented at the University of Melbourne ResBaz event in February.

We have now scheduled our 2018 workshops:

An advanced workshop focusing on omics data integration, 7-8 June in the Paris region. Registration will be in two stages: an Expression of Interest due on April 29, followed by registration. The workshop will accommodate 30 participants. More details here.
A 3-day beginner workshop, 23-25 July at the University of Melbourne. More details will be posted very soon.

We have posted the second version of our DIABLO manuscript on bioRxiv. The code is currently on GitHub and will also be available on our website soon.

For shorter news updates, you can also follow us on Twitter @mixOmics_team.

Kim-Anh for the mixOmics team

April 12-13 2018, Sydney, AUS

[This workshop is restricted to Westmead staff]

The objective of this workshop is to introduce the fundamental concepts of multivariate dimension reduction methodologies. These methods are particularly useful for the exploration and integration of large data sets, especially in systems biology and in other research areas where statistical data integration is required. Each methodology presented during the course will be applied to biological “omics” studies, including transcriptomics, metabolomics, proteomics and microbiome data sets, using the R package mixOmics (https://mixomics.org/).

Instructor: Dr Kim-Anh Lê Cao

Tutor: Ms Eva Yiwen Wang (Melbourne Integrative Genomics, University of Melbourne)

Organized and sponsored by Westmead Hub Bioinformatics (Dr Erdalh Teber)

Dates 12 April 2-5pm and 13 April 9am-12pm

Practical information The workshop is free of charge for all participants. Priority will be given to students and early career researchers. The workshop includes tuition and course material, but not tea/coffee or lunch during the breaks.

Location Seminar rooms 1/2 – Children’s Medical Research Institute, 214 Hawkesbury Rd, Westmead NSW 2145

Contact mixomics[ at] math.univ-toulouse.fr

Prerequisite and requirements We require trainees to have a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) to fully benefit from the workshop*. Participants are requested to bring their own laptop with RStudio (http://www.rstudio.com/) and the R package mixOmics installed (instructions will be provided 2 weeks prior to the training).

*A few online resources we highly recommend to refresh / get an introduction to R:

O’Reilly Code School TryR – this is a truly fantastic online interactive introduction to learning basic skills in R. Warning: the tutorial has a persistent pirate metaphor.
twotorials – 2 minute videos that teach you how to do simple tasks in R. “got two minutes? Learn some statistical programming in R. Its easy, free, and FUN!”
DataCamp – free Introduction to R course to master the basics in R. ‘With the knowledge gained in this course, you will be ready to undertake your first very own data analysis’

Outline

Half day 1: multivariate analysis of one dataset

Exploration of one data set with Principal Component Analysis and visualisations

Identification of biomarkers to discriminate different treatment groups with PLS-Discriminant Analysis

Half day 2: integrative analysis of multiple datasets

Data integration with multivariate projection-based methods and identification of multi-omics signatures
Graphical visualisations for data integration analyses

July 23-25 2018, Melbourne AUS (beginner)

This was our first workshop on Melbourne soil. 34 participants joined the workshop, including 10 ECRs and PhD students who received CBRI funding.

Some feedback from our workshop participants
What did you like about that workshop?
‘It was a really good balance between the statistical background and hands-on application of mixOmics software. Kim-Anh and Sebastien were both fantastic instructors and introduced challenging concepts in a very clear way. Slides, notes and R scripts will be a great resource.’
‘Very systematic and concise delivery on these methods. Cases are also very good. Extremely helpful for me handling my 16s data.’
‘It was really well taught and both instructors were excellent teachers – I felt like I could keep up even though at some points it was really difficult. The R code was great and will be really helpful for working with my own data. I also liked to opportunity to have a whole half day working on our own data’

Kim-Anh explaining multivariate challenges for microbiome data

A debrief session each morning with Sebastien running the show.

With the advent of high-throughput sequencing technologies, multivariate dimension reduction methods offer powerful statistical analyses to obtain a first understanding of large and complex data sets. They provide insightful visualisations, are efficient on large data sets and make few assumptions about the distribution of the data. In addition, they are highly flexible, as either unsupervised (exploratory) or supervised (classification) analyses can be performed. The latest developments in this exciting and fast-moving area of research include the integration of different types of data sets and variable selection. This hands-on course will introduce key concepts in multivariate dimension reduction, starting with Principal Component Analysis and moving on to innovative approaches for the statistical integration of multiple data sets, with a particular focus on variable selection. Nineteen methods are currently available in the mixOmics package, thirteen of which were developed by the mixOmics team. Each methodology introduced in the workshop will be illustrated on real biological studies directly available from the package.

Instructors: Dr Kim-Anh Lê Cao and Dr Sébastien Déjean

Organized by: Melbourne Integrative Genomics, University of Melbourne

Fees for 3 days are AUD450 for RHD students, AUD600 for UoM or affiliates based on the Parkville campus, AUD900 for external non-profit organisations and AUD1200 for industry / government. The Computational Biology Research Initiative (University of Melbourne) proudly sponsors registration bursaries (50% of the registration costs) for 5 RHD students enrolled at UoM and 5 ECRs (<= 3 years post PhD, full-time equivalent) from UoM or affiliates based on the Parkville campus. Please indicate your eligibility at the survey link below.

Registration fees include coffee breaks, lunch and one ice-breaker dinner (Monday 23 July evening), lecture notes and electronic material (slides, R code, data).

Location: Room 101 Alan Gilbert Building, University of Melbourne

Registration A link for registration has been sent to all selected participants. Priority was given to postgraduate students and early career researchers, with a maximum of 30 participants.

Contacts mixomics[ at] math.univ-toulouse.fr (for pre-requisites)

Prerequisite and requirements We require trainees to have a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) to fully benefit from the workshop. Participants are requested to bring their own laptop with RStudio (http://www.rstudio.com/) and the R package mixOmics installed (instructions will be provided prior to the training).

Outline

Days 1 & 2: methods and hands-on. The following broad topics will be covered.

A. Key methodologies in mixOmics and their variants:

Exploration of one data set and how to estimate missing values
Identification of biomarkers to discriminate different treatment groups
Integration of two data sets and identification of biomarkers
Repeated measurements design
Integration of more than two data sets to identify multi-omics signatures

B. Review of the graphical outputs implemented in mixOmics

Sample plot representation
Variable plot representation for data integration
Other useful graphical outputs

C. Case studies and applications

Five case studies will be analysed using the methods presented above, with a focus on transcriptomics, proteomics and 16S metagenomics data sets.

Day 3: bring your own data. Participants will be given the opportunity to analyse their own data under the guidance and the advice of the instructors. Participants can also work in a team. Some data sets will also be provided for those unable to bring their own data.

The following statistical concepts will be introduced: covariance and correlation, multiple linear regression, classification and prediction, cross-validation, selection of diagnostic or prognostic markers, l₁ and l₂ penalties in a regression framework. Each methodology will be illustrated on a case study (theory and application will alternate). Note that mixOmics is not limited to biological data and can be applied to other types of data where integration is required.
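To make the l₁ and l₂ penalties mentioned above a little more concrete, here is a minimal sketch in plain Python (the workshop itself uses R and mixOmics; this is not mixOmics code). It assumes the simplest textbook setting of a single coefficient with unpenalised estimate `z`, where the l₁-penalised estimate is obtained by soft-thresholding and the l₂-penalised estimate by rescaling. The function names, the coefficient values and the penalty `lam` are all illustrative.

```python
def soft_threshold(z, lam):
    """Minimiser of 0.5*(b - z)**2 + lam*abs(b): the l1 (lasso-type) penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """Minimiser of 0.5*(b - z)**2 + 0.5*lam*b**2: the l2 (ridge-type) penalty."""
    return z / (1.0 + lam)


# Hypothetical unpenalised coefficients for five variables.
coefficients = [2.0, -1.5, 0.3, -0.2, 0.05]
lam = 0.5  # illustrative penalty strength

l1_estimates = [soft_threshold(z, lam) for z in coefficients]
l2_estimates = [ridge_shrink(z, lam) for z in coefficients]

# Variables whose l1 estimate is non-zero are the "selected" ones.
selected = [i for i, b in enumerate(l1_estimates) if b != 0.0]

print("l1 estimates:", l1_estimates)
print("l2 estimates:", l2_estimates)
print("selected variable indices:", selected)
```

In this toy setting the l₁ penalty sets the three smallest coefficients exactly to zero, which is what makes variable selection possible, whereas the l₂ penalty shrinks every coefficient but keeps all of them non-zero.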

Target group The course is intended for data analysts in the fields of bioinformatics, computational biology and applied statistics with some statistical knowledge and a good working knowledge in R. It will be particularly useful to those interested in:

Exploring large data sets.
Selecting features with methods implementing LASSO-based penalisations.
Using graphical techniques to better visualise data.
Understanding and/or applying multivariate projection methodologies to large data sets.

Anticipated learning outcomes After completion of this workshop, participants will be able to

Understand fundamental principles of multivariate projection-based dimension reduction techniques.
Perform statistical integration and feature selection using recently developed multivariate methodologies.
Apply those methods to high throughput biological studies, including their own studies.

June 7-8 2018, Saclay, FR (advanced)

This advanced workshop introduces the fundamental concepts of multivariate dimension reduction methods for the integration of high-throughput biological data sets. It presents our latest mixOmics integrative frameworks, in particular N-integration with DIABLO, where several ‘omics data sets are measured on the same N biological samples or specimens but using different types of technological platforms (this excludes SNP and categorical data). The aim is to identify a correlated multi-‘omics molecular signature explaining a phenotype of interest. The workshop will also introduce another type of integration for cross-platform comparison and the combination of independent studies: P-integration with MINT considers independent data sets measured on the same P variables (e.g. genes) but in different studies, generated by different labs. The aim is to identify a robust molecular signature across those independent studies (note: mostly focused on gene expression data).
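The difference between the two frameworks comes down to what the data sets have in common. As a rough illustration only (plain Python rather than R, and not the mixOmics API), the sketch below represents each data set as a hypothetical dictionary mapping sample names to their measured variables, and checks which kind of integration the layout supports: the same samples across platforms (N-integration) or the same variables across studies (P-integration). All sample, gene and protein names and values are invented.

```python
def sample_names(dataset):
    """Return the set of sample (row) names of a data set."""
    return set(dataset)


def variable_names(dataset):
    """Return the set of variable (column) names of a data set."""
    names = set()
    for measurements in dataset.values():
        names.update(measurements)
    return names


def supports_n_integration(datasets):
    """True when every data set is measured on the same samples."""
    reference = sample_names(datasets[0])
    return all(sample_names(d) == reference for d in datasets)


def supports_p_integration(datasets):
    """True when every data set is measured on the same variables."""
    reference = variable_names(datasets[0])
    return all(variable_names(d) == reference for d in datasets)


# Same three samples measured on two platforms (hypothetical values).
mrna = {"s1": {"geneA": 1.2, "geneB": 0.4},
        "s2": {"geneA": 0.9, "geneB": 0.7},
        "s3": {"geneA": 1.5, "geneB": 0.1}}
protein = {"s1": {"protX": 3.1}, "s2": {"protX": 2.8}, "s3": {"protX": 3.5}}

# Same two genes measured in two independent studies with different samples.
study_1 = {"lab1_a": {"geneA": 1.0, "geneB": 0.5},
           "lab1_b": {"geneA": 1.1, "geneB": 0.6}}
study_2 = {"lab2_a": {"geneA": 0.8, "geneB": 0.3}}

print("platforms share samples:", supports_n_integration([mrna, protein]))
print("studies share variables:", supports_p_integration([study_1, study_2]))
```

The two platform data sets share samples but not variables, so they fit the N-integration layout; the two studies share variables but not samples, so they fit the P-integration layout.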

Some feedback from our workshop participants
What did you like about that workshop?
‘The combination of lectures and hands on data analysis. The material was presented in a digestible manner for a variety of researchers in different fields.’
‘The balance between practice and theory. The fact that even ongoing developments are on the program.’
‘It had a good pace and it was deep enough in the methods. Right to the point.’
‘It was great. I like that it’s only two days and that it’s not too basic. Also great to have time to test our data or some example datasets at our pace.’
‘Keep up the good work :)’

Studious! ‘[I liked] the way the theory and the hands on where combined, blocks of two hours were a good measure’

Instructor: Dr Kim-Anh Lê Cao

Tutor: Dr Olivier Chapleur

Organized and sponsored by: Professeurs Invites program Université d’Evry and Institute for Plant Science Saclay (IPS2).

Dates 7-8 June, 2 days, 9am-5pm

Practical information Registration fees are 200 EUR for postgraduate students, 300 EUR for academics and 600 EUR for participants from the private sector. The workshop includes tuition, course material, morning and afternoon coffee breaks and lunch.

Location: Institut de Sciences des Plantes – Paris-Saclay (IPS2), Gif-sur-Yvette (Parisian region), salle rouge.

Registration EOI is now closed. You will be contacted to register for the workshop. Priority will be given to postgraduate students and early career researchers, with a maximum of 30 participants.

Accommodation options Bear in mind that, in true French fashion, 7-8 June have been declared strike days (not by the mixOmics team, I must reassure you!), so public transport might be severely affected. It is best to book a hotel nearby:
1 – a few minutes' walk from the IPS2 campus where the workshop will take place: Campanile Paris Sud – Saclay (preferred option given the circumstances)
2 – between 30 – 40 min RER train + walk:
Séjours & Affaires Atlantis – MASSY
Aparthotel Adagio access Paris Massy Gare TGV
Residhome Appart Hotel Paris-Massy
Mercure Paris Massy Gare TGV

Contacts mixomics[ at] math.univ-toulouse.fr (for pre-requisites)

Prerequisite and requirements This is a semi-advanced workshop. We require trainees to have a very good working knowledge of R programming (e.g. R is used on a weekly basis to perform data mining and statistical data analyses) as well as some experience in using basic mixOmics methods (PCA and PLS-DA with parameter tuning, along with interpretation of mixOmics graphical outputs) to be able to benefit from the workshop. Participants are requested to bring their own laptop with RStudio (http://www.rstudio.com/) and the R package mixOmics installed (instructions will be provided prior to the training).

Outline

Day 1 (9am – 5pm).

sPLS-DA refresher, including microbiome data analysis
Some time for data analysis

Lunch
DIABLO
Some time for the analysis of your own data
Ice-breaker dinner (at your own cost; we will advise of the venue, near the workshop)

Day 2 (9am – 5pm).

Case study highlight on DIABLO (Gregory)
MINT to integrate independent studies / protocols
Case studies highlights on MINT (Olivier: 16S data, Kim-Anh: single cell data)

Lunch
Longitudinal / time course omics study: updates and where we are going next
Case study highlight on time course omics data integration with sPLS, block.spls (Kim-Anh: metagenomics study, see slide deck)
Some time for data analysis, debrief and departure.