July 23-25, 2018, Melbourne, AUS (beginner)

This was our first workshop in Melbourne. Thirty-four participants attended, including 10 early career researchers (ECRs) and PhD students who received CBRI funding.

Feedback from our participants
What did you like about the workshop?
‘It was a really good balance between the statistical background and hands-on application of mixOmics software. Kim-Anh and Sebastien were both fantastic instructors and introduced challenging concepts in a very clear way. Slides, notes and R scripts will be a great resource.’
‘Very systematic and concise delivery on these methods. Cases are also very good. Extremely helpful for me handling my 16S data.’
‘It was really well taught and both instructors were excellent teachers – I felt like I could keep up even though at some points it was really difficult. The R code was great and will be really helpful for working with my own data. I also liked the opportunity to have a whole half day working on our own data.’

Kim-Anh explaining multivariate challenges for microbiome data

A debrief session each morning with Sébastien running the show.

With the advent of high-throughput sequencing technologies, multivariate dimension reduction methods offer powerful statistical analyses to obtain a first understanding of large and complex data sets. They provide insightful visualisations, are efficient on large data sets and make few assumptions about the distribution of the data. In addition, they are highly flexible, since both unsupervised (exploratory) and supervised (classification) analyses can be performed. The latest innovative developments in this exciting and fast-moving area of research include the integration of different types of data sets and variable selection. This hands-on course will introduce key concepts in multivariate dimension reduction, starting with Principal Component Analysis, then moving to innovative approaches for the statistical integration of multiple data sets, with a particular focus on variable selection. Nineteen methods are currently available in the mixOmics package, thirteen of which were developed by the mixOmics team. Each methodology introduced in the workshop will be illustrated on real biological studies available directly from the package.

Instructors: Dr Kim-Anh Lê Cao and Dr Sébastien Déjean

Organised by: Melbourne Integrative Genomics, University of Melbourne

Fees for the 3 days are AUD450 for RHD students, AUD600 for UoM staff or affiliates based on the Parkville campus, AUD900 for external non-profit organisations and AUD1200 for industry / government. The Computational Biology Research Initiative (University of Melbourne) proudly sponsors registration bursaries (50% of the registration costs) for 5 RHD students enrolled at UoM and 5 ECRs (≤ 3 years post-PhD, full-time equivalent) at UoM or affiliates based on the Parkville campus; indicate your eligibility at the survey link below.

Registration fees include coffee breaks, lunch, one ice-breaker dinner (Monday 23 July, evening), lecture notes and electronic material (slides, R code, data).

Location: Room 101, Alan Gilbert Building, University of Melbourne

Registration: A registration link has been sent to all selected participants. Priority was given to postgraduate students and early career researchers, with a maximum of 30 participants.

Contact: mixomics [at] math.univ-toulouse.fr (for prerequisites)

Prerequisites and requirements: To fully benefit from the workshop, trainees need a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs). Participants are requested to bring their own laptop with RStudio (http://www.rstudio.com/) and the R package mixOmics installed (instructions will be provided prior to the training).

Outline

Days 1 & 2: methods and hands-on. The following broad topics will be covered.

A. Key methodologies in mixOmics and their variants:

Exploration of a single data set and estimation of missing values
Identification of biomarkers to discriminate different treatment groups
Integration of two data sets and identification of biomarkers
Repeated measures designs
Integration of more than two data sets to identify multi omics signatures

B. Review of the graphical outputs implemented in mixOmics

Sample plot representation
Variable plot representation for data integration
Other useful graphical outputs

C. Case studies and applications

Five case studies will be analysed using the methods presented above, with a focus on transcriptomics, proteomics and 16S metagenomics data sets.

Day 3: bring your own data. Participants will have the opportunity to analyse their own data under the guidance of the instructors, individually or in teams. Data sets will be provided for those unable to bring their own.

The following statistical concepts will be introduced: covariance and correlation, multiple linear regression, classification and prediction, cross-validation, selection of diagnostic or prognostic markers, and l₁ and l₂ penalties in a regression framework. Each methodology will be illustrated on a case study (theory and application will alternate). Note that mixOmics is not limited to biological data and can be applied to other types of data where integration is required.

Target group: The course is intended for data analysts in the fields of bioinformatics, computational biology and applied statistics with some statistical knowledge and a good working knowledge of R. It will be particularly useful to those interested in:

Exploring large data sets.
Selecting features with methods implementing LASSO-based penalisations.
Using graphical techniques to better visualise data.
Understanding and/or applying multivariate projection methodologies to large data sets.

Anticipated learning outcomes: After completing this workshop, participants will be able to:

Understand the fundamental principles of multivariate projection-based dimension reduction techniques.
Perform statistical integration and feature selection using recently developed multivariate methodologies.
Apply those methods to high-throughput biological studies, including their own studies.