This workshop will only be run for a specific group of participants. Other online courses will be announced soon!
We will ask you to fill in the internal survey so that we can tailor the course accordingly. Do not forget to lock the dates into your calendar now!
Context. Advances in high-throughput technologies have transformed the way we examine molecular information. However, analytical tool development is critically trailing behind data generation, which hinders the analysis, understanding or integration of omics data. Data integration adopts a holistic, data-driven and hypothesis-free approach. This new approach is necessary to understand the role of biological systems and posit new hypotheses.
This online workshop will introduce concepts of the multivariate dimension reduction methods developed in mixOmics for statistical analysis. Our methods make no distributional assumptions and are highly flexible for unsupervised (exploratory), supervised (classification) and integration analyses. Various analytical frameworks will be presented, ranging from data exploration and selection of markers to integration with other omics datasets and an introduction to time-course analysis. There will also be an opportunity to discuss the analysis of microbiome data.
Each methodology will be illustrated on real biological studies during a short hands-on session in R. You can also bring your own data and analyse it on the spot using the R scripts that we will provide. The workshop will cover general omics data integration concepts with appropriate case studies.
Instructors: A/Profs Sébastien Déjean (University of Toulouse, sessions 1-3) and Kim-Anh Lê Cao (University of Melbourne, sessions 4-5)
Material includes lecture notes, slides, R code, and data.
Bring your own data. Participants will be given the opportunity to analyse their own data using the R code provided. We will give specific instructions on how to process and format the data. Participants can also work in a team. Some data sets will also be provided for those unable to bring their own data.
Dates for the five sessions (approx 2h per session):
- Sept 21st, 23rd, 28th 9-11am EST / 9-11pm Singapore (same day for all)
- Sept 30th, Oct 5th 6-8pm EST / 6-8am Singapore (+1 day)
Contact: mixomics[ at] math.univ-toulouse.fr (for questions about prerequisites or content)
Prerequisites and requirements. To fully benefit from the workshop, trainees need a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs). Participants are requested to bring their own laptop, with RStudio (http://www.rstudio.com/) and the R package mixOmics installed (instructions will be provided prior to the training).
Each session lasts 2h, roughly divided into a 50 min presentation, a 10 min break and a 1h hands-on with a recap at the end of the session.
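As a rough self-check of the R skills listed above, the sketch below handles a small data frame, performs a few simple calculations and draws a basic plot using base R only. The toy data and all variable names are invented for illustration. The commented-out lines show one common way to install mixOmics from Bioconductor; the official instructions sent before the training take precedence.

```r
# Hypothetical self-check for the prerequisite R skills (base R only).
# Toy data invented for illustration: 6 samples, 2 groups, 3 made-up features.
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 3)),
  gene1 = rnorm(6),
  gene2 = rnorm(6),
  gene3 = rnorm(6)
)

# Handling a data frame: subset rows by group and select the numeric columns
toy_a <- toy[toy$group == "A", ]
features <- toy[, c("gene1", "gene2", "gene3")]

# Simple calculations: per-feature means and the feature correlation matrix
feature_means <- colMeans(features)
feature_cor <- cor(features)

# Simple graphical output: scatter plot of two features, coloured by group
plot(toy$gene1, toy$gene2, col = as.integer(toy$group), pch = 19,
     xlab = "gene1", ylab = "gene2")

# One common way to install mixOmics from Bioconductor
# (uncomment to run; requires internet access):
# if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# BiocManager::install("mixOmics")
# library(mixOmics)
```

If every line above reads as familiar, the hands-on parts of the sessions should be comfortable to follow.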
Session 1: PCA and sparse PCA 101
- We will start with the basics that are necessary to understand the more complicated concepts!
Session 2: PLS-Discriminant Analysis
- We will move on to discriminant analysis to separate sample groups and identify molecular signatures. The hands-on session can include your own data. (PLS = Projection to Latent Structures / Partial Least Squares)
Session 3: integration of two data sets with PLS and CCA
- This session will also introduce useful graphics to visualise the results of those methods. BYO data welcome. (CCA = Canonical Correlation Analysis)
Session 4: multi-omics data integration with block PLS (DIABLO)
- Building on the previous sessions, we will cover multiblock PLS-DA with additional numerical and graphical outputs. You will analyse your own data (if you have already analysed it with the previous methods) or data provided in the package.
Session 5: various methodological extensions
- This more theoretical session will cover recent methodological developments ‘around’ (but not necessarily ‘in’) mixOmics, from compositional data analysis (for microbiome studies) and batch effect management to P-integration and time-course omics data exploration (topics chosen according to your needs). This session will not include a hands-on component, but relevant R code / vignettes will be handed out.
The following statistical concepts will be introduced: covariance and correlation, multiple linear regression, classification and prediction, cross-validation, selection of markers, penalised regressions. Each methodology will be illustrated on a case study (theory and application will alternate).
Target group. The course is intended for microbiologists working in the fields of bioinformatics, computational biology and applied statistics with some statistical knowledge and a good working knowledge of R. It will be particularly useful to those interested in:
- Exploring data sets.
- Selecting molecular / microbial features with methods implementing LASSO-based penalisations.
- Using graphical techniques to better visualise data.
- Understanding and/or applying multivariate projection methodologies to large data sets.
Anticipated learning outcomes. After completion of this workshop, participants will be able to:
- Understand the fundamental principles of multivariate projection-based dimension reduction techniques.
- Perform statistical integration and feature selection using recently developed multivariate methodologies.
- Apply those methods to high-throughput microbiome studies, including their own studies.