With the advent of high-throughput sequencing technologies, multivariate dimension reduction methods offer powerful statistical analyses to obtain a first understanding of large and complex data sets. They provide insightful visualisations, are efficient on large data sets and make few assumptions about the distribution of the data. In addition, they are highly flexible, as both unsupervised (exploratory) and supervised (classification) analyses can be performed. The latest developments in this fast-moving area of research include the integration of different types of data sets and variable selection. This hands-on course will introduce key concepts in multivariate dimension reduction, starting with Principal Component Analysis, then moving to innovative approaches for the statistical integration of multiple data sets, with a particular focus on variable selection. Nineteen methods are currently available in the mixOmics package, of which thirteen were developed by the mixOmics team. Each methodology introduced in the workshop will be illustrated on real biological studies directly available from the package.
Organized by: Melbourne Integrative Genomics, University of Melbourne
Fees for 3 days are AUD450 for RHD students, AUD600 for UoM or affiliates based on the Parkville campus, AUD900 for external non-profit organisations and AUD1200 for industry / government. The Computational Biology Research Initiative (University of Melbourne) proudly sponsors registration bursaries (50% of the registration costs) for 5 RHD students enrolled at UoM and 5 ECRs (<= 3 years post-PhD, full-time equivalent) from UoM or affiliates based on the Parkville campus. Indicate your eligibility at the survey link below.
Registration fees include coffee breaks, lunch and one ice-breaker dinner (Monday 23 July evening), lecture notes and electronic material (slides, R code, data).
Location: Room 101 Alan Gilbert Building, University of Melbourne
Registration: A link for registration has been sent to all selected participants. Priority was given to postgraduate students and early career researchers, with a maximum of 30 participants.
Contacts: mixomics[ at] math.univ-toulouse.fr (for pre-requisites)
Prerequisites and requirements: Trainees need a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) to fully benefit from the workshop. Participants are requested to bring their own laptop with the software RStudio (http://www.rstudio.com/) and the R package mixOmics installed (instructions will be provided prior to the training).
Days 1 & 2: methods and hands-on. The following broad topics will be covered.
A. Key methodologies in mixOmics and their variants:
- Exploration of one data set and how to estimate missing values
- Identification of biomarkers to discriminate different treatment groups
- Integration of two data sets and identification of biomarkers
- Repeated measurements design
- Integration of more than two data sets to identify multi-omics signatures
B. Review of the graphical outputs implemented in mixOmics
- Sample plot representation
- Variable plot representation for data integration
- Other useful graphical outputs
C. Case studies and applications
Five case studies will be analysed using the methods presented above, with a focus on transcriptomics, proteomics and 16S metagenomics data sets.
Day 3: bring your own data. Participants will be given the opportunity to analyse their own data under the guidance and the advice of the instructors. Participants can also work in a team. Some data sets will also be provided for those unable to bring their own data.
The following statistical concepts will be introduced: covariance and correlation, multiple linear regression, classification and prediction, cross-validation, selection of diagnostic or prognostic markers, and l1 and l2 penalties in a regression framework. Each methodology will be illustrated on a case study (theory and application will alternate). Note that mixOmics is not limited to biological data and can be applied to other types of data where integration is required.
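As a rough illustration of the l1 and l2 penalties listed among the statistical concepts, the base R sketch below contrasts how each one shrinks a coefficient. It is not mixOmics code and not necessarily how the course presents the topic. It assumes the simplest textbook setting: an orthonormal design, with the l1 objective written as 0.5 * RSS + lambda * sum(|b|) and the l2 objective as 0.5 * RSS + 0.5 * lambda * sum(b^2). The function names `soft_threshold` and `ridge_shrink` and the coefficient values are made up for this example.

```r
# Illustrative helpers (hypothetical names, not part of mixOmics).
# Both take least-squares coefficients b from an orthonormal design and
# return the penalised coefficients under the objectives stated above.

# l1 (lasso-type) penalty: shrink towards zero by lambda and set
# coefficients smaller than lambda in absolute value exactly to zero.
soft_threshold <- function(b, lambda) {
  sign(b) * pmax(abs(b) - lambda, 0)
}

# l2 (ridge-type) penalty: shrink every coefficient proportionally;
# non-zero coefficients stay non-zero.
ridge_shrink <- function(b, lambda) {
  b / (1 + lambda)
}

b <- c(3, -0.5, 1.2)  # made-up least-squares coefficients

lasso_b <- soft_threshold(b, lambda = 1)
ridge_b <- ridge_shrink(b, lambda = 1)

print(lasso_b)  # the small middle coefficient is removed (variable selection)
print(ridge_b)  # all coefficients are kept, only reduced in size
```

The l1 penalty can discard variables entirely, which is why LASSO-based penalisations are associated with feature selection, whereas the l2 penalty only dampens coefficients.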
Target group: The course is intended for data analysts in the fields of bioinformatics, computational biology and applied statistics with some statistical knowledge and a good working knowledge of R. It will be particularly useful to those interested in:
- Exploring large data sets.
- Selecting features with methods implementing LASSO-based penalisations.
- Using graphical techniques to better visualise data.
- Understanding and/or applying multivariate projection methodologies to large data sets.
Anticipated learning outcomes: After completion of this workshop, participants will be able to:
- Understand the fundamental principles of multivariate projection-based dimension reduction techniques.
- Perform statistical integration and feature selection using recently developed multivariate methodologies.
- Apply those methods to high-throughput biological studies, including their own studies.