Webinar: Time-course multi-omics integration

I presented this talk for a group of statisticians at the Australian National University in Canberra. The abstract is below.

Topics covered: linear mixed model splines, multi-omics integration (PLS multiblock), correlation circle plot interpretation, timeOmics.

Longitudinal experiments are becoming increasingly popular in omics studies to monitor molecular changes following treatment or during disease progression. Integrating these data sets can give us some mechanistic insights into the different types of omics layers.

However, longitudinal omics data present numerous challenges including a small number of time points that may be unevenly spaced and unmatched between different data types, a small number of individuals, and a high individual variability. While current approaches have focused on differential expression across time or time profile clustering, the modelling of omics time profiles in a multivariate manner is critically lacking to understand longitudinal biological interactions.

I will present a statistical framework, timeOmics, to identify correlated profiles over time and between omics (transcriptomics, metabolomics, microbiome) to give insights into the molecular dynamics of biological systems and discuss future avenues of research in this expanding area.

Some key references

The timeOmics package

timeOmics is not currently part of the mixOmics package; it is a separate R package hosted on Bioconductor. See the Bioconductor page for installation instructions.

[open] Self-paced online course Feb 24 – April 11, 2025

Single and multi-omics analysis and integration with mixOmics

This course is designed for:
  • Beginners looking for an introduction to mixOmics methods for single- and multi-omics analyses.
  • Current mixOmics users who want to deepen their understanding of the mixOmics methods.
  • Users who would like more guidance on analyzing their own data (we also provide exemplar datasets).

The workshop is self-paced and spans 7 weeks. There are 4 live Q&A sessions, and many opportunities to interact with the cohort and your instructor Prof Kim-Anh Lê Cao via Slack. BYO data is encouraged: we provide advice so that you can analyse your own data with mixOmics tools as part of your learning process. A good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) is essential to fully benefit from the course*.

According to our past participants, a time commitment of 5-8h/week was sufficient to feel that they were progressing. Here is some feedback from a previous course.

We provide a certificate of attendance or completion.

Register here, places are limited!

Fees

Research Higher Degree students enrolled at a University: $495 AUD (incl. GST) [discount code: MIXO_RHD]

Staff and members from Universities & Not-for-profit organisations: $825 (incl. GST) [discount code: MIXO_NFP_STAFF]

Other industries: $1320 AUD (incl. GST)

Discounts of 5% apply for a group of 3-9 learners and 10% for 10+ learners; however, this requires a single invoice per group.

These funds go towards the support of a software developer to maintain the package. If you need an invoice, contact Student Support at continuing-education[at]unimelb.edu.au

Teaching Period Dates

Teaching commences: Monday, 24 Feb 2025, 9:00 am AEST

Teaching concludes: Sunday, 23 March 2025, 11:59 pm AEST (after 4 weeks)

(non marked) Assessment due: Friday 4 April 2025 (2 weeks prep)

Peer-review of assessment due: Friday 11 April 2025 (1 week prep)

The course is divided into theory (50%) and hands-on practice, with the opportunity to analyse your own data. The exercises and assignments are in R. Participants are encouraged to use RStudio and Rmarkdown (template and R code provided).

*Need an R refresher?

Learners who are not proficient in R do not get the full benefit of the course (based on their own, honest, feedback!). For those looking for an R refresher well ahead of the course:

Webinar: Φ-Space for continuous phenotyping of single-cell multi-omics data

We have developed a new PLS method for cell type continuous annotation of single cells, now in preprint!

  • Φ-Space addresses numerous challenges faced by state-of-the-art automated annotation methods:
    • to identify continuous and out-of-reference cell states,
    • to deal with batch effects in reference,
    • to utilise bulk references and multi-omic references.
  • Φ-Space uses soft classification to phenotype cells on a continuum. The continuous annotation, or phenotype space embedding, is then used to reduce the dimensionality of the data for various downstream analyses.

Φ-Space: Continuous phenotyping of single-cell multi-omics data. Jiadong Mao, Yidi Deng, Kim-Anh Lê Cao. bioRxiv 2024. 

View this 52min video of Kim-Anh Lê Cao presenting Φ-Space at the WEHI Bioinformatics seminar:

Abstract

Single-cell multi-omics technologies have empowered increasingly refined characterisation of the heterogeneity of cell populations. Automated cell type annotation methods have been developed to transfer cell type labels from well-annotated reference datasets to emerging query datasets. However, these methods suffer from some common caveats, including the failure to characterise transitional and novel cell states, sensitivity to batch effects and under-utilisation of phenotypic information other than cell types (e.g. sample source and disease conditions).

We developed Φ-Space, a computational framework for the continuous phenotyping of single-cell multi-omics data. In Φ-Space we adopt a highly versatile modelling strategy to continuously characterise query cell identity in a low-dimensional phenotype space, defined by reference phenotypes. The phenotype space embedding enables various downstream analyses, including insightful visualisations, clustering and cell type labelling.

We demonstrate through three case studies that Φ-Space (i) characterises developing and out-of-reference cell states; (ii) is robust against batch effects in both reference and query; (iii) adapts to annotation tasks involving multiple omics types; (iv) overcomes technical differences between reference and query.

The Φ-Space package

Φ-Space is not currently part of the mixOmics package; it is a separate R package that can be installed from GitHub.

Webinar: PCA and PLS-DA

These two recordings were part of a presentation to WEHI for their postgraduate lecture series for a diverse audience.

In the PCA presentation (18 min), we explain the concept of linear combination of variables (components) and useful graphical outputs such as correlation circle plots and biplots.

In the PLS-DA presentation (7 min), we talk about the concept of multivariate signature.

If you want to know more about the actual algorithm under the hood, you can watch this webinar on PLS.

[closed] Self-paced online course Oct 21 – Dec 6 2024

Unfortunately we had to cancel the workshop as we did not receive a sufficient number of participants to justify running the workshop at this time. These workshops involve peer review and a cohort feel to provide the best experience to our learners.

Register your EOI here and we will let you know when the registration page is up. Our next intake is scheduled for February 2025.

Feedback from a previous iteration can be found here.

Key summary

  • The new course is open and will run for 7 weeks. This course is online, but at your own pace, meaning that you need to dedicate enough time (5-8h per week) to fully benefit from the program.
  • There are 4 weeks of asynchronous learning (you work at your own pace to cover the material each week).
  • There are 4 live webinars organised on the first 4 Thursdays at 5pm AEST (convert your time here) to summarise some key concepts and ask your questions (the webinars will be recorded, as daylight saving changes during this period).
  • You will have the opportunity to chat on Slack and ask your questions during the whole course.
  • You can analyse your own data for the assessment (due in week 6) or use the data provided. You will reinforce your learning by marking the assignments of 2-3 other learners.
  • Teaching Period Dates, asynchronised:
    • Teaching commences: Monday, 21 Oct 2024, 9:00 am AEST
    • Teaching concludes: Sunday, 17 Nov 2024, 11:59 pm AEST (after 4 weeks)
    • (non marked) Assessment due: Friday 29 Nov 2024 (2 weeks prep)
    • Peer-review of assessment due: Friday 6 Dec 2024 (1 week prep)
  • Fees vary for
    • Research Higher Degree students enrolled at a University: $495 AUD (incl. GST) [discount code: MIXO_RHD]
    • Staff and members from Universities & Not-for-profit organisations: $825 (incl. GST) [discount code: MIXO_NFP_STAFF]
    • Other industries: $1320 AUD (incl. GST)
    • Discounts of 5% apply for a group of 3-9 learners and 10% for 10+ learners; however, this requires a single invoice per group.

(these funds go towards the support of a software developer to maintain the package)

Information about the course and registration: https://study.unimelb.edu.au/find/short-courses/mixomics-r-essentials-for-biological-data-integration/

The number of places is limited, so first come, first served (this course runs once or twice a year).

What if I need an invoice? Contact Student Support at continuing-education[at]unimelb.edu.au

Prerequisites. A good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) is essential to fully benefit from the course*. The course is divided into theory (50%) and hands-on practice, with the opportunity to analyse your own data. The exercises and assignments are in R. Participants are encouraged to use RStudio and Rmarkdown (template and R code provided).

*Learners who are not proficient in R do not get the full benefit of the course (based on their own, honest, feedback!)

Webinar: Microbial network inference for longitudinal microbiome studies with LUPINE

Our latest method based on PLS to infer microbial networks across time is now in preprint!

  • LUPINE is a PLS-based method that combines dimension reduction and partial correlations to infer associations between taxa.
  • LUPINE takes into account information across time points.
  • LUPINE has been designed for relatively small sample sizes and a small number of time points.

Microbial network inference for longitudinal microbiome studies with LUPINE. Saritha Kodikara, Kim-Anh Lê Cao. bioRxiv 2024.05.08.593086; 

View this 50min video of Dr Saritha Kodikara presenting her method LUPINE:

We also have a second video presented by Prof Kim-Anh Lê Cao, who sets LUPINE in the context of microbiome longitudinal data analysis, elaborating more on the types of analytical objects covered in Kodikara et al. (2022), Statistical challenges in longitudinal microbiome data analysis, Briefings in Bioinformatics.

Below you will also find the most common questions related to LUPINE.

FAQ:

Q: Do you build up the network from the covariance matrix or from the inverse covariance matrix? And what are you doing linear regression on?

A: The network is built on partial correlations, so it is similar to using the inverse covariance matrix. But instead of estimating the inverse covariance matrix, we calculate partial correlations through linear regression. To estimate the partial correlation between taxon a and taxon b, we regress their counts on the low-dimensional representation of the other taxa (excluding taxa a and b). This is then repeated for all pairs (we have an efficient way to do this computationally).
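
The idea in this answer can be sketched in a few lines. The example below is only an illustration in plain Python, not the LUPINE implementation (LUPINE is an R package): it assumes continuous values and ordinary least squares rather than a count regression with an offset, and it uses the first principal component (computed by power iteration) as the low-dimensional representation. All function names and the simulated data are hypothetical.

```python
import math
import random


def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def first_component_scores(columns, n_iter=100):
    """Scores on the first principal component of the centred columns,
    obtained by power iteration on X^T X (illustrative, not optimised)."""
    cols = [center(c) for c in columns]
    n, p = len(cols[0]), len(cols)
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        scores = [sum(w[j] * cols[j][i] for j in range(p)) for i in range(n)]
        w = [sum(cols[j][i] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(w[j] * cols[j][i] for j in range(p)) for i in range(n)]


def residuals(y, t):
    """Residuals of a least-squares regression of y on t (with intercept)."""
    yc, tc = center(y), center(t)
    slope = sum(a * b for a, b in zip(yc, tc)) / sum(b * b for b in tc)
    return [a - slope * b for a, b in zip(yc, tc)]


def pearson(u, v):
    """Pearson correlation between two lists."""
    uc, vc = center(u), center(v)
    num = sum(a * b for a, b in zip(uc, vc))
    return num / math.sqrt(sum(a * a for a in uc) * sum(b * b for b in vc))


def partial_correlation(data, a, b):
    """Correlate what is left of taxa a and b after regressing each on a
    one-dimensional summary of all the other taxa."""
    others = [col for k, col in enumerate(data) if k not in (a, b)]
    t = first_component_scores(others)
    return pearson(residuals(data[a], t), residuals(data[b], t))


# Simulated example: every taxon follows one shared latent signal, so taxa 0
# and 1 are correlated only through the other taxa.
random.seed(1)
latent = [random.gauss(0, 1) for _ in range(200)]


def noisy(sd):
    return [z + random.gauss(0, sd) for z in latent]


data = [noisy(0.5), noisy(0.5)] + [noisy(0.1) for _ in range(5)]
raw_r = pearson(data[0], data[1])
partial_r = partial_correlation(data, 0, 1)
print("raw correlation:", round(raw_r, 3))
print("partial correlation:", round(partial_r, 3))
```

In this simulation the raw correlation is high while the partial correlation is close to zero, because the association between the two taxa is fully explained by the component summarising the other taxa.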

Q: You reduce the dimension of the data into one dimension. How much variance can be explained by the 1st component in your computation?

A: It depends on the data, but in the data we analysed, and if we consider the single time point scheme only with PCA, the first component explained about 25% of the total variance. We could add more components into the regression but that may overfit the regression model. This is why we only select the first component, which explains much of the variance (for PCA, single time point) or covariance (for PLS, multiple time points).

Q: Do you think that this approach would work on single-cell data, to look at gene co-expression in longitudinal data across time points?

A: It will not work with the present single cell technologies, because in LUPINE we need the same individuals/samples/cells across time to infer the association.

Q: When you do the linear regression, do you regress directly on the counts with all the zeros and the sparsity that you mentioned?

A: Yes, the method was originally developed for count data. We regress on the count data, but we also include library size as an offset to account for different library sizes. The method also works with centred log-ratio values, which I used to analyse the third case study.
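
For readers unfamiliar with the centred log-ratio (CLR) transformation mentioned in this answer, here is a minimal sketch in plain Python. The pseudocount used to handle zeros is an assumption made for illustration; the answer does not say how zeros are treated before the transformation, and the function name is hypothetical.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centred log-ratio of one sample: log of each (shifted) count minus
    the mean of the logs, i.e. the log of count / geometric mean.

    The pseudocount is an illustrative way to avoid log(0); set it to 0
    only when all counts are strictly positive.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


sample = [0, 3, 12, 150, 0, 35]  # made-up counts for six taxa
print([round(v, 3) for v in clr(sample)])
```

CLR values of a sample sum to zero, and without a pseudocount they do not change when all counts are multiplied by a constant, which is why the transformation removes the effect of library size.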

Q: Do you apply your method for the two groups combined or separately?

A: I model each group separately as we assume that each group has a unique network.

Q: You’re building the networks based on the partial correlations. What about the actual network representation, do you binarize it?

A: Yes, I binarize the network based on a correlation test.
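
As an illustration of binarizing a partial correlation matrix with a test, the sketch below uses Fisher's z-transform. This is one common choice and an assumption on my part: the answer does not state which correlation test LUPINE uses, nor whether a multiple-testing correction is applied. Function names, the significance level and the example matrix are hypothetical.

```python
import math


def fisher_z_pvalue(r, n_samples, n_conditioned=1):
    """Two-sided p-value for a partial correlation r (with |r| < 1), using
    Fisher's z-transform; n_conditioned is the number of variables
    conditioned on (1 here, assuming a single component)."""
    z = math.atanh(r) * math.sqrt(n_samples - n_conditioned - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def binarize(partial_corr, n_samples, alpha=0.05):
    """Adjacency matrix with 1 where the partial correlation is significant."""
    p = len(partial_corr)
    adjacency = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            if fisher_z_pvalue(partial_corr[i][j], n_samples) < alpha:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


# Made-up partial correlations between three taxa, from 50 samples.
partial_corr = [
    [1.0, 0.6, 0.05],
    [0.6, 1.0, -0.5],
    [0.05, -0.5, 1.0],
]
network = binarize(partial_corr, n_samples=50)
for row in network:
    print(row)
```

The diagonal is left at 0, so the result can be read directly as the edge list of an undirected network without self-loops.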

The LUPINE package

LUPINE is not currently part of the mixOmics package; it is a separate R package that can be installed from GitHub.

[closed] Self-paced online course Feb 5 – March 22 2024

This workshop is now closed. Fill in this short survey to register your interest. A new iteration of the course might be run between Sept – Nov if there is sufficient interest!

Key summary

  • The new course is open and will run for 7 weeks. This course is online, but at your own pace, meaning that you need to dedicate enough time (5-8h per week) to fully benefit from the program.
  • There are 4 weeks of asynchronous learning (you work at your own pace to cover the material).
  • There are 4 live webinars organised on the first 4 Thursdays at 5pm AEST (convert your time here)  to summarise some key concepts and ask your questions (the webinars will be recorded).
  • You will have the opportunity to chat on Slack and ask your questions during the whole course.
  • You can analyse your own data for the assessment (due in week 6) or use the data provided. You will reinforce your learning by marking the assignments of 2-3 other learners.

Feedback from the 2022 iteration can be found here.

  • Teaching Period Dates, asynchronised:
    • Teaching commences: Monday, 5 Feb 2024, 9:00 am AEST
    • Teaching concludes: Sunday, 29 Feb 2024, 11:59 pm AEST (4 weeks)
    • (non marked) Assessment due: Friday 15 March 2024 (2 weeks)
    • Peer-review of assessment due: Friday 22 March 2024 (1 week)
  • Fees vary for
    • Research Higher Degree students enrolled at a University: $495 AUD (incl. GST) [discount code: MIXO_RHD]
    • Staff and members from Universities & Not-for-profit organisations: $825 (incl. GST) [discount code: MIXO_NFP_STAFF]
    • Other industries: $1320 AUD (incl. GST)
    • Discounts of 5% apply for a group of 3-9 learners and 10% for 10+ learners; however, this requires a single invoice per group.

(these funds go towards the support of a software developer to maintain the package)

The number of places is limited, so first come, first served (this course runs once or twice a year).

What if I need an invoice? Contact Student Support at continuing-education[at]unimelb.edu.au

Prerequisites. A good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) is essential to fully benefit from the course*. The course is divided into theory (50%) and hands-on practice, with the opportunity to analyse your own data. The exercises and assignments are in R. Participants are encouraged to use RStudio and Rmarkdown (template and R code provided).

*Learners who are not proficient in R do not get the full benefit of the course (based on their own, honest, feedback!)

[closed] Self-paced online course May 22 – July 7 2023

If you’ve missed out, our next iteration will run from 19th Feb – 5th April 2024. You can fill in this short survey to be notified when we open our next course.

Summary

  • The new course is open and will run for 7 weeks. This course is online, but at your own pace, meaning that you need to dedicate enough time (5-8h per week) to fully benefit from the program.
  • There are 4 weeks of asynchronous learning (you work at your own pace to cover the material).
  • There are 4 live webinars organised on Thursdays at 5pm AEST (convert your time here) in the first 4 weeks to summarise some key concepts and ask your questions (the webinars will be recorded).
  • You will have the opportunity to chat on Slack and ask your questions during the whole course.
  • You can analyse your own data for the assessment (due in week 6) or use the data provided. You will reinforce your learning by marking the assignments of 2-3 other learners.

Feedback from the 2022 iteration can be found here.

  • Teaching Period Dates, asynchronised:
    • Learning Start: Monday, 22 May 2023, 9:00 am AEST
    • Learning Ends: Sunday, 18 June 2023, 11:59 pm AEST (4 weeks)
    • (non marked) Assessment due: Friday 30th June 2023 (2 weeks)
    • Peer-review of assessment due: Friday 7th July 2023 (1 week)
  • Fees vary for
    • Research Higher Degree students enrolled at a University: $495 AUD (incl. GST)
    • Staff and members from Universities & Not-for-profit organisations: $825 (incl. GST)
    • Other industries: $1320 AUD (incl. GST)
    • Discounts of 5% apply for a group of 3-9 learners and 10% for 10+ learners; however, this requires a single invoice per group.

(these funds go towards the support of a software developer to maintain the package)

The number of places is limited, so first come, first served (we aim to run this course twice a year).

What if I need an invoice? Contact Student Support at continuing-education[at]unimelb.edu.au

Prerequisites. A good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) is essential to fully benefit from the course*. The course is divided into theory (50%) and hands-on practice, with the opportunity to analyse your own data. The exercises and assignments are in R. Participants are encouraged to use RStudio and Rmarkdown (template and R code provided).

*Learners who are not proficient in R do not get the full benefit of the course (based on their own, honest, feedback!)

[closed] 13-14 March 2023, Brisbane, Aus

We will be running a 2-day workshop at the Frazer Institute, University of Queensland. The workshop will cover 1.5 days of lectures and hands-on practice, and an additional 0.5 day for discussions and opportunities to analyse your own data (assuming the data are already processed and normalised).

Fill in the survey to register your interest and needs for this workshop. We can only allow a limited number of participants, so lock in those dates in your calendar before we confirm your participation! Priority will be given to postgraduate students and early career researchers. Results will be announced to the participants with details for registration on 17th February.

Context. Advances in high-throughput technologies have transformed the way we examine molecular information, including microbial communities. However, analytical tool development is critically trailing behind data generation, which hinders the analysis, understanding or integration of omics data. Data integration adopts a holistic, data-driven and hypothesis-free approach. This new approach is necessary to understand the role of biological systems and posit new hypotheses.

The workshop will introduce concepts of multivariate dimension reduction methods developed in mixOmics for statistical analysis. Our methods make no distributional assumptions and are highly flexible for unsupervised (exploratory), supervised (classification) and integration analyses. Various analytical frameworks will be presented, ranging from data exploration and selection of markers to integration with other omics datasets and an introduction to time-course analysis. There will also be an opportunity to analyse your own data.

Each method will be illustrated on real biological studies. The last afternoon is ‘BYO data’ where you can reinforce your learnings on your own study! 

Instructor: A/Prof Kim-Anh Lê Cao; Tutor: Nick Matigian (QCIF)

Organized and hosted by: Frazer Institute, University of Queensland

There are no registration fees for this workshop. We do expect your attendance as the number of places is limited. The workshop is fully catered. Slides, R code and data will be provided.

Registration. Fill in the survey and lock the dates in your calendar! As we have a limited number of participants (30), priority will be given to postgraduate students and early career researchers. Results will be announced to the participants with details for registration after the survey’s deadline. Online attendance is also available for a limited number of participants (but with reduced opportunities for interactions).

Location: TBA, Translational Research Institute

Contact: kimanh.lecao[at]unimelb.edu.au (for prerequisites or content)

Prerequisite and requirements. We require from the trainees a good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) to fully benefit from the workshop. Participants are requested to bring their own laptop, with the software RStudio (http://www.rstudio.com/) and the R package mixOmics installed (instructions will be provided prior to the training).

Outline

The following broad topics will be covered during these two days:

A. Key methodologies in mixOmics and their variants:

  • Exploration of one data set with Principal Component Analysis (the basics!)
  • Identification of a molecular signature to discriminate different treatment groups with PLS-Discriminant Analysis
  • Integration of two data sets and identification of markers with PLS
  • Integration of more than two data sets to identify multi omics signatures (if sufficient interest) with PLS-DIABLO

B. Graphical outputs implemented in mixOmics

  • Sample plot representation
  • Variable plot representation for data integration
  • Other useful graphical outputs

C. Case studies and applications

Several omics studies (and microbiome if there is some interest) will be analysed using the methods presented above.

Day 2: bring your own data. Participants will be given the opportunity to analyse their own data under the guidance and the advice of the instructors. Participants can also work in a team. Your data need to be processed and normalised beforehand.

The following statistical concepts will be introduced: covariance and correlation, multiple linear regression, classification and prediction, cross-validation, selection of markers, penalised regressions. Each methodology will be illustrated on a case study (theory and application will alternate).

Target group. The course is intended for molecular biologists working in the fields of bioinformatics, computational biology and applied statistics with some statistical knowledge and a good working knowledge of R. It will be particularly useful to those interested in:

  1. Exploring data sets.
  2. Selecting molecular signatures with methods implementing LASSO-based penalisations.
  3. Using graphical techniques to better visualise data.
  4. Understanding and/or applying multivariate projection methodologies to large data sets.

Anticipated learning outcomes. After completion of this workshop, participants will be able to:

  1. Understand fundamental principles of multivariate projection-based dimension reduction techniques.
  2. Perform statistical integration and feature selection using recently developed multivariate methodologies.
  3. Apply those methods to high throughput microbiome studies, including their own studies.

[Closed] Self-paced online course Oct 31st – Nov 27 2022

The next iteration of the course will be in September 2023 for a likely duration of 6-8 weeks (it will be advertised 3 months before opening the course). This course is online, but at your own pace, meaning that you need to dedicate enough time (5-8h per week) to fully benefit from the program.

Feedback from the 2022 iteration:

  • You can do it at your own time since the resources provided (Webinars and reading material) are very helpful. Due to working hours I had to watch/read on demand (at my own time)
  • Kim-Anh has done a very good job in the webinars and was generally approachable and helpful. Thank you! The online course material was very good and explained the basics of the program quite well. The integration with the mixOmics online material and sample cases is very helpful.
  • It had the option to attend live webinars (two offered times) or watch recordings. – The possibility to ask questions was available for both live webinars and stack. – The assignments are designed to enhance further learning allowing to use of either own data or provided data at different challenge skills.
  • Course organisers were very responsive to our questions in Slack. Modules flowed nicely and were well organised. Webinars were useful.

This is the second round of our online course ‘mixOmics R Essentials for Biological Data Integration’, which includes 4 weeks of asynchronous learning (with one live summary + Q&A per week), numerous chats on Slack and an additional 3 weeks to complete the assignment. Some feedback from our last round can be found here. Our last survey seems to suggest most learners spent between 5-8h per week on the program.

  • Teaching Period Dates, asynchronised:
    • Start – Monday, 31st October 2022
    • End – Sunday, 27th November 2022
    • (non marked) Assessment due Sunday, 9th December 2022
    • Peer-review of assessment due Sunday, 16th December 2022
  • Fees vary for
    • Research Higher Degree students enrolled at a University: $495 AUD (incl. GST)
    • Staff and members from Universities & Not-for-profit organisations: $825 (incl. GST)
    • Other industries: $1320 AUD (incl. GST)
    • Discounts of 5% apply for a group of 3-9 learners and 10% for 10+ learners; however, this requires a single invoice per group.

(these funds go towards the support of a software developer to maintain the package)

Information about the course and registration: https://study.unimelb.edu.au/find/short-courses/mixomics-r-essentials-for-biological-data-integration/

The number of places is limited, so first come, first served (we aim to run this course twice a year).

What if I need an invoice? Contact Student Support at continuing-education[at]unimelb.edu.au

Prerequisites. A good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) is essential to fully benefit from the course*. The course is divided into theory (50%) and hands-on practice, with the opportunity to analyse your own data. The exercises and assignments are in R. Participants are encouraged to use RStudio and Rmarkdown (template and R code provided).

*Learners who are not proficient in R do not get the full benefit of the course (based on their own, honest, feedback!)