Webinar: Φ-Space ST: a platform-agnostic method to identify cell states in spatial transcriptomics studies

We have a sequel to Φ-Space: Φ-Space ST, developed by Dr Jiadong Mao for spatial transcriptomics studies! We are very excited about these new developments and the potential of Φ-Space for single cell annotation!

Φ-Space ST is:

  • A novel and fast approach for cell type composition analysis.
  • Platform-Agnostic and Scalable as it works across multiple spatial transcriptomics (ST) platforms, including CosMx, Visium, and Stereo-seq.
  • Accurate and integrative as it identifies cell states by leveraging multiple scRNA-seq references.
  • Segmentation-Free & Niche-Driven as it annotates cell states at subcellular resolution, uncovering niche-specific cell types and tumor-distinguishing patterns.

Φ-Space ST: a platform-agnostic method to identify cell states in spatial transcriptomics studies. Jiadong Mao, Jarny Choi, Kim-Anh Lê Cao. bioRxiv 2025.

Watch Jiadong’s latest seminar, presented at Melbourne Integrative Genomics on Friday 14th February 2025:

Abstract

We introduce Φ-Space ST, a platform-agnostic method to identify continuous cell states in spatial transcriptomics (ST) data using multiple scRNA-seq references. For ST with supercellular resolution, Φ-Space ST achieves interpretable cell type deconvolution with significantly faster computation. For subcellular resolution, Φ-Space ST annotates cell states without cell segmentation, leading to highly insightful spatial niche identification. Φ-Space ST harmonises annotations derived from multiple scRNA-seq references, and provides interpretable characterisations of disease cell states by leveraging healthy references. We validate Φ-Space ST in three case studies involving CosMx, Visium and Stereo-seq platforms for various cancer tissues. Our method revealed niche-specific enriched cell types and distinct cell type co-presence patterns that distinguish tumour from non-tumour tissue regions. These findings highlight the potential of Φ-Space ST as a robust and scalable tool for ST data analysis for understanding complex tissues and pathologies.

mixOmics website update

We’re pleased to share that the mixOmics website has undergone a redesign to enhance your browsing experience and make it easier to access our resources.

What’s New?

  • Refreshed Design: A cleaner, more modern layout
  • 📚 Expanded Getting Started Pages: Guides to get you up and running with mixOmics
  • 🧭 Reorganized Navigation: A more intuitive menu to quickly find key resources
  • 🔗 Updated Social Links: Stay connected with the mixOmics community
  • 💬 Direct Links to the User Forum: If you haven’t already, join our mixOmics user forum to connect with over 500 other users and experts
  • 🧑‍💻 Updated About Pages: Learn more about the project and our team
  • 📅 Streamlined Workshops, Webinars, and News Sections: Easier access to events and updates
  • 🖥️ Embedded R Markdown Pages: Improved code presentation with syntax highlighting in our Methods, Plots, and Case Studies pages

We are continuing to make small improvements, so if you encounter any issues or have feedback, please feel free to contact us.

Thank you for your continued support of mixOmics.

The mixOmics Team

Missing values

All methodologies implemented in mixOmics can handle missing values. In particular, (s)PLS, (s)PLS-DA and (s)PCA use the NIPALS (Non-linear Iterative Partial Least Squares) algorithm as part of their dimension reduction procedures. This algorithm is built to handle NAs [1].

This is implemented through the nipals() function within mixOmics. This function is called internally by the above methods but can also be used manually, as shown below.

Usage in mixOmics

library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene[, 1:100] # a reduced size data set

## simulate missing data: draw 20 random (row, column) positions
## (sampling with replacement can occasionally repeat a position, giving fewer than 20 NAs)
na.row <- sample(1:nrow(X), 20, replace = TRUE)
na.col <- sample(1:ncol(X), 20, replace = TRUE)
X.na <- as.matrix(X)

## set the selected positions to NA
X.na[cbind(na.row, na.col)] <- NA
sum(is.na(X.na)) # number of cells with NA
## [1] 20
# this might take some time depending on the size of the data set
nipals.tune <- nipals(X.na, ncomp = 10)$eig
barplot(nipals.tune, xlab = 'Principal component', ylab = 'Explained variance')

FIGURE 1: Bar plot of the explained variance of each principal component.
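To give a feel for how NIPALS copes with NAs, here is a minimal base-R sketch of a single NIPALS component in which every sum simply skips the missing cells. It is an illustration only, not the code behind nipals(): the function name nipals_one_component, its arguments and the toy matrix are all hypothetical, and it assumes that no row or column is entirely missing.

```r
# Illustrative sketch only: one NIPALS component computed on the observed cells.
# nipals_one_component() is a hypothetical helper, not part of mixOmics.
nipals_one_component <- function(X, max_iter = 500, tol = 1e-12) {
  X <- as.matrix(X)
  W <- 1 - is.na(X)          # 1 where a value is observed, 0 where it is NA
  X0 <- X
  X0[is.na(X)] <- 0          # zero-fill so missing cells contribute nothing to sums

  # start from the column with the largest observed sum of squares
  scores <- X0[, which.max(colSums(X0^2))]

  for (iter in seq_len(max_iter)) {
    # loadings: regress each column on the scores, using observed rows only
    loadings <- as.vector(crossprod(X0, scores)) / as.vector(crossprod(W, scores^2))
    loadings <- loadings / sqrt(sum(loadings^2))

    # scores: regress each row on the loadings, using observed columns only
    new_scores <- as.vector(X0 %*% loadings) / as.vector(W %*% loadings^2)

    converged <- sum((new_scores - scores)^2) < tol * sum(new_scores^2)
    scores <- new_scores
    if (converged) break
  }
  list(scores = scores, loadings = loadings)
}

# Toy rank-one matrix with two cells removed
X_true <- outer(1:6, c(1, -2, 3, 0.5))
X_miss <- X_true
X_miss[2, 3] <- NA
X_miss[5, 1] <- NA

fit <- nipals_one_component(X_miss)
X_hat <- outer(fit$scores, fit$loadings)   # rank-one reconstruction
max(abs(X_hat - X_true))                   # reconstruction error, NA cells included
```

On an exactly rank-one toy matrix like this one, the reconstruction also fills in the removed cells. On real data several components would be needed and the filled-in values are only estimates.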

If missing values need to be imputed, the package provides impute.nipals() for this scenario. NIPALS is used to decompose the dataset. The resulting components, singular values and feature loadings are then used to reconstruct the original dataset, with estimated values in place of the missing ones. To allow for the best estimation of missing values, a large number of components is used (ncomp = 10).

X.impute <- impute.nipals(X = X.na, ncomp = 10)
sum(is.na(X.impute)) # number of cells with NA
## [1] 0

The difference between the imputed and real values can be checked.
Here are the original values:

id.na <- is.na(X.na) # logical matrix marking the positions of the NAs

X[id.na] # show original values
##  [1]  0.09041 -0.04070  0.03497 -0.01712  0.01309  0.00233 -0.04142  0.11104
##  [9] -0.01519 -0.17034 -0.01641  0.15964  0.00557 -0.06217  0.04131  0.02157
## [17]  0.01226 -0.00753  0.03038 -0.00783

The values estimated via the NIPALS algorithm:

X.impute[id.na] # show imputed values
##  [1]  0.0837747419 -0.0190061068  0.0004024897 -0.0180879247 -0.0094185656
##  [6] -0.0312362158 -0.0706920015  0.1400817774  0.0083359545 -0.1158255139
## [11]  0.0164817649  0.1007897385  0.0236184385  0.0191934144  0.0214240977
## [16]  0.0686280312 -0.0039198425  0.0085870558  0.0450234407  0.0013964758
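One simple way to quantify the difference is sketched below. The 20 values printed above are copied into two vectors, whose names (original, imputed) are introduced here purely for illustration. With the objects from this page, X[id.na] and X.impute[id.na] could be used in their place.

```r
# Values copied from the printed output above
original <- c(0.09041, -0.04070, 0.03497, -0.01712, 0.01309, 0.00233, -0.04142,
              0.11104, -0.01519, -0.17034, -0.01641, 0.15964, 0.00557, -0.06217,
              0.04131, 0.02157, 0.01226, -0.00753, 0.03038, -0.00783)
imputed <- c(0.0837747419, -0.0190061068, 0.0004024897, -0.0180879247,
             -0.0094185656, -0.0312362158, -0.0706920015, 0.1400817774,
             0.0083359545, -0.1158255139, 0.0164817649, 0.1007897385,
             0.0236184385, 0.0191934144, 0.0214240977, 0.0686280312,
             -0.0039198425, 0.0085870558, 0.0450234407, 0.0013964758)

abs_error <- abs(imputed - original)

mean(abs_error)                      # mean absolute error
sqrt(mean((imputed - original)^2))   # root mean squared error
cor(original, imputed)               # agreement between real and imputed values
```

Because the NA positions were drawn at random, these summaries will differ from run to run.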



mixOmics 6.30.0 on Bioconductor

At the end of October 2024, Bioconductor was updated to version 3.20, which includes the latest version of mixOmics, 6.30.0. You can install the latest version of mixOmics on Bioconductor here. This release runs on R version 4.4 and includes some minor bug fixes, updated code and updated unit tests. See our GitHub page for more details on these updates.
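For reference, installation from Bioconductor typically looks like the sketch below. It uses the standard BiocManager route and assumes an R 4.4 session with internet access.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the release version of mixOmics from Bioconductor
BiocManager::install("mixOmics")

# Confirm which version was installed
packageVersion("mixOmics")
```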

Webinar: PLS methods

This webinar was presented as a seminar to a group of quantitative researchers (mostly statisticians) at the University of Melbourne. The abstract is below.

Topics covered: context of data integration, PCA solved with NIPALS algorithm and SVD, sparse PCA, correlation circle plot interpretation, PLS algorithms and deflation modes, sparse PLS.

Technological improvements have allowed for the collection of data from different types of molecules (e.g. genes, proteins, metabolites, microorganisms), resulting in multiple ‘omics data (e.g. transcriptomics, proteomics, metabolomics, microbiome) measured from the same set of N biospecimens or individuals. In this talk I will introduce the statistical integration of these multi-omics data to shed more light on a biological system.

Integrating data involves numerous challenges: data are complex and large, each with few samples (N < 50) and many molecules (P > 10,000), and generated using different technologies. I will present PLS (Partial Least Squares / Projection to Latent Structures, developed by Wold in the 1980s) as an algorithm of choice for data integration of small N large P problems. PLS and its variants form the basis of our comprehensive mixOmics R package for feature selection, dimension reduction and integration of omics data sets. This talk is targeted at a general audience with background knowledge in statistics and interest in large data.

The webinar was re-recorded for the PLS section.
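As a rough illustration of the PLS idea covered in the talk, the first pair of PLS components can be obtained from the SVD of the cross-product matrix between two centred and scaled data sets. The sketch below uses base R only and the built-in mtcars data, split arbitrarily into two blocks for illustration. It is not the mixOmics implementation, and the variable names are hypothetical.

```r
# Two blocks measured on the same samples (arbitrary split of a built-in data set)
X <- scale(as.matrix(mtcars[, c("disp", "hp", "wt", "drat")]))
Y <- scale(as.matrix(mtcars[, c("mpg", "qsec")]))

# SVD of the cross-product matrix X'Y
dec <- svd(crossprod(X, Y))
u <- dec$u[, 1]   # loading vector for X
v <- dec$v[, 1]   # loading vector for Y

# Components are linear combinations of the variables in each block
t_x <- as.vector(X %*% u)
t_y <- as.vector(Y %*% v)

cov(t_x, t_y)              # covariance between the first pair of components
dec$d[1] / (nrow(X) - 1)   # the same value, from the largest singular value
```

The two printed values agree because the first singular vectors maximise the covariance between the two components. Later components require a deflation step, which is not shown here.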

Webinar: Time-course multi-omics integration

I presented this talk for a group of statisticians at the Australian National University in Canberra. The abstract is below.

Topics covered: linear mixed model splines, multi-omics integration (PLS multiblock), correlation circle plot interpretation, timeOmics.

Longitudinal experiments are becoming increasingly popular in omics studies to monitor molecular changes following treatment or during disease progression. Integrating these data sets can give us some mechanistic insights into the different types of omics layers.

However, longitudinal omics data present numerous challenges including a small number of time points that may be unevenly spaced and unmatched between different data types, a small number of individuals, and a high individual variability. While current approaches have focused on differential expression across time or time profile clustering, the modelling of omics time profiles in a multivariate manner is critically lacking to understand longitudinal biological interactions.

I will present a statistical framework, timeOmics, to identify correlated profiles over time and between omics (transcriptomics, metabolomics, microbiome) to give insights into the molecular dynamics of biological systems and discuss future avenues of research in this expanding area.

The timeOmics package

timeOmics is currently not directly available from the mixOmics package; instead, it is a separate R package hosted on Bioconductor. See the Bioconductor page for installation instructions.

[open] Self-paced online course Feb 24 – April 11, 2025

Single and multi-omics analysis and integration with mixOmics

This course is designed for:
  • Beginners looking for an introduction to mixOmics methods for single- and multi-omics analyses.
  • Current mixOmics users who want to deepen their understanding of the mixOmics methods.
  • Users who would like more guidance on analyzing their own data (we also provide exemplar datasets).

The workshop is self-paced and spans 7 weeks. There are 4 live Q&A sessions, and many opportunities to interact with the cohort and your instructor Prof Kim-Anh Lê Cao via Slack. BYO data is encouraged: we provide advice so that you can analyse your own data with mixOmics tools as part of your learning process. A good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) is essential to fully benefit from the course*.

According to our past participants, a time commitment of 5-8h/week was sufficient to feel that they were progressing. Here is some feedback from a previous course.

We provide a certificate of attendance or completion.

Register here, places are limited!

Fees

Research Higher Degree students enrolled at a University: $495 AUD (incl. GST) [discount code: MIXO_RHD]

Staff and members from Universities & Not-for-profit organisations: $825 AUD (incl. GST) [discount code: MIXO_NFP_STAFF]

Other industries: $1320 AUD (incl. GST)

Discounts of 5% are available for groups of 3-9 learners and 10% for groups of 10+ learners; however, this requires a single invoice per group.

These funds go towards the support of a software developer to maintain the package. If you need an invoice, contact Student Support at continuing-education[at]unimelb.edu.au

Teaching Period Dates
  • Teaching commences: Monday, 24 Feb 2025, 9:00 am AEST
    • Q&A live webinars are scheduled on Thursdays 6pm AEST / 8am CET during the first 4 weeks (27th Feb, 6th, 13th and 20th March).
    • An additional session might be added on Fridays 9am AEST (= Thursdays 2pm PST / 5pm EST / 9pm CET)

  • Teaching concludes: Sunday, 23 March 2025, 11:59 pm AEST (after 4 weeks)
  • (Non-marked) Assessment due: Friday 4 April 2025 (2 weeks prep)
  • Peer-review of assessment due: Friday 11 April 2025 (1 week prep)

The course is divided into theory (50%) and hands-on practice, with the opportunity to analyse your own data. The exercises and assignments are in R. Participants are encouraged to use RStudio and R Markdown (template and R code provided).

*Need an R refresher?

Learners who are not proficient in R do not get the full benefit of the course (based on their own, honest, feedback!). For those looking for an R refresher well ahead of the course:

Webinar: Φ-Space for continuous phenotyping of single-cell multi-omics data

We have developed a new PLS method for continuous cell type annotation of single cells, now in preprint!

  • Φ-Space addresses numerous challenges faced by state-of-the-art automated annotation methods:
    • to identify continuous and out-of-reference cell states,
    • to deal with batch effects in reference,
    • to utilise bulk references and multi-omic references.
  • Φ-Space uses soft classification to phenotype cells on a continuum. The continuous annotation, or phenotype space embedding is then used to reduce the dimensionality of the data for various downstream analyses.

Φ-Space: Continuous phenotyping of single-cell multi-omics data. Jiadong Mao, Yidi Deng, Kim-Anh Lê Cao. bioRxiv 2024. 

View this 52min video of Kim-Anh Lê Cao presenting Φ-Space at the WEHI Bioinformatics seminar:

Abstract

Single-cell multi-omics technologies have empowered increasingly refined characterisation of the heterogeneity of cell populations. Automated cell type annotation methods have been developed to transfer cell type labels from well-annotated reference datasets to emerging query datasets. However, these methods suffer from some common caveats, including the failure to characterise transitional and novel cell states, sensitivity to batch effects and under-utilisation of phenotypic information other than cell types (e.g. sample source and disease conditions).

We developed Φ-Space, a computational framework for the continuous phenotyping of single-cell multi-omics data. In Φ-Space we adopt a highly versatile modelling strategy to continuously characterise query cell identity in a low-dimensional phenotype space, defined by reference phenotypes. The phenotype space embedding enables various downstream analyses, including insightful visualisations, clustering and cell type labelling.

We demonstrate through three case studies that Φ-Space (i) characterises developing and out-of-reference cell states; (ii) is robust against batch effects in both reference and query; (iii) adapts to annotation tasks involving multiple omics types; (iv) overcomes technical differences between reference and query.

The Φ-Space package

Φ-Space is currently not directly available from the mixOmics package; instead, it is a separate R package that can be installed from GitHub.

Webinar: PCA and PLS-DA

These two recordings were part of a presentation for WEHI's postgraduate lecture series, aimed at a diverse audience.

In the PCA presentation (18 min), we explain the concept of linear combinations of variables (components) and useful graphical outputs such as correlation circle plots and biplots.

In the PLS-DA presentation (7 min), we talk about the concept of a multivariate signature.

If you want to know more about the actual algorithm under the hood, you can watch this webinar on PLS.
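The idea of a component as a linear combination of variables, mentioned for the PCA presentation above, can be checked with a few lines of base R. This is a minimal sketch using prcomp() and the built-in USArrests data rather than mixOmics, and the object names are chosen for illustration only.

```r
# PCA on a built-in data set, centring and scaling the variables
X <- as.matrix(USArrests)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Loadings (the weight given to each variable) for the first component
w1 <- pca$rotation[, 1]

# The first component is the scaled data multiplied by these weights
X_scaled <- scale(X)
pc1_by_hand <- as.vector(X_scaled %*% w1)

# Compare with the scores returned by prcomp()
all.equal(pc1_by_hand, unname(pca$x[, 1]))

# Proportion of variance explained by each component
pca$sdev^2 / sum(pca$sdev^2)
```

Calling biplot(pca) on the same object draws the samples and the variable loadings together, which is one of the graphical outputs discussed in the recording.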