Methods – mixOmics

Webinar: Φ-Space ST: a platform-agnostic method to identify cell states in spatial transcriptomics studies

We have a sequel to Φ-Space: Φ-Space ST, developed by Dr Jiadong Mao for spatial transcriptomics studies! We are very excited about these new developments and the potential of Φ-Space for single cell annotation!

Φ-Space ST is:

A novel and fast approach for cell type composition analysis.
Platform-agnostic and scalable, as it works across multiple spatial transcriptomics (ST) platforms, including CosMx, Visium, and Stereo-seq.
Accurate and integrative, as it identifies cell states by leveraging multiple scRNA-seq references.
Segmentation-free and niche-driven, as it annotates cell states at subcellular resolution, uncovering niche-specific cell types and tumour-distinguishing patterns.

Φ-Space ST: a platform-agnostic method to identify cell states in spatial transcriptomics studies. Jiadong Mao, Jarny Choi, Kim-Anh Lê Cao. bioRxiv 2025.

Watch Jiadong’s latest seminar, presented at Melbourne Integrative Genomics on Friday 14th February 2025:

Abstract

We introduce Φ-Space ST, a platform-agnostic method to identify continuous cell states in spatial transcriptomics (ST) data using multiple scRNA-seq references. For ST with supercellular resolution, Φ-Space ST achieves interpretable cell type deconvolution with significantly faster computation. For subcellular resolution, Φ-Space ST annotates cell states without cell segmentation, leading to highly insightful spatial niche identification. Φ-Space ST harmonises annotations derived from multiple scRNA-seq references, and provides interpretable characterisations of disease cell states by leveraging healthy references. We validate Φ-Space ST in three case studies involving CosMx, Visium and Stereo-seq platforms for various cancer tissues. Our method revealed niche-specific enriched cell types and distinct cell type co-presence patterns that distinguish tumour from non-tumour tissue regions. These findings highlight the potential of Φ-Space ST as a robust and scalable tool for ST data analysis for understanding complex tissues and pathologies.

Webinar: Φ-Space for continuous phenotyping of single-cell multi-omics data

We have developed a new PLS method for the continuous cell type annotation of single cells, now in preprint!

Φ-Space addresses numerous challenges faced by state-of-the-art automated annotation methods:
- to identify continuous and out-of-reference cell states,
- to deal with batch effects in reference,
- to utilise bulk references and multi-omic references.
Φ-Space uses soft classification to phenotype cells on a continuum. The continuous annotation, or phenotype space embedding, is then used to reduce the dimensionality of the data for various downstream analyses.

Φ-Space: Continuous phenotyping of single-cell multi-omics data. Jiadong Mao, Yidi Deng, Kim-Anh Lê Cao. bioRxiv 2024.

View this 52-minute video of Kim-Anh Lê Cao presenting Φ-Space at the WEHI Bioinformatics seminar:

Abstract

Single-cell multi-omics technologies have empowered increasingly refined characterisation of the heterogeneity of cell populations. Automated cell type annotation methods have been developed to transfer cell type labels from well-annotated reference datasets to emerging query datasets. However, these methods suffer from some common caveats, including the failure to characterise transitional and novel cell states, sensitivity to batch effects and under-utilisation of phenotypic information other than cell types (e.g. sample source and disease conditions).

We developed Φ-Space, a computational framework for the continuous phenotyping of single-cell multi-omics data. In Φ-Space we adopt a highly versatile modelling strategy to continuously characterise query cell identity in a low-dimensional phenotype space, defined by reference phenotypes. The phenotype space embedding enables various downstream analyses, including insightful visualisations, clustering and cell type labelling.

We demonstrate through three case studies that Φ-Space (i) characterises developing and out-of-reference cell states; (ii) is robust against batch effects in both reference and query; (iii) adapts to annotation tasks involving multiple omics types; (iv) overcomes technical differences between reference and query.

The Φ-Space package

Φ-Space is not currently available directly from the mixOmics package; it is a separate R package that can be installed from GitHub.

Webinar: PCA and PLS-DA

These two recordings were part of a presentation to WEHI for their postgraduate lecture series for a diverse audience.

In the PCA presentation (18 min), we explain the concept of linear combination of variables (components) and useful graphical outputs such as correlation circle plots and biplots.

In the PLS-DA presentation (7 min), we talk about the concept of multivariate signature.

If you want to know more about the actual algorithm under the hood, you can watch this webinar on PLS.

General presentation about mixOmics

A new general presentation about mixOmics is available in the Presentation Section (and should be updated with each major update of the package).

Lê Cao K.-A. Unravelling ‘omics’ data with the mixOmics R package, Illustration on several studies. General presentation on mixOmics (last updated 05/04/2012) [Presentation]

(s)IPCA

Independent Principal Component Analysis (IPCA)

In some case studies, we have identified some limitations when using PCA:

PCA assumes that gene expression follows a multivariate normal distribution, whereas recent studies have demonstrated that microarray gene expression measurements instead follow a super-Gaussian distribution.
PCA decomposes the data by maximizing its variance. In some cases, the biological question may not be related to the highest variance in the data.

Instead, we propose to apply Independent Principal Component Analysis (IPCA) which combines the advantages of both PCA and Independent Component Analysis (ICA). It uses ICA as a denoising process of the loading vectors produced by PCA to better highlight the important biological entities and reveal insightful patterns in the data.

IPCA offers a better visualization of the data than ICA and with a smaller number of components than PCA.

How to choose the number of components:

The kurtosis of the loading vectors is used to order the Independent Principal Components. We have shown that the kurtosis value is a good post hoc indicator of the number of components to choose, as a sudden drop in the values corresponds to irrelevant dimensions.
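As a rough illustration of the kurtosis criterion, the sketch below computes the excess kurtosis of a few made-up loading vectors and ranks them from highest to lowest. The function names and the toy vectors are hypothetical, and this is not the code used in mixOmics; it only shows the kind of ordering described above.

```python
from statistics import fmean


def excess_kurtosis(values):
    """Sample excess kurtosis: fourth standardised moment minus 3."""
    mean = fmean(values)
    centred = [v - mean for v in values]
    m2 = fmean(c ** 2 for c in centred)  # second central moment
    m4 = fmean(c ** 4 for c in centred)  # fourth central moment
    return m4 / m2 ** 2 - 3.0


def order_by_kurtosis(loadings):
    """Return (index, kurtosis) pairs sorted from highest to lowest kurtosis."""
    scored = [(i, excess_kurtosis(vec)) for i, vec in enumerate(loadings)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy loading vectors (hypothetical, for illustration only).
toy_loadings = [
    [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0],  # flat, two-valued
    [0.0, 0.0, 0.0, 0.0, 0.0, 3.0],     # one dominant entry
    [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0],   # spread out
]
ranking = order_by_kurtosis(toy_loadings)
for index, kurt in ranking:
    print(f"loading vector {index}: excess kurtosis = {kurt:.2f}")
```

Reading the printed values from top to bottom, a sharp fall between two consecutive kurtosis values would be the "sudden drop" suggesting that the remaining components can be discarded.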

Sparse Independent Principal Component Analysis (sIPCA)

Similar to the sparse PCA version implemented in mixOmics, soft-thresholding is applied to the independent loading vectors in IPCA to perform internal variable selection.
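To make the idea of soft-thresholding concrete, here is a minimal sketch on a made-up loading vector. The function name, the threshold value and the data are assumptions chosen for illustration; the actual sIPCA implementation may differ in how the threshold is set and in any rescaling it applies afterwards.

```python
def soft_threshold(loading, threshold):
    """Shrink each loading towards zero by `threshold`.

    Entries whose absolute value is at most `threshold` become exactly 0,
    which is what removes the corresponding variables from the component.
    """
    shrunk = []
    for value in loading:
        magnitude = abs(value) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if value > 0 else -magnitude)
    return shrunk


loading = [0.9, -0.05, 0.3, -0.6, 0.02]  # hypothetical loading vector
sparse_loading = soft_threshold(loading, threshold=0.25)

# Variables with a non-zero loading after thresholding are the selected ones.
selected = [i for i, v in enumerate(sparse_loading) if v != 0.0]
print(sparse_loading)
print("selected variables:", selected)
```

A larger threshold sets more loadings to zero and therefore selects fewer variables.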

How to choose the number of variables to select:

The number of variables to select is still an open issue. In our paper we proposed to use the Davies-Bouldin measure, which is an index of crisp cluster validity. This index compares the within-cluster scatter with the between-cluster separation.
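The following sketch shows one common way to compute a Davies-Bouldin index for a given partition of points, using Euclidean distances and cluster centroids. The helper names and the toy clusters are hypothetical, and the exact variant used in our paper may differ; the point is only to show how within-cluster scatter is weighed against between-cluster separation.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    For every cluster, take the worst (largest) ratio of summed scatter to
    centroid separation against any other cluster, then average these ratios.
    Lower values indicate compact, well-separated clusters.
    """
    centres = [centroid(c) for c in clusters]
    scatter = [
        sum(dist(p, centre) for p in c) / len(c)
        for c, centre in zip(clusters, centres)
    ]
    worst_ratios = []
    for i in range(len(clusters)):
        ratios = [
            (scatter[i] + scatter[j]) / dist(centres[i], centres[j])
            for j in range(len(clusters))
            if j != i
        ]
        worst_ratios.append(max(ratios))
    return sum(worst_ratios) / len(clusters)


# Two toy partitions (hypothetical): same scatter, different separation.
far_apart = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close_by = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]
db_far = davies_bouldin(far_apart)
db_close = davies_bouldin(close_by)
print(f"far apart: {db_far:.3f}, close by: {db_close:.3f}")
```

Under the assumption that a lower index is preferred, one could compute the index on the sample clusters obtained for several candidate numbers of selected variables and compare the values.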

More details about how to use the ipca.R function are given in the case study.

References

Yao F., Coquery J., Lê Cao K.-A. (2012) Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets, BMC Bioinformatics 13:24.
Comon P. (1994) Independent component analysis, a new concept? Signal Processing 36:287-314.
Hyvärinen A., Oja E. (2000) Independent Component Analysis: Algorithms and Applications, Neural Networks 13(4-5):411-430.

New methods: multilevel analyses

A multilevel approach has been added for cross-over design experiments (up to two cross factors), in collaboration with A/Prof B. Liquet (Université de Bordeaux, France). This approach takes into account the complex structure of repeated measurements from different assays, where different treatments are applied to the same subjects, in order to highlight the treatment effects within subjects separately from the biological variation between subjects.

Two different frameworks are proposed:

a discriminant analysis (method = 'splsda') enables the selection of features separating the different treatments.
an integrative analysis (method = 'spls') enables the integration of two matched data sets and the selection of subsets of correlated variables (positively or negatively) across the samples. The approach is unsupervised: no prior knowledge about the sample groups is included.

The multilevel function first decomposes the variance in the data sets X (and Y), then applies either sPLS-DA or sPLS to the within-subject deviation. One- or two-factor analyses are available for sPLS-DA.
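To illustrate what "within-subject deviation" means in the simplest (one-factor) case, the sketch below subtracts each subject's mean profile from that subject's measurements. The function name and the toy data are hypothetical, and this is only a simplified picture of the decomposition step, not the multilevel function itself.

```python
from collections import defaultdict


def within_subject_deviation(rows, subjects):
    """Subtract each subject's mean profile from that subject's rows.

    `rows` is a list of samples (lists of variable values) and `subjects`
    gives the subject label of each sample, in the same order.
    """
    grouped = defaultdict(list)
    for row, subject in zip(rows, subjects):
        grouped[subject].append(row)
    subject_means = {
        subject: [sum(col) / len(col) for col in zip(*members)]
        for subject, members in grouped.items()
    }
    return [
        [value - mean for value, mean in zip(row, subject_means[subject])]
        for row, subject in zip(rows, subjects)
    ]


# Hypothetical toy data: 2 subjects, each measured under 2 treatments, 2 variables.
X = [
    [10.0, 5.0],  # subject A, treatment 1
    [12.0, 5.0],  # subject A, treatment 2
    [20.0, 1.0],  # subject B, treatment 1
    [22.0, 3.0],  # subject B, treatment 2
]
subjects = ["A", "A", "B", "B"]
Xw = within_subject_deviation(X, subjects)
for row in Xw:
    print(row)
```

In this toy example the large baseline difference between subjects A and B on the first variable disappears from Xw, leaving only the change between treatments within each subject.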

Associated functions include: multilevel.R, tune.multilevel.R, pheatmap.multilevel.R (see examples in methods, graphics and case studies).

This is our first step towards repeated measurements designs.

The package has been updated to version 4.0-1 to implement these methodologies. It now requires the library ‘pheatmap’.

Web-interface

R package and Methods: IPCA and sparse IPCA functions have been implemented (as well as their associated S3 functions). IPCA stands for Principal Component Analysis with Independent Loadings. It combines the advantages of both PCA and Independent Component Analysis (ICA). PCA is a powerful exploratory tool if the biological question is related to the highest variance. ICA was recently proposed in the literature as an alternative to PCA as it optimizes an independence condition that can give more meaningful components. A preprint is available upon request.
R package and Data: The Liver Toxicity study data has been updated to provide GenBank IDs and gene titles.
R package and Data: Two other data sets have been added: Prostate Tumor study (gene expression) and Metabolomic study of Yeast (metabolomics).
Web interface: We are making good progress on our associated web interface (now deployed on http://mixomics.qfab.org). A few illustrative examples are also available; you can download them and run any type of analysis through the interface. We are currently developing a ‘next level analysis’ to provide pathway enrichment analyses and give the functional annotation of the selected genes using the iHOP database. Do not hesitate to give us some feedback!
Newsletter: we now have a newsletter. To subscribe, send an email to mixomics[at]math.univ-toulouse.fr with no subject in the body.

New Graphics: network & cim

New S3 methods network and cim for results from PLS models
New code for the valid function to validate PLS-DA and SPLS-DA models
The S3 method plot.valid was modified to display graphical results from the valid function for PLS-DA and SPLS-DA models
The cim and network functions were modified to return the similarity matrix
The S3 method plotVar was modified to return the coordinates of the X and Y variables
The predict function has been modified to simultaneously run either several or all prediction methods available to predict the classes of the test data from PLS-DA and SPLS-DA models

New Function: (s)PCA added

New functions pca and spca are now available to perform Principal Component Analysis (PCA) and sparse PCA for variable selection
The S3 methods plotVar, plot3dVar, plotIndiv, plot3dIndiv were modified to generate graphical results for pca and spca

New function: plot.valid

New function plot.valid to display the results of the valid function
New code for imgCor function for a nicer representation of the correlation matrices
In the predict function, the argument 'method' was replaced by method = c("max.dist", "class.dist", "centroids.dist", "mahalanobis.dist")
The arguments dendrogram, ColSideColors and RowSideColors were added to the cim function
The valid function can now also be run on data with missing values
Functions pls, plsda, spls and splsda were modified to identify zero- or near-zero variance predictors
The functions plotVar and plot3dVar were modified to represent only the X variables in the case of PLS-DA and SPLS-DA
The pca function has been improved so that the S3 methods plotIndiv, plot3dIndiv, plotVar and plot3dVar can be used with these new classes