N-Integration Methods

The N-integration framework integrates several datasets measured on the same samples. It provides two methods: multiblock PLS, for unsupervised or supervised regression analysis, and multiblock PLS-DA (DIABLO), for supervised classification. Both aim to identify correlations between datasets, and both have sparse variants that perform feature selection (multiblock sPLS and multiblock sPLS-DA). This page describes the design matrix, which defines the relationships between the datasets and helps guide the analysis.
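
As a rough illustration of what a design matrix can look like, the sketch below builds one in plain Python. It assumes the common convention of a symmetric block-by-block matrix whose off-diagonal entries lie between 0 and 1 and whose diagonal is 0. The helper `make_design` and the block names are hypothetical, chosen only for this example; they are not part of any package API.

```python
def make_design(block_names, weight=0.1):
    """Build a symmetric block-by-block design matrix as nested dicts.

    Assumption: off-diagonal entries hold `weight` (between 0 and 1),
    and the diagonal is 0 because a block is not linked to itself.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    return {
        a: {b: (0.0 if a == b else weight) for b in block_names}
        for a in block_names
    }


# Hypothetical block names, used only for illustration.
blocks = ["mRNA", "miRNA", "protein"]
design = make_design(blocks, weight=0.1)

# Print one row per block, in the same column order as `blocks`.
for name in blocks:
    print(name, [design[name][other] for other in blocks])
```

Under this convention, a weight close to 0 asks the method to put little emphasis on the link between two datasets, while a weight close to 1 asks it to emphasise their correlation. Treat the chosen value as a tuning decision rather than a fixed rule.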

Typical N-integration-type questions:
– Does the information from all datasets agree and reflect any biological condition of interest?
– Can I discriminate samples across several datasets based on their outcome category?
– Which variables across the different omics datasets discriminate the different outcomes?
– Can they constitute a multi-omics signature that predicts the class of unseen samples?

N-integration method pages:
Multiblock (s)PLS-DA (DIABLO) Method
Multiblock (s)PLS Method

Related case studies:
DIABLO TCGA Case Study
Multiblock sPLS Gastrulation Case Study