(s)PLS – mixOmics

Projection to Latent Structures (PLS), also known as Partial Least Squares, explores and explains the relationship between two datasets by computing latent components that maximise the covariance between them. In sparse PLS (sPLS), lasso penalisation is applied to the loading vectors to identify the most important variables. sPLS can be supervised (sPLS regression), where one dataset is used to predict or explain the other, or unsupervised (sPLS canonical), where both datasets are treated symmetrically. PLS is also categorised by the form of the response: PLS1 handles a single response variable (univariate response), and PLS2 handles a response made up of multiple variables (multivariate response).
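
The criterion described above can be sketched in symbols. The notation here is introduced for illustration and is not from this page: $X$ and $Y$ are the two centred data matrices, $u$ and $v$ are the loading vectors for one component, $M = X^\top Y$ is their cross-product matrix, and $P_{\lambda}$ denotes a lasso penalty. The sparse criterion follows the general form used in reference 5 and should be read as a sketch rather than the exact implementation in the package.

```latex
% PLS: first pair of loading vectors maximises the covariance
% between the latent components Xu and Yv
\max_{\|u\|_2 = 1,\; \|v\|_2 = 1} \; \operatorname{cov}(Xu,\, Yv)

% sPLS: rank-one approximation of M = X^T Y with lasso penalties
% on both loading vectors (larger lambda => more zero loadings)
\min_{u,\, v} \; \| M - u v^\top \|_F^2 + P_{\lambda_1}(u) + P_{\lambda_2}(v),
\qquad M = X^\top Y
```

Variables whose loadings are shrunk to exactly zero drop out of the component, which is how the penalty performs variable selection.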

🎥 Watch: Webinar on PLS
🦴 Note: PLS is a flexible algorithm that can perform different types of integration; it is the backbone of most mixOmics methods.

Typical (s)PLS-type questions:
– Does the information from both datasets agree and reflect any biological condition of interest?
– If I consider Y as response data, can I model Y given the predictor variables X?
– What are the subsets of variables that are highly correlated and explain the major sources of variation across datasets?

Data used on this page:
liver.toxicity

Key functions used on this page:
pls()
spls()
plotIndiv()
plotVar()
selectVar()
plotLoadings()
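
A minimal sketch of how the functions listed above fit together on the liver.toxicity data. The number of components and the keepX / keepY values are arbitrary choices made for illustration, not tuned or recommended settings.

```r
library(mixOmics)

# Load the example data: X = gene expression, Y = clinical measurements
data(liver.toxicity)
X <- liver.toxicity$gene    # 64 samples x 3116 genes
Y <- liver.toxicity$clinic  # 64 samples x 10 clinical variables

# Non-sparse PLS in regression mode (Y explained by X)
pls.res <- pls(X, Y, ncomp = 2, mode = "regression")

# Sparse PLS: keep 50 X variables and 10 Y variables per component
# (illustrative values only)
spls.res <- spls(X, Y, ncomp = 2, mode = "regression",
                 keepX = c(50, 50), keepY = c(10, 10))

# Sample plot: samples projected onto the latent components
plotIndiv(spls.res)

# Correlation circle plot of the selected variables
plotVar(spls.res, cutoff = 0.5)

# Names of the X variables selected on component 1
selected.genes <- selectVar(spls.res, comp = 1)$X$name

# Loading weights of the selected variables on component 1
plotLoadings(spls.res, comp = 1)
```

Setting mode = "canonical" in pls() or spls() gives the symmetric analysis in which neither dataset is treated as the response.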

Related case studies:
PLS1 framework: Extended Vignette
PLS2 framework: Case Study: sPLS Liver Toxicity

References:
1. Tenenhaus M. (1998) La régression PLS: théorie et pratique. Paris: Editions Technip.
2. Wold H. (1966) Estimation of principal components and related models by iterative least squares. In: Krishnaiah, P.R. (editor). Multivariate Analysis. Academic Press, N.Y., pp 391-420.
3. Wold S., Sjöström M. and Eriksson L. (2001) PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 58(2), 109–130.
4. Lê Cao K.-A., Martin P.G.P., Robert-Granié C. and Besse P. (2009) Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10(34).
5. Lê Cao K.-A., Rossouw D., Robert-Granié C. and Besse P. (2008) A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.