Partial Least Squares (PLS), also known as Projection to Latent Structures, is used to explore and explain the relationship between two datasets measured on the same samples. It does so by calculating pairs of latent components, one from each dataset, that maximise the covariance between them. In sparse PLS (sPLS), lasso penalisation is applied to the loading vectors to identify the most important variables. sPLS can be used in a supervised manner (sPLS regression), where one dataset is used to predict or explain the other, or in an unsupervised manner (sPLS canonical), where both datasets are treated symmetrically. PLS is further categorised into PLS1, where the response is a single variable (univariate analysis), and PLS2, where the response consists of multiple variables (multivariate analysis).
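To make the covariance-maximisation and lasso ideas concrete, here is a minimal pure-Python sketch of how a first (s)PLS component can be computed. It is not the mixOmics implementation. For centred X and Y, the first PLS weight vectors u and v maximise cov(Xu, Yv) subject to unit norm, and they are the leading singular vectors of M = X^T Y. The sketch finds them by alternating updates, in the spirit of the sparse PLS algorithm of Lê Cao et al. (2008). It adds soft-thresholding, a lasso-type penalty, so that only `keep_x` / `keep_y` variables keep a non-zero weight. The names `spls_component`, `keep_x`, `keep_y` and the starting vector are hypothetical choices for illustration. Deflation for further components is omitted. A single-column Y corresponds to PLS1 and several columns to PLS2.

```python
import math


def center(A):
    """Column-centre a matrix stored as a list of rows (samples x variables)."""
    n, p = len(A), len(A[0])
    means = [sum(row[j] for row in A) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in A]


def soft_threshold(w, keep):
    """Lasso-style soft-thresholding that leaves at most `keep` non-zero entries."""
    if keep >= len(w):
        return list(w)  # no penalty: keep every variable
    lam = sorted((abs(a) for a in w), reverse=True)[keep]  # (keep+1)-th largest |w_j|
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in w]


def normalise(w):
    """Scale w to unit Euclidean norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in w))
    return [a / norm for a in w] if norm > 0 else list(w)


def spls_component(X, Y, keep_x=None, keep_y=None, n_iter=100):
    """First (s)PLS component for X (n x p) and Y (n x q), given as lists of rows.

    With keep_x = keep_y = None no penalty is applied, and u, v approach the
    leading singular vectors of M = X^T Y, i.e. the ordinary PLS weight vectors.
    """
    X, Y = center(X), center(Y)
    p, q = len(X[0]), len(Y[0])
    keep_x = p if keep_x is None else keep_x
    keep_y = q if keep_y is None else keep_y
    # M[j][k] is proportional to cov(X_j, Y_k)
    M = [[sum(x[j] * y[k] for x, y in zip(X, Y)) for k in range(q)] for j in range(p)]
    # Heuristic start (assumption): the row of M with the largest norm
    v = normalise(max(M, key=lambda row: sum(a * a for a in row)))
    for _ in range(n_iter):
        # Alternate u <- S(M v) and v <- S(M^T u), renormalising each time
        Mv = [sum(M[j][k] * v[k] for k in range(q)) for j in range(p)]
        u = normalise(soft_threshold(Mv, keep_x))
        Mtu = [sum(M[j][k] * u[j] for j in range(p)) for k in range(q)]
        v = normalise(soft_threshold(Mtu, keep_y))
    t = [sum(a * b for a, b in zip(row, u)) for row in X]  # latent component of X
    s = [sum(a * b for a, b in zip(row, v)) for row in Y]  # latent component of Y
    return u, v, t, s


# Toy PLS1 example: Y is a single response equal to twice the first X variable
X = [[1, 1, 2], [2, 1, 1], [3, 2, 2], [4, 2, 1], [5, 3, 2]]
Y = [[2], [4], [6], [8], [10]]
u, v, t, s = spls_component(X, Y, keep_x=1)
print(u)  # only the first X variable keeps a non-zero weight
```

Without `keep_x`, the second X variable, which is partly correlated with Y, also receives a weight. The penalty removes it, and this is how the sparse variant turns a PLS fit into a variable selection.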
🎥 Watch: Webinar on PLS
Typical (s)PLS-type questions:
– Does the information from both datasets agree and reflect any biological condition of interest?
– If I consider Y as response data, can I model Y given the predictor variables X?
– What are the subsets of variables that are highly correlated and explain the major sources of variation across datasets?
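The second question, modelling Y from X, is what PLS regression mode addresses. The sketch below is a minimal pure-Python example assuming a single response (PLS1), one component and no deflation. The weight vector is taken proportional to X^T y, so the latent component t = Xw has maximal covariance with y. y is then regressed on t. `pls1_one_component` is a hypothetical helper for illustration, not the mixOmics `pls()` function.

```python
import math


def pls1_one_component(X, y):
    """Fit a one-component PLS1 regression of y on X (list of rows).

    Returns a predict(x_new) function. Hypothetical helper for illustration.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [yi - y_mean for yi in y]
    # Weight vector w proportional to X^T y: the direction of maximal covariance with y
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(a * a for a in w))
    w = [a / norm for a in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]  # latent component
    # Least-squares regression of centred y on t (no intercept needed after centring)
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(x_new):
        # Project the new sample onto w, then map the component score back to y
        t_new = sum((xj - mj) * wj for xj, mj, wj in zip(x_new, x_mean, w))
        return y_mean + b * t_new

    return predict


X = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]  # two perfectly collinear predictors
y = [2, 4, 6, 8, 10]
predict = pls1_one_component(X, y)
print(predict([6, 6]))  # approximately 12.0
```

Because the two predictors are perfectly collinear, ordinary least squares has no unique solution here. The latent-component route still gives a well-defined prediction, which is one reason PLS is used when predictors outnumber or strongly overlap one another.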
Data used on this page: liver.toxicity
Key functions used on this page:
– pls()
– spls()
– plotIndiv()
– plotVar()
– selectVar()
– plotLoadings()
Related case studies:
– PLS1 framework: Extended Vignette
– PLS2 framework: Case Study: sPLS Liver Toxicity
References:
1. Tenenhaus M. (1998) La régression PLS: théorie et pratique. Paris: Editions Technip.
2. Wold H. (1966) Estimation of principal components and related models by iterative least squares. In: Krishnaiah P.R. (ed.) Multivariate Analysis. Academic Press, New York, pp. 391–420.
3. Wold S., Sjöström M. and Eriksson L. (2001) PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 58(2), 109–130.
4. Lê Cao K.-A., Martin P.G.P., Robert-Granié C. and Besse P. (2009) Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10, article 34.
5. Lê Cao K.-A., Rossouw D., Robert-Granié C. and Besse P. (2008) A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.