This page is a quick start guide to Partial Least Squares Discriminant Analysis (PLS-DA) and its sparse variant (sPLS-DA) in mixOmics. PLS-DA is a special case of PLS in which the response Y is a single categorical variable (y) giving the class of each sample. It is used for classification: it fits a supervised model that discriminates between sample groups. The sparse variant, sPLS-DA, adds lasso penalisation on the loading vectors to select a subset of key variables.
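To make the "categorical y" idea concrete: PLS-DA is usually described as recoding y into an indicator (dummy) matrix with one column per class, then fitting PLS against that matrix. mixOmics does this recoding internally, so the base R sketch below is only an illustration of the idea, not the package's implementation. The class labels and sample names are made up for the example.

```r
# Illustrative only: a categorical outcome recoded as an indicator matrix.
# The class labels below are hypothetical.
y <- factor(c("A", "B", "A", "C", "B"))

# One column per class; an entry is 1 when the sample belongs to that class.
Y_dummy <- sapply(levels(y), function(k) as.integer(y == k))
rownames(Y_dummy) <- paste0("sample", seq_along(y))

print(Y_dummy)
```

Each row contains exactly one 1, so the matrix carries the same information as the original factor.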
🎥 Watch: Webinar on PLS-DA
Typical (s)PLS-DA-type questions:
– Can I discriminate samples based on their outcome category?
– Which variables discriminate the different outcomes?
– Can they constitute a molecular signature that predicts the class of external samples?
Data used on this page: srbct
Key functions used on this page: plsda(), splsda(), plotIndiv(), plotVar(), selectVar(), plotLoadings()
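The functions listed above can be chained into a minimal workflow on the srbct data. This is a sketch under assumptions rather than a recommended analysis: the number of components (ncomp) and the number of variables kept per component (keepX) are arbitrary illustrative values, and in practice they would be tuned. The object names plsda.res and splsda.res are placeholders.

```r
library(mixOmics)

data(srbct)
X <- srbct$gene    # expression matrix: samples in rows, genes in columns
Y <- srbct$class   # factor giving the tumour class of each sample

# PLS-DA: all variables contribute to every component.
# ncomp = 3 is an arbitrary choice for illustration.
plsda.res <- plsda(X, Y, ncomp = 3)
plotIndiv(plsda.res, legend = TRUE)   # sample plot, coloured by class
plotVar(plsda.res)                    # correlation circle plot of the variables

# sPLS-DA: keepX sets how many variables are selected on each component.
# The values 50 and 30 are assumptions, not tuned.
splsda.res <- splsda(X, Y, ncomp = 2, keepX = c(50, 30))
plotIndiv(splsda.res, legend = TRUE)
plotVar(splsda.res)

# Names of the variables selected on the first component
selectVar(splsda.res, comp = 1)$name

# Loading weights of the selected variables, coloured by the class
# in which the mean expression is highest
plotLoadings(splsda.res, comp = 1, method = "mean", contrib = "max")
```

The variables returned by selectVar() are the candidates for a discriminating signature, which relates to the second and third questions listed earlier on this page.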
Related case studies:
Case Study: sPLS-DA SRBCT
References:
1. Pérez-Enciso, M. and Tenenhaus, M., 2003. Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach. Human genetics, 112(5-6), pp.581-592.
2. Nguyen, D.V. and Rocke, D.M., 2002. Tumor classification by partial least squares using microarray gene expression data. Bioinformatics, 18(1), pp.39-50.
3. Lê Cao, K.A., Boitard, S. and Besse, P., 2011. Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC bioinformatics, 12(1), p.253.
4. Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), pp.179-188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x