library(mixOmics) # import the mixOmics library
data(breast.TCGA) # extract the TCGA data

# use the mirna and mrna expression levels as the X datasets
X1 <- breast.TCGA$data.train$mirna
X2 <- breast.TCGA$data.train$mrna
X <- list(mirna = X1, mrna = X2)

# set the protein levels as the Y dataset
Y <- breast.TCGA$data.train$protein

block.pls.result <- block.pls(X, Y, design = "full") # run the method

plotIndiv(block.pls.result) # plot the samples
plotVar(block.pls.result, legend = TRUE) # plot the variables
?block.pls can be run to determine all default arguments of this function:
- Number of components (ncomp = 2): The first two PLS components are calculated,
- Design matrix (design = "full"): The strength of all relationships between dataframes is maximised (= 1), giving a “fully connected” design,
- PLS mode (mode = "regression"): A PLS regression mode is performed,
- Scaling of the data (scale = TRUE): Each block is standardised to zero mean and unit variance.
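As a minimal sketch of the defaults listed above, the call below writes each of them out explicitly. It assumes the mixOmics package is installed and reuses the breast.TCGA training data from the quick start; the object name block.pls.explicit is introduced here purely for illustration.

```r
library(mixOmics)
data(breast.TCGA)

# same X and Y datasets as in the quick start example
X <- list(mirna = breast.TCGA$data.train$mirna,
          mrna = breast.TCGA$data.train$mrna)
Y <- breast.TCGA$data.train$protein

# every argument below restates a default described in the list above
block.pls.explicit <- block.pls(X, Y,
                                ncomp = 2,           # two components
                                design = "full",     # fully connected design
                                mode = "regression", # PLS regression mode
                                scale = TRUE)        # standardise each block

# one column of sample scores per component for each block
dim(block.pls.explicit$variates$mirna)
```

Writing the defaults out this way is optional, but it can make an analysis script easier to audit if the package defaults ever change.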
# set the number of features to use for the X datasets
list.keepX <- list(mrna = rep(5, 2), mirna = rep(5, 2))

# set the number of features to use for the Y dataset
list.keepY <- rep(10, 2)

block.spls.result <- block.spls(X, Y, design = "full", # run the method
                                keepX = list.keepX, keepY = list.keepY)

# plot the contributions of each feature to each dimension
plotLoadings(block.spls.result, ncomp = 1)

plotIndiv(block.spls.result) # plot the samples
plotVar(block.spls.result, legend = TRUE) # plot the variables
?block.spls can be run to determine all default arguments of this function:
- Same defaults as above for the arguments shared with block.pls,
- Features to retain (keepX, keepY): If unspecified, these values will default to using all features of the original dataframes.
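To see the effect of keepX and keepY, one option is to count the non-zero loadings in each block after fitting. The sketch below assumes mixOmics is installed and that the fitted object stores one loading matrix per block in its loadings element; the names block.spls.check and n.selected are illustrative only.

```r
library(mixOmics)
data(breast.TCGA)

X <- list(mirna = breast.TCGA$data.train$mirna,
          mrna = breast.TCGA$data.train$mrna)
Y <- breast.TCGA$data.train$protein

# retain 5 features per component in each X block, 10 in the Y block
list.keepX <- list(mirna = rep(5, 2), mrna = rep(5, 2))
list.keepY <- rep(10, 2)

block.spls.check <- block.spls(X, Y, design = "full",
                               keepX = list.keepX, keepY = list.keepY)

# count the features with a non-zero loading, per component and per block
n.selected <- sapply(block.spls.check$loadings,
                     function(l) colSums(l != 0))
print(n.selected)
```

The counts for each block would be expected to match the values passed via keepX and keepY. Leaving both arguments out should instead keep every feature, which corresponds to the non-sparse behaviour.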
Note, “multiblock” will be abbreviated to “MB” within this page.
Prior to learning the functionality of the MB-(s)PLS methods, a solid understanding of standard (s)PLS is strongly advised. The MB variants extend the (s)PLS methods to cases where more than two datasets are being assessed, drawing on the methodology of Generalised CCA. Here, there are multiple predictor datasets (X_1, …, X_Q) and a response vector/matrix of continuous values (y / Y). As with the standard forms of these methods, MB-sPLS is the sparse variant of MB-PLS and uses feature selection when forming latent components. When dealing with high-dimensional datasets, MB-sPLS is the recommended method, as the non-sparse version suffers from a lack of interpretability in this context.
MB-(s)PLS features the same four modes of operation as (s)PLS: “regression”, “canonical”, “invariant” and “classic”. Each functions the same way as it does in (s)PLS. However, the block.pls() and block.spls() functions do not have the same tuning and performance assessment methods as their standard (s)PLS counterparts.
For a breakdown of how to construct the design matrix, refer to the N-Integration Methods page.
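For orientation, the sketch below builds the kind of matrix that a “fully connected” design corresponds to, using base R only. It assumes the usual mixOmics convention of a square matrix with one row and column per X block, values between 0 and 1, and a zero diagonal. The names full.design and weak.design, and the value 0.1, are illustrative choices rather than recommendations.

```r
# block names matching the X list used on this page
blocks <- c("mirna", "mrna")

# fully connected design: every pair of distinct blocks is linked with strength 1
full.design <- matrix(1, nrow = length(blocks), ncol = length(blocks),
                      dimnames = list(blocks, blocks))
diag(full.design) <- 0 # a block is not linked to itself

# a weaker design (illustrative value): same structure, smaller link strength
weak.design <- full.design * 0.1

print(full.design)
print(weak.design)

# a matrix like these could then be supplied through the design argument,
# e.g. block.pls(X, Y, design = weak.design)
```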
indY vs Y
There are two ways in which the response dataset can be specified. If it is included in the list of datasets passed in via the X parameter, then indY passes the index of the desired Y dataframe to the function. In the example below, the protein expression data is set to be the response dataframe:
X1 <- breast.TCGA$data.train$mirna
X2 <- breast.TCGA$data.train$mrna
X3 <- breast.TCGA$data.train$protein
X <- list(mirna = X1, mrna = X2, protein = X3)

block.pls.result <- block.pls(X, indY = 3)
The alternative method is to keep the desired Y dataframe entirely separate from the X datasets:
X1 <- breast.TCGA$data.train$mirna
X2 <- breast.TCGA$data.train$mrna
X <- list(mirna = X1, mrna = X2)
Y <- breast.TCGA$data.train$protein

block.pls.result <- block.pls(X, Y = Y)
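The two specifications describe the same split of the data into predictors and response. The base R sketch below illustrates that correspondence on small random matrices; split_response() is a hypothetical helper written for this illustration and is not part of mixOmics, nor a description of how the package handles indY internally.

```r
# hypothetical helper: separate the block at position indY from the others
split_response <- function(blocks, indY) {
  list(X = blocks[-indY], Y = blocks[[indY]])
}

# toy data: three blocks measured on the same four samples
set.seed(1)
toy <- list(mirna = matrix(rnorm(12), nrow = 4),
            mrna = matrix(rnorm(20), nrow = 4),
            protein = matrix(rnorm(8), nrow = 4))

parts <- split_response(toy, indY = 3)

names(parts$X)                  # "mirna" "mrna"
identical(parts$Y, toy$protein) # TRUE
```

In other words, passing the full list with indY = 3 and passing the first two blocks as X with the third as Y both designate the protein block as the response.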