Page from R Markdown



Missing_Values.knit





All methodologies implemented in mixOmics can handle missing values.
In particular, (s)PLS, (s)PLS-DA,
(s)PCA utilise the NIPALS
(Non-linear Iterative
Partial Least
Squares) algorithm as part of their dimension reduction
procedures. This algorithm is built to handle NAs [1].

This is implemented through the nipals() function within
mixOmics. This function is called internally by the above methods but
can also be used manually, as can be seen below.

Usage in mixOmics

library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene[, 1:100] # a reduced size data set

## pretend there are 20 NA values in our data
na.row <- sample(1:nrow(X), 20, replace = TRUE)
na.col <- sample(1:ncol(X), 20, replace = TRUE)
X.na <- as.matrix(X)

## fill these NA values in X
X.na[cbind(na.row, na.col)] <- NA
sum(is.na(X.na)) # number of cells with NA
## [1] 20
# this might take some time depending on the size of the data set
nipals.tune = nipals(X.na, ncomp = 10)$eig
barplot(nipals.tune, xlab = 'Principal component', ylab = 'Explained variance')

FIGURE 1: Column graph of the explained variance of each Principal
Component.

If missing values need to be imputed, the package contains
impute.nipals() for this scenario. NIPALS
is used to decompose the dataset. The resulting components, singular
values and feature loadings can be used to reconstitute the original
dataset, now with estimated values where the missing values were
previously. To allow for the best estimation of missing values, there is
a large number of components being used (ncom = 10).

X.impute <- impute.nipals(X = X.na, ncomp = 10)
sum(is.na(X.impute)) # number of cells with NA
## [1] 0

The difference between the imputed and real values can be checked.
Here are the original values:

id.na = is.na(X.na) # determine position of NAs in dataframe

X[id.na] # show original values
##  [1]  0.09041 -0.04070  0.03497 -0.01712  0.01309  0.00233 -0.04142  0.11104
##  [9] -0.01519 -0.17034 -0.01641  0.15964  0.00557 -0.06217  0.04131  0.02157
## [17]  0.01226 -0.00753  0.03038 -0.00783

The values which were estimated via the NIPALS
algorithm:

X.impute[id.na] # show imputted values
##  [1]  0.0837747419 -0.0190061068  0.0004024897 -0.0180879247 -0.0094185656
##  [6] -0.0312362158 -0.0706920015  0.1400817774  0.0083359545 -0.1158255139
## [11]  0.0164817649  0.1007897385  0.0236184385  0.0191934144  0.0214240977
## [16]  0.0686280312 -0.0039198425  0.0085870558  0.0450234407  0.0013964758