Missing Values

All methodologies implemented in mixOmics can handle missing values. In particular, (s)PLS, (s)PLS-DA and (s)PCA use the NIPALS (Non-linear Iterative Partial Least Squares) algorithm as part of their dimension reduction procedures. NIPALS estimates each component from the observed entries only, which is why it can handle NAs [1].

This is implemented through the nipals() function within mixOmics. This function is called internally by the above methods but can also be used manually, as can be seen below.
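To give a feel for how NIPALS copes with NAs before turning to nipals() itself, here is a minimal base R sketch of a single NIPALS component in which missing cells are skipped in every regression step. It is an illustration only: nipals_one_component() is a hypothetical helper written for this page, not the mixOmics implementation, and it assumes every row and every column has at least one observed value.

```r
# Minimal sketch of one NIPALS component that skips NA cells.
# nipals_one_component() is a hypothetical helper, NOT a mixOmics function.
nipals_one_component <- function(X, max.iter = 500, tol = 1e-12) {
  W <- ifelse(is.na(X), 0, 1)   # 1 where a value is observed, 0 where it is NA
  X0 <- X
  X0[is.na(X)] <- 0             # zeros drop out of the sums below

  # start the scores from the column with the largest sum of squares
  t.vec <- X0[, which.max(colSums(X0^2))]
  p.old <- rep(0, ncol(X))

  for (iter in seq_len(max.iter)) {
    # loadings: regress each column on the scores, using observed rows only
    p.vec <- as.vector(crossprod(X0, t.vec)) / as.vector(crossprod(W, t.vec^2))
    p.vec <- p.vec / sqrt(sum(p.vec^2))   # scale loadings to unit norm
    # scores: regress each row on the loadings, using observed columns only
    t.vec <- as.vector(X0 %*% p.vec) / as.vector(W %*% p.vec^2)
    if (sum((p.vec - p.old)^2) < tol) break
    p.old <- p.vec
  }
  list(t = t.vec, p = p.vec, iter = iter)
}

# toy rank-one matrix with two cells removed
X.true <- outer(c(1, 2, 3, 4, 5), c(2, -1, 0.5, 3))
X.miss <- X.true
X.miss[2, 1] <- NA
X.miss[4, 3] <- NA

fit <- nipals_one_component(X.miss)
max(abs(outer(fit$t, fit$p) - X.true))   # reconstruction error over all cells
```

Because the toy matrix has exactly rank one, the single component fitted on the observed cells also reproduces the two removed cells. Real data are noisy and of higher rank, so the recovered values there are estimates rather than exact.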

Usage in mixOmics

library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene[, 1:100] # a reduced size data set

## pretend there are 20 NA values in our data
## (rows and columns are sampled independently, so a repeated
## row/column pair would give slightly fewer than 20 NA cells)
na.row <- sample(1:nrow(X), 20, replace = TRUE)
na.col <- sample(1:ncol(X), 20, replace = TRUE)
X.na <- as.matrix(X)

## set the selected cells of X.na to NA
X.na[cbind(na.row, na.col)] <- NA
sum(is.na(X.na)) # number of cells with NA
## [1] 20
# this might take some time depending on the size of the data set
nipals.tune <- nipals(X.na, ncomp = 10)$eig
barplot(nipals.tune, xlab = 'Principal component', ylab = 'Explained variance')

FIGURE 1: Column graph of the explained variance of each Principal Component.

If missing values need to be imputed, the package provides impute.nipals(). NIPALS is used to decompose the dataset, and the resulting components, singular values and feature loadings are then used to reconstitute the original dataset, with estimated values in place of the missing ones. A large number of components (ncomp = 10) is used here to allow for the best estimation of the missing values.

X.impute <- impute.nipals(X = X.na, ncomp = 10)
sum(is.na(X.impute)) # number of cells with NA
## [1] 0
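The reconstruction idea described above can be sketched in base R. This is a hedged illustration rather than the code behind impute.nipals(): fill_from_components() is a hypothetical helper, the toy matrix is simulated, and svd() on the complete toy matrix stands in for NIPALS purely to obtain scores, singular values and loadings.

```r
# Hypothetical helper: fill NA cells from a low-rank reconstruction.
# u: unit-norm scores (n x ncomp), d: singular values (ncomp),
# v: loadings (p x ncomp). NOT a mixOmics function.
fill_from_components <- function(X.na, u, d, v) {
  X.hat <- u %*% diag(d, nrow = length(d)) %*% t(v)   # reconstructed matrix
  X.filled <- X.na
  miss <- is.na(X.na)
  X.filled[miss] <- X.hat[miss]   # only NA cells are replaced
  X.filled
}

# simulated rank-two toy matrix with three cells set to NA
set.seed(42)
X.full <- matrix(rnorm(10 * 2), 10, 2) %*% matrix(rnorm(2 * 6), 2, 6)
X.toy <- X.full
X.toy[cbind(c(1, 4, 7), c(2, 5, 3))] <- NA

# assumption: svd() of the complete matrix replaces the NIPALS decomposition
dec <- svd(X.full, nu = 2, nv = 2)
X.filled <- fill_from_components(X.toy, dec$u, dec$d[1:2], dec$v)

sum(is.na(X.filled))   # number of cells with NA after filling
```

Observed cells are left untouched and only the NA positions take values from the reconstruction, which is the behaviour one would normally want from an imputation step.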

The difference between the imputed and real values can be checked. Here are the original values:

id.na <- is.na(X.na) # logical matrix marking the position of NAs in X.na

X[id.na] # show original values
##  [1]  0.02083 -0.00272 -0.06042  0.03707 -0.12342 -0.01260  0.04694 -0.03946  0.04829  0.06598 -0.02226
## [12] -0.00602  0.23731 -0.00964 -0.06130 -0.03306  0.07084  0.15803 -0.00102  0.08700

The values which were estimated via the NIPALS algorithm:

X.impute[id.na] # show imputed values
##  [1]  0.0299392709  0.0174895445 -0.0286264784 -0.0148384475 -0.0922254834  0.0005981184  0.0878194466
##  [8]  0.0075913694  0.0375374737 -0.0003962815  0.0075574168 -0.0109573397  0.1020136485 -0.2174275930
## [15] -0.0208898522 -0.0429488459  0.0221517728  0.1533233404 -0.0649749344  0.0666990064
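One way to summarise the comparison is with a few error measures. The sketch below copies the two printed vectors above so that it runs on its own; in a live session the same quantities could be computed from X[id.na] and X.impute[id.na]. The choice of root mean squared error, mean absolute error and Pearson correlation is an assumption made for illustration, not a recommendation from the package.

```r
# original values at the NA positions (copied from the output above)
orig <- c(0.02083, -0.00272, -0.06042, 0.03707, -0.12342, -0.01260, 0.04694,
          -0.03946, 0.04829, 0.06598, -0.02226, -0.00602, 0.23731, -0.00964,
          -0.06130, -0.03306, 0.07084, 0.15803, -0.00102, 0.08700)

# values imputed by NIPALS at the same positions (copied from the output above)
imp <- c(0.0299392709, 0.0174895445, -0.0286264784, -0.0148384475,
         -0.0922254834, 0.0005981184, 0.0878194466, 0.0075913694,
         0.0375374737, -0.0003962815, 0.0075574168, -0.0109573397,
         0.1020136485, -0.2174275930, -0.0208898522, -0.0429488459,
         0.0221517728, 0.1533233404, -0.0649749344, 0.0666990064)

err <- imp - orig
rmse <- sqrt(mean(err^2))   # root mean squared error
mae <- mean(abs(err))       # mean absolute error
r <- cor(orig, imp)         # Pearson correlation between real and imputed values

c(rmse = rmse, mae = mae, cor = r)
```

With only 20 cells these summaries are rough, and they depend on which cells happened to be removed, so they are best read alongside the raw values rather than instead of them.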

References

  1. Wold, H. (1973). Nonlinear Iterative Partial Least Squares (NIPALS) Modelling: Some Current Developments. Multivariate Analysis–III, 383-407. https://doi.org/10.1016/b978-0-12-426653-7.50032-6