mixOmics website update

We’re pleased to share that the mixOmics website has undergone a redesign to enhance your browsing experience and make it easier to access our resources.

What’s New?

  • Refreshed Design: A cleaner, more modern layout
  • 📚 Expanded Getting Started Pages: Helpful pages to help you get up and running with mixOmics
  • 🧭 Reorganized Navigation: A more intuitive menu to quickly find key resources
  • 🔗 Updated Social Links: Stay connected with the mixOmics community
  • 💬 Direct Links to the User Forum: If you haven’t already, join our mixOmics user forum to connect with over 500 other users and experts
  • 🧑‍💻 Updated About Pages: Learn more about the project and our team
  • 📅 Streamlined Workshops, Webinars, and News Sections: Easier access to events and updates
  • 🖥️ Embedded R Markdown Pages: Improved code presentation with syntax highlighting in our Methods, Plots, and Case Studies pages

We are continuing to make small improvements, so if you encounter any issues or have feedback, please feel free to contact us.

Thank you for your continued support of mixOmics.

The mixOmics Team

Page from R Markdown



Missing_Values.knit





All methodologies implemented in mixOmics can handle missing values.
In particular, (s)PLS, (s)PLS-DA,
(s)PCA utilise the NIPALS
(Non-linear Iterative
Partial Least
Squares) algorithm as part of their dimension reduction
procedures. This algorithm is built to handle NAs [1].

This is implemented through the nipals() function within
mixOmics. This function is called internally by the above methods but
can also be used manually, as can be seen below.

Usage in mixOmics

library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene[, 1:100] # a reduced size data set

## pretend there are 20 NA values in our data
na.row <- sample(1:nrow(X), 20, replace = TRUE)
na.col <- sample(1:ncol(X), 20, replace = TRUE)
X.na <- as.matrix(X)

## fill these NA values in X
X.na[cbind(na.row, na.col)] <- NA
sum(is.na(X.na)) # number of cells with NA
## [1] 20
# this might take some time depending on the size of the data set
nipals.tune = nipals(X.na, ncomp = 10)$eig
barplot(nipals.tune, xlab = 'Principal component', ylab = 'Explained variance')

FIGURE 1: Column graph of the explained variance of each Principal
Component.

If missing values need to be imputed, the package contains
impute.nipals() for this scenario. NIPALS
is used to decompose the dataset. The resulting components, singular
values and feature loadings can be used to reconstitute the original
dataset, now with estimated values where the missing values were
previously. To allow for the best estimation of missing values, there is
a large number of components being used (ncom = 10).

X.impute <- impute.nipals(X = X.na, ncomp = 10)
sum(is.na(X.impute)) # number of cells with NA
## [1] 0

The difference between the imputed and real values can be checked.
Here are the original values:

id.na = is.na(X.na) # determine position of NAs in dataframe

X[id.na] # show original values
##  [1]  0.09041 -0.04070  0.03497 -0.01712  0.01309  0.00233 -0.04142  0.11104
##  [9] -0.01519 -0.17034 -0.01641  0.15964  0.00557 -0.06217  0.04131  0.02157
## [17]  0.01226 -0.00753  0.03038 -0.00783

The values which were estimated via the NIPALS
algorithm:

X.impute[id.na] # show imputted values
##  [1]  0.0837747419 -0.0190061068  0.0004024897 -0.0180879247 -0.0094185656
##  [6] -0.0312362158 -0.0706920015  0.1400817774  0.0083359545 -0.1158255139
## [11]  0.0164817649  0.1007897385  0.0236184385  0.0191934144  0.0214240977
## [16]  0.0686280312 -0.0039198425  0.0085870558  0.0450234407  0.0013964758



mixOmics 6.30.0 on Bioconductor

At the end of October 2024 Bioconductor updated to version 3.20, and with it updated to the latest version of mixOmics 6.30.0. You can install the latest version of mixOmics on Bioconductor here. This latest release version of the package runs on R version 4.4 and includes some minor bug fixes and updated code and unit tests. See our Github page for more details on these updates.

Webinar: PLS methods

This webinar was presented for a seminar to a group of quantitative researchers (mostly statisticians) at the University of Melbourne. Abstract is below.

Topics covered: context of data integration, PCA solved with NIPALS algorithm and SVD, sparse PCA, correlation circle plot interpretation, PLS algorithms and deflation modes, sparse PLS.

Technological improvements have allowed for the collection of data from different types of molecules (e.g. genes, proteins, metabolites, microorganisms) resulting in multiple ‘omics data (e.g. transcriptomics, proteomics, metabolomics, microbiome) measured from the same set N of biospecimens or individuals. In this talk I will introduce the statistical integration of these multi-omics data to shed more light into a biological system.

Integrating data include numerous challenges – data are complex and large, each with few samples (N < 50) and many molecules (P > 10,000), and generated using different technologies. I will present PLS (Partial Least Squares / Projection to Latent Structures developed by Wold in the 1980s) as an algorithm of choice for data integration of small N large P problems. These variants form the basis of our comprehensive mixOmics R package for feature selection, dimension reduction and integration of omics data sets. This talk is targeted at a general audience with background knowledge in statistics and interest in large data

The webinar was re-recorded for the PLS section.