This webinar was presented for a seminar to a group of quantitative researchers (mostly statisticians) at the University of Melbourne. Abstract is below.
Topics covered: context of data integration, PCA solved with NIPALS algorithm and SVD, sparse PCA, correlation circle plot interpretation, PLS algorithms and deflation modes, sparse PLS.
Technological improvements have allowed for the collection of data from different types of molecules (e.g. genes, proteins, metabolites, microorganisms) resulting in multiple ‘omics data (e.g. transcriptomics, proteomics, metabolomics, microbiome) measured from the same set N of biospecimens or individuals. In this talk I will introduce the statistical integration of these multi-omics data to shed more light into a biological system.
Integrating data include numerous challenges – data are complex and large, each with few samples (N < 50) and many molecules (P > 10,000), and generated using different technologies. I will present PLS (Partial Least Squares / Projection to Latent Structures developed by Wold in the 1980s) as an algorithm of choice for data integration of small N large P problems. These variants form the basis of our comprehensive mixOmics R package for feature selection, dimension reduction and integration of omics data sets. This talk is targeted at a general audience with background knowledge in statistics and interest in large data
The webinar was re-recorded for the PLS section.