lecao – mixOmics

New publication with multiple integration

Our paper ‘Novel Multivariate Methods for Integration of Genomics and Proteomics Data: Applications in a Kidney Transplant Rejection Study’ has just been accepted in OMICS: A Journal of Integrative Biology. It is the result of a collaboration with scientists from the PRevention Of Organ Failure (PROOF), University of British Columbia.

It provides a nice case study with the application of PCA, IPCA, sPLS-DA and sGCCA (now implemented in mixOmics with the function wrapper.sgcca()).

Abstract

Multi-omics research is a key ingredient of data-intensive life sciences research, permitting measurement of biological molecules at different functional levels in the same individual. For a complete picture at the biological systems level, appropriate statistical techniques must however be developed to integrate different ‘omics’ data sets (e.g., genomics and proteomics). We report here multivariate projection-based analyses approaches to genomics and proteomics data sets, using the case study of and applications to observations in kidney transplant patients who experienced an acute rejection event (n = 20) versus non-rejecting controls (n = 20). In this data sets, we show how these novel methodologies might serve as promising tools for dimension reduction and selection of relevant features for different analytical frameworks. Unsupervised analyses highlighted the importance of post transplant time-of-rejection, while supervised analyses identified gene and protein signatures that together predicted rejection status with little time effect. The selected genes are part of biological pathways that are representative of immune responses. Gene enrichment profiles revealed increases in innate immune responses and neutrophil activities and a depletion of T lymphocyte related processes in rejection samples as compared to controls. In all, this article offers candidate biomarkers for future detection and monitoring of acute kidney transplant rejection, as well as ways forward for methodological advances to better harness multi-omics data sets.

6-7 October 2014, Toulouse, FR

We ran our second tutorial in Toulouse, hosted by the National Institute for Agricultural Research (INRA) and organised by the Plate-formes bio-statistique and bio-informatique GenoToul.

Sept 7, 2014, Strasbourg, FR

We ran our first mixOmics tutorial as part of the ECCB’14 conference in the beautiful city of Strasbourg. This one-day tutorial was a success and will be followed by other tutorials in 2014 and 2015.

We will run a two-day tutorial in Toulouse on Oct 6-7, 2014. Contact us for more information!


perf() function tutorial

The old function valid() has been superseded by the perf() function.

The website update is on its way. In the meantime, please download the following file: Running_perf_function4.

mixOmics 5.0-2 update

The major change in this update is the perf() function, which supersedes valid() and offers a variable stability measure across the different folds.

The pls() and spls() functions have been modified and now follow the same coding framework.

See the CRAN page here.

The mixOmics website will be updated shortly to reflect the major changes to these functions. Remember that you can subscribe to our newsletter (mixOmics updates, workshops) as indicated here.

Changes in 5.0-2

New features:
-------------
- The valid function has been superseded by the perf function. Although the two are similar in essence, a few bugs have been fixed so that the performance of the sPLS and sPLS-DA models is estimated with no selection bias. A variable stability frequency has been added to the output. Functions spls.model and pls.model have been removed.
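
As a rough illustration of what a variable stability frequency across folds measures, here is a minimal base-R sketch. It is not the code used inside perf(): the simulated data, the fold scheme and the top-k correlation filter are all assumptions made for this example, and the filter merely stands in for the sparse PLS selection step.

```r
# Base-R sketch of a selection-frequency ("stability") measure across folds.
# ASSUMPTION: the top-k correlation filter below is a stand-in chosen for
# illustration; it is not the sparse PLS selection performed by perf().
set.seed(1)
n <- 40; p <- 50; k <- 5; n_folds <- 5
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("var", 1:p)))
y <- 2 * X[, 1] - 2 * X[, 2] + rnorm(n, sd = 0.5)  # only var1 and var2 carry signal

# Randomly assign each sample to one of the folds
fold_id <- sample(rep(seq_len(n_folds), length.out = n))

# For each fold, select k variables using the training samples only
selected <- lapply(seq_len(n_folds), function(f) {
  train <- fold_id != f
  score <- abs(cor(X[train, ], y[train]))[, 1]
  names(sort(score, decreasing = TRUE))[seq_len(k)]
})

# Stability = proportion of folds in which each variable was selected
stability <- table(factor(unlist(selected), levels = colnames(X))) / n_folds
head(sort(stability, decreasing = TRUE), k)
```

Variables that are selected in every fold get a frequency of 1, while variables picked up by chance in a single fold get 1/n_folds. Selecting on the training part only is what keeps the measure free of selection bias.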

Bug fixes:
----------
- The pls and spls functions have been modified and ‘harmonised’ with respect to scaling. Loading vectors a and b are now scaled to 1. Latent variables t and u are not scaled (following Table 21 of the Tenenhaus book, which is in French, sorry!).

- The argument abline.line is now set to FALSE by default in all plotIndiv functions.

- The warning messages in the plot functions have been fixed.

- tune.multilevel for one factor has been fixed.
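
The scaling convention for pls and spls described above can be sketched in base R for a first PLS component. This is a minimal sketch, not the package code: it assumes that ‘scaled to 1’ means unit Euclidean norm, and it uses one standard characterisation of the first pair of PLS loading vectors (the leading singular vectors of the cross-product matrix) on simulated data.

```r
# First PLS component via SVD of the cross-product matrix (base R only).
# ASSUMPTION: "scaled to 1" is read as unit Euclidean norm.
set.seed(42)
n <- 30
X <- scale(matrix(rnorm(n * 6), n, 6))  # centred and scaled predictors
Y <- scale(matrix(rnorm(n * 3), n, 3))  # centred and scaled responses

s <- svd(crossprod(X, Y))  # SVD of t(X) %*% Y
a <- s$u[, 1]              # X loading vector
b <- s$v[, 1]              # Y loading vector

t_comp <- X %*% a          # X latent variable, left unscaled
u_comp <- Y %*% b          # Y latent variable, left unscaled

c(norm_a = sqrt(sum(a^2)), norm_b = sqrt(sum(b^2)))            # both equal 1
c(norm_t = sqrt(sum(t_comp^2)), norm_u = sqrt(sum(u_comp^2)))  # generally not 1
```

Because only the loading vectors are normalised, the latent variables keep the scale of the data, and their cross-product equals the leading singular value s$d[1].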

ECCB’14 Tutorial on mixOmics

ECCB Tutorial T04. Multivariate methodologies for the exploration of large biological data sets. Application in R using the mixOmics package

Date: Sunday Sept 7, 2014

Venue: FORUM building, Faculté de Médecine, 4, rue Kirschleger, Strasbourg

Time: 9am – 5.30pm (registration starts from 8am)

Contact: mixomics[at]math.univ-toulouse.fr

More details: http://www.eccb14.org/program/tutorials/mixomics

How to register: http://www.eccb14.org/registration Registration includes lunch & coffee breaks on the day of the workshop and the tutorial material (.pdf and/or print). ECCB Tutorial rates: 110 € (academic) or 60 € (student)

Description

The objective of this tutorial is to introduce the fundamental concepts behind projection-based approaches and illustrate their application on some exemplar studies using the R package mixOmics.

Multivariate projection approaches are useful exploratory tools to get a first understanding of large and complex data sets. These approaches are extremely efficient on large data sets, and can also answer complex questions. Such approaches include Principal Component Analysis (PCA, Jolliffe 2002) and its variants, Partial Least Squares regression (PLS, Wold 2001), PLS-Discriminant Analysis and Canonical Correlation Analysis (CCA, Hotelling 1936). They reduce the dimension of the data by projecting them into a smaller subspace. Recent developments have proposed so-called ‘sparse’ approaches, which include Lasso penalisations to allow variable selection (Tibshirani 2001).

PCA is the oldest and most popular multivariate technique, but often little is known about how it is solved and what its limitations are. More sophisticated approaches like PLS and CCA have recently been extended to deal with high-dimensional data (sparse PLS, or regularized CCA) and have been shown to bring biologically meaningful results in many studies. Contrary to PCA, PLS and CCA enable the integration of two types of data sets.
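
To make the idea of projecting data into a smaller subspace concrete, here is a minimal base-R sketch of one common way to solve PCA, through the singular value decomposition of the centred data matrix. The simulated data and variable names are assumptions made for this example, and the sketch does not use mixOmics.

```r
# PCA solved through the SVD of the centred data matrix (base R only).
set.seed(7)
X <- matrix(rnorm(20 * 5), 20, 5)             # 20 samples, 5 variables
Xc <- scale(X, center = TRUE, scale = FALSE)  # centre each column

s <- svd(Xc)
loadings <- s$v[, 1:2]           # directions of maximal variance
scores <- Xc %*% loadings        # samples projected onto a 2-D subspace
explained <- s$d^2 / sum(s$d^2)  # proportion of variance per component

# Same result as prcomp(), up to the sign of each component
pc <- prcomp(X, center = TRUE, scale. = FALSE)
max(abs(abs(scores) - abs(pc$x[, 1:2])))  # numerically zero
```

The sign of each component is arbitrary, which is why the comparison with prcomp() is made on absolute values.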

Since 2009, we have implemented many multivariate approaches and their sparse variants in the R package mixOmics to be used by the statistical and bioinformatics community. Full tutorials are given on our website: http://perso.math.univ-toulouse.fr/mixomics/

In this tutorial, we will focus on the application of these approaches to medium and high throughput biological data (transcriptomics, metabolomics, proteomics data) using PCA, CCA, PLS, PLS-DA and the variants that the mixOmics team and collaborators have developed.

Presenters

The presenters are all key developers of mixOmics.

- Dr Kim-Anh Lê Cao (The University of Queensland Diamantina Institute, Brisbane, Australia). Kim-Anh is a biostatistics researcher at the University of Queensland, Brisbane, Australia. Her institute has a particular focus on severe and chronic diseases such as cancer and diseases involving the immune system, including arthritis, chronic infections, and diabetes. Together with the mixOmics team, Kim-Anh continues to develop methodologies to analyse complex biological studies.

- Dr Sébastien Déjean (Institut de Mathématiques de Toulouse, Université de Toulouse, France). Sébastien is a statistician and research engineer at the Université de Toulouse. Through his research support activities, he contributes to various projects, particularly in the fields of high throughput biology and information retrieval systems.

- Dr Ignacio González (Institut de Mathématiques de Toulouse, Université de Toulouse, Institut National de la Recherche Agronomique, France). Ignacio works at the plateforme de bioinformatique et biostatistique de Toulouse. He has worked in several wet laboratories (INSERM, INRA, CNRS, INSA), where he provided statistical support. He has considerable experience in analyzing a vast range of biological data.

Target Audience

Postgraduate students, postdoctoral fellows and researchers with basic statistical knowledge who need to

- explore large data sets

- use graphical techniques to better visualize data

- apply multivariate projection methodologies to large data sets.

Prerequisite and requirements

We expect the audience to have a good working knowledge of R. Attendees are requested to bring their own laptops, with the software RStudio http://www.rstudio.com/ and the R package mixOmics already installed.

Version 4.1-3 is on CRAN now

Changes in 4.1
================

New features:
-------------
- New S3 method valid for objects of class pls, spls, plsda and splsda
- New select.var function to directly extract the selected variables from spls, spca, sipca
- New data set vac18 for multilevel data

Article published explaining correlation circle plots, relevance networks and CIM

Our manuscript ‘Insightful graphical outputs to explore relationships between two “omics” data sets’ has been published. It explains how to interpret Correlation Circle plots, and how relevance networks and CIM are generated from rCCA and sPLS.

Check this very colourful manuscript here!

Another presentation about mixOmics

Another general presentation of mixOmics, dating from Dec 2012, is available. It presents some preliminary but exciting results on time course data and on the generalisation of PLS to multi-block data sets, using the approach of our collaborator Arthur Tenenhaus and colleagues.

Go here.

General presentation about mixOmics

A new general presentation about mixOmics is available in the Presentation Section (and should be updated with each major update of the package).

Lê Cao K.-A. Unravelling ‘omics’ data with the mixOmics R package, Illustration on several studies. General presentation on mixOmics (last updated 05/04/2012) [Presentation]