Update on CRAN 5.1.1 Major changes

Over the last few months we have been busy preparing this update. It is a major release with several new features.

One major change that will affect all of us concerns the function plotIndiv. While it has new (sexy) functionalities, the argument ‘col’ has been replaced by ‘group’. We will see if we can patch it back in the next release (in a month). In the meantime, give it a try, because it is worth the trouble!

We also fixed a convergence issue in the main sparse PLS algorithm. This may slightly affect your final feature selection, as the algorithm now converges properly.

We list the changes below, enjoy!


New features:
1 – plotContrib for objects of class PLSDA and sPLSDA has been added and is of particular interest for those analysing microbial communities / metagenomics data.

2 – wrapper.sgccda was added to enable multiple data sets integration with one or several factor outcomes. Note: the prediction function for this new add-on has not been fully tested yet and is not available.

3 – wrapper.sgcca and wrapper.sgccda now have an argument called ‘keep’ that you can use as an alternative to the older ‘penalty’ argument. keep is the equivalent of keepX in the PLS methods: it specifies the number of variables to select on each component and each block. Refer to the help file, as keep must be a list whose length is the number of blocks, where each element (corresponding to a block) gives the number of variables to select on each component (yes, it does become complicated).

4 – All wrapper methods for the multiblock module, i.e. wrapper.rgcca, wrapper.sgcca and wrapper.sgccda, take the input argument ‘blocks’ (previously ‘data’). This is to enable a smoother transition to the next update!

5 – plotIndiv has been improved dramatically. A single function can now be used for objects of class PLS, sPLS, PLS-DA, sPLS-DA, rCC, PCA, sPCA, IPCA, sIPCA, rGCCA, sGCCA and sGCCDA (it is no longer an S3 function). In addition, we now provide new arguments (with more to come!):
– ellipse plots are now available; a group argument is required for the unsupervised methods (PCA, IPCA, PLS)
– three types of graphical plot: graphics (version < 5.1-0), ggplot2 and lattice
– legend and title can be added
NOTE: if you want to color each sample with respect to a factor (i.e. a factor of length n), then the argument to use is ‘group’. If you use a supervised approach, then col.per.group is a vector whose length is the number of groups. These arguments may change in upcoming updates.

6 – cim has been implemented for PLS, sPLS, PLS-DA, SPLS-DA, rCC, PCA, sPCA, IPCA, sIPCA and includes a wide range of options to plot a single data set in the form of a heatmap (new!), or the cross correlation between two matching data sets via the methods rCC or (s)PLS using the cross product between latent variables and loading vectors (improved with legends and color bars). We will give more examples on our website.

7 – added package dependencies: ggplot2 and ellipse
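
To make the shape of ‘keep’ described in item 3 concrete, here is a minimal base R sketch. The block names (genes, proteins), their dimensions and the numbers of variables are made up for illustration, and the list is only built and checked here, not passed to wrapper.sgcca; refer to the help file for the exact call.

```r
# Hypothetical example: two blocks measured on the same 10 samples
set.seed(1)
blocks <- list(
  genes    = matrix(rnorm(10 * 50), nrow = 10),
  proteins = matrix(rnorm(10 * 20), nrow = 10)
)
ncomp <- 2  # number of components (assumed for this sketch)

# One element per block; each element gives the number of variables
# to select on component 1, component 2, ...
keep <- list(
  genes    = c(10, 5),
  proteins = c(8, 4)
)

# Sanity checks on the structure described above
stopifnot(length(keep) == length(blocks))   # one entry per block
stopifnot(all(lengths(keep) == ncomp))      # one value per component
fits <- mapply(function(k, b) all(k <= ncol(b)), keep, blocks)
stopifnot(all(fits))                        # cannot keep more variables than exist
```

Naming the elements of keep after the blocks is a convenience in this sketch, not something the text above requires.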


Enhancements:
1 – All wrappers for multiple data integration have been improved and re-implemented. Consequently, the dependency to RGCCA has been removed, and three wrapper functions are now available: wrapper.sgcca, wrapper.rgcca and wrapper.sgccda (see New Feature #2 above).

2 – selectVar has been extended to the non-sparse versions PCA, PLS and PLS-DA, and outputs the features in order of decreasing absolute weight in the loading vectors. It is used in particular by plotContrib (see New feature #1 above).
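
The ordering described in item 2 can be sketched in base R as follows. This is only an illustration of ranking variables by decreasing absolute loading weight, using prcomp from the stats package on toy data; it is not the selectVar implementation, and the variable names are hypothetical.

```r
# Toy data: 20 samples, 6 variables (hypothetical)
set.seed(42)
X <- matrix(rnorm(20 * 6), nrow = 20,
            dimnames = list(NULL, paste0("var", 1:6)))

# Non-sparse PCA: every variable gets a weight in the loading vector
pc <- prcomp(X, center = TRUE, scale. = TRUE)
loadings1 <- pc$rotation[, 1]   # loading vector of component 1

# Rank variables by decreasing absolute weight, keeping the signed values
ranked <- loadings1[order(abs(loadings1), decreasing = TRUE)]
print(ranked)
```

The first names in ranked are the variables that contribute most to the component, whatever the sign of their weight.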


Bug fixes:
1 – The sPLS algorithm was rewritten to ensure convergence. This implies that spls results might be slightly different from version < 5.1-0!

Aug 13-14 2015, Brisbane, AUS

Our first Brisbane workshop, entitled ‘mixOmics: exploration and integration of ‘omics data’, is taking place at the Translational Research Institute in sunny Brisbane, where the Lê Cao team is based. The workshop is sponsored by the AGTA small grant scheme, QIAGEN and our institutes, the University of Queensland Diamantina Institute (UQDI) and the Translational Research Institute (TRI).

The workshop will be mostly targeting postgraduate students, postdocs and researchers working in biology, bioinformatics, computational biology, with a basic knowledge in statistics and good R programming skills.

Dates: 13-14 August 2015

Time: 9am – 5pm

Venue: SparQ-eD Room 2011, level 2, Translational Research Institute, 37 Kent St, Woolloongabba, Brisbane.

‘The case studies were very helpful in understanding the type of data each test can be used with.’

‘Loved it. Very well presented, course material very clear and will be a useful resource in the future. Thanks Kim-Anh!!’

‘Very nicely and clearly explained. Particularly appreciated that we were given the R scripts, so that we could concentrate on concepts rather than scripting.’

‘Everyone very helpful. Thanks Benoit!! You are a legend!’

‘Very well designed such that the concepts were presented and explained while the trickiness of the R code was helped enormously by being able to follow the downloaded scripts. The tutors were very helpful in interpreting what the code was doing.’

‘I really enjoyed the workshop – I found it very informative and the analogies used to help explain some of the more difficult concepts were very helpful.’

mixOmics 5.0-4 on CRAN

Dear mixOmics users,

We have submitted an updated version to CRAN. The changes are listed below. A few points in particular to keep in mind:

  • select.var() was renamed selectVar() (clash with our dependency to the package MASS)
  • we borrowed the function tau.estim() from the RGCCA package in order to estimate the regularisation parameters for rCCA – a way to bypass tune.rcc() with large matrices
  • the multilevel module has been updated, with some changes in the call of the function and a new function called withinVariation() (see details on the website https://mixomics.org/methods/multilevel/)

We thank you all for your interest in the package. There are important upcoming developments so please keep in touch via the website.

 

Changes in 5.0-4
================

New features:
————-
1- a new set of color palettes has been added: color.jet, color.spectral, color.GreenRed and color.mixo
2- the multilevel module has been updated. A new function called withinVariation() calculates the within matrix. Our new website www.mixOmics.org will be updated shortly
3- the function tau.estim was borrowed from the RGCCA package and included in mixOmics in order to estimate the regularisation parameters for rcc more efficiently than tune.rcc(). We noted differences in the parameter estimates between tune.rcc() and tau.estim(), as the former uses cross-validation and the latter the formula from Schäfer and Strimmer (2005). When using tau.estim() we also advise centering and scaling the input data in rcc(). See the tau.estim() help file.
4- because of an S3 method clash with the MASS package in the current R version, we had to rename select.var to selectVar
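
As an illustration of what a ‘within matrix’ (item 2) means, here is a minimal base R sketch. It assumes the simplest one-factor repeated-measures design, where the within-subject part of the data is obtained by removing each subject's mean profile. The toy data and variable names are hypothetical, and withinVariation() itself may differ in its details and arguments.

```r
# Toy repeated-measures data: 4 subjects, 2 measurements each, 3 variables
set.seed(7)
subject <- factor(rep(1:4, each = 2))
X <- matrix(rnorm(8 * 3), nrow = 8,
            dimnames = list(NULL, c("g1", "g2", "g3")))

# Mean profile of each subject, repeated on that subject's rows
subject_means <- apply(X, 2, function(col) ave(col, subject))

# Within-subject variation: what remains once the subject effect is removed
Xw <- X - subject_means

# Each subject's rows now sum to zero for every variable
print(round(rowsum(Xw, subject), 10))
```

Analysing Xw instead of X focuses the multivariate method on changes within subjects rather than on differences between them.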

Bug fixes:
———-
1- select.var.sgcca has been fixed (the outputs were messy)
2- minor bug in plotVar.sgcca and plotVar.rgcca fixed
3- the algorithm in perf.pls and perf.spls has been almost entirely changed. We are now using a different algorithm to estimate the Q2, as presented in the help Rd file (unfortunately the reference is in French so contact us for more details if needed). plot.perf() has been updated

Enhancements:
———-
1- network default color set to color.GreenRed
2- the output feature.final of the perf S3 function has been removed. It is better to use selectVar() (formerly select.var()) to obtain the list of selected variables
3- the multilevel module has been updated. The argument ‘cond’ has been renamed ‘design’. The pheatmap.multilevel() function has been improved.
4- the nearZeroVar function that was borrowed from the caret package has been enhanced to improve computational time as this is costly in the pls/spls functions
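
For readers unfamiliar with this kind of filter, the idea can be sketched in base R. The helper near_zero_var() below is hypothetical and written for illustration only; its thresholds mirror the defaults documented for caret's nearZeroVar (freqCut = 95/5, uniqueCut = 10), and it makes no attempt to reproduce the speed-ups mentioned in item 4.

```r
# Hypothetical helper: flag columns that are constant or nearly constant
near_zero_var <- function(X, freq_cut = 95 / 5, unique_cut = 10) {
  flag <- apply(X, 2, function(col) {
    counts <- sort(table(col), decreasing = TRUE)
    if (length(counts) == 1) return(TRUE)            # zero variance
    freq_ratio <- counts[[1]] / counts[[2]]          # most common vs second most common value
    pct_unique <- 100 * length(counts) / length(col) # percentage of distinct values
    freq_ratio > freq_cut && pct_unique <= unique_cut
  })
  which(flag)  # named indices of the flagged columns
}

# Toy data: one constant column, one almost constant, one continuous
set.seed(9)
X <- cbind(
  constant = rep(0, 100),
  rare     = c(rep(0, 99), 1),
  varied   = rnorm(100)
)
flagged <- near_zero_var(X)
print(flagged)
```

Columns flagged this way carry almost no information and can cause numerical problems when the data are scaled during cross-validation.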

 

April 9-10 2015, Auckland, NZ

We ran our third workshop at the University of Auckland, New Zealand. The event was organised by the Department of Statistics’ Statistical Consulting Centre. For more details go to the Statistical Consulting Centre’s Workshops web page.

There were 34 live and 6 online participants.

Upcoming workshops: Brisbane (end of June 2015) and Paris (24-25 Sept 2015).

Our 34 participants in UoA

‘It was very helpful, I learned a lot from it, Thank you very much Kim and your team !’

‘Excellent – very cutting-edge technology. Thanks!’

‘Good team. Excellent speaker, and the two assistant were very helpful’

‘It was great, thank you Kim-Anh, you explain very well and your slides are very clear’

‘Well done. I appreciate the effort going in to prepare the course material.’

The Oz mixOmics team in New Zealand

New publication with multiple integration

Our paper ‘Novel Multivariate Methods for Integration of Genomics and Proteomics Data: Applications in a Kidney Transplant Rejection Study’ has just been accepted in OMICS: A Journal of Integrative Biology. It stems from a collaboration with scientists from the PRevention Of Organ Failure (PROOF) centre, University of British Columbia.

It provides a nice case study with the application of PCA, IPCA, sPLS-DA and sGCCA (now implemented in mixOmics with the function wrapper.sgcca()).

Contact us for more details if needed.

Abstract

Multi-omics research is a key ingredient of data-intensive life sciences research, permitting measurement of biological molecules at different functional levels in the same individual. For a complete picture at the biological systems level, appropriate statistical techniques must however be developed to integrate different ‘omics’ data sets (e.g., genomics and proteomics). We report here multivariate projection-based analyses approaches to genomics and proteomics data sets, using the case study of and applications to observations in kidney transplant patients who experienced an acute rejection event (n = 20) versus non-rejecting controls (n = 20). In this data sets, we show how these novel methodologies might serve as promising tools for dimension reduction and selection of relevant features for different analytical frameworks. Unsupervised analyses highlighted the importance of post transplant time-of-rejection, while supervised analyses identified gene and protein signatures that together predicted rejection status with little time effect. The selected genes are part of biological pathways that are representative of immune responses. Gene enrichment profiles revealed increases in innate immune responses and neutrophil activities and a depletion of T lymphocyte related processes in rejection samples as compared to controls. In all, this article offers candidate biomarkers for future detection and monitoring of acute kidney transplant rejection, as well as ways forward for methodological advances to better harness multi-omics data sets.

 

Sept 7, 2014, Strasbourg, FR

We ran our first mixOmics tutorial as part of the ECCB’14 conference in the beautiful city of Strasbourg. This one-day tutorial was a success and will be followed by other tutorials in 2014 and 2015.

We will run a two-day tutorial in Toulouse, Oct 6-7 2014. Contact us for more information!

 

mixOmics 5.0-2 update

The major change in this update is the perf() function, which supersedes valid() and offers a variable stability measure across the different folds.

The pls() and spls() functions have been modified and now follow the same coding framework.

See the CRAN page here.

The mixOmics website will be updated shortly for the major changes of these functions. Remember that you can subscribe to our newsletter (mixOmics updates, workshops) as indicated here.

Changes in 5.0-2

 

New features:
————-
– The valid function has been superseded by the perf function. Although the two are similar in essence, a few bugs have been fixed so that the performance of the sPLS and sPLS-DA models is estimated with no selection bias. A variable stability frequency has been added to the output. The functions spls.model and pls.model have been removed.
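
The idea of a variable stability frequency can be sketched in base R: repeat the variable selection within each cross-validation fold and record how often each variable is chosen. The sketch below is not the perf implementation. As a stand-in for sPLS it selects the keep_x variables with the highest absolute correlation with the response, and the data, fold count and names are all hypothetical.

```r
# Toy data: 40 samples, 8 variables, response driven by v1 and v2
set.seed(11)
n <- 40
p <- 8
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("v", 1:p)))
y <- 2 * X[, "v1"] - 2 * X[, "v2"] + rnorm(n, sd = 0.5)

n_folds <- 5
keep_x <- 2   # number of variables selected in each fold (assumed)
folds <- sample(rep(seq_len(n_folds), length.out = n))

# Redo the selection on the training part of each fold
selected <- lapply(seq_len(n_folds), function(f) {
  train <- folds != f
  score <- abs(cor(X[train, ], y[train]))[, 1]
  names(sort(score, decreasing = TRUE))[seq_len(keep_x)]
})

# Proportion of folds in which each variable was selected
stability <- table(factor(unlist(selected), levels = colnames(X))) / n_folds
print(stability)
```

Because the selection is repeated inside each fold, the held-out samples never influence which variables are chosen, which is the point of estimating performance with no selection bias.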

Bug fixes:
———-
– the pls and spls functions have been modified and ‘harmonised’ with respect to scaling. Loading vectors a and b are now scaled to norm 1. Latent variables t and u are not scaled (following Table 21 of the Tenenhaus book – which is in French, sorry!).

– the argument abline.line has been set to FALSE by default in all plotIndiv functions.

– the warning messages in the plot functions have been fixed.

– tune.multilevel for one factor has been fixed.
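
The scaling convention in the first bug fix can be illustrated with a short base R sketch. It computes a first pair of PLS-type loading vectors from the singular value decomposition of the cross-product matrix, which is one common way to obtain them and is assumed here for illustration; it is not a copy of the pls() code, and the toy data are hypothetical.

```r
# Toy data: two centered and scaled data sets on the same 15 samples
set.seed(3)
X <- scale(matrix(rnorm(15 * 5), nrow = 15))
Y <- scale(matrix(rnorm(15 * 3), nrow = 15))

# First pair of singular vectors of X'Y
sv <- svd(crossprod(X, Y), nu = 1, nv = 1)
a <- sv$u[, 1]   # loading vector for X
b <- sv$v[, 1]   # loading vector for Y

# Latent variables: projections of the data, left unscaled
t1 <- X %*% a
u1 <- Y %*% b

cat("norm of a:", sqrt(sum(a^2)), "\n")
cat("norm of b:", sqrt(sum(b^2)), "\n")
cat("norm of t:", sqrt(sum(t1^2)), "\n")
cat("norm of u:", sqrt(sum(u1^2)), "\n")
```

The loading vectors have unit norm by construction, while the norms of the latent variables depend on the data.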

ECCB’14 Tutorial on mixOmics

ECCB Tutorial T04. Multivariate methodologies for the exploration of large biological data sets. Application in R using the mixOmics package

Date: Sunday Sept 7, 2014
Venue: FORUM building, Faculté de Médecine, 4, rue Kirschleger, Strasbourg
Time: 9am – 5.30pm (registration starts from 8am)
Contact: mixomics[at]math.univ-toulouse.fr
More details: http://www.eccb14.org/program/tutorials/mixomics

 How to register: http://www.eccb14.org/registration Registration includes lunch & coffee breaks on the day of the workshop and the tutorial material (.pdf and/or print). ECCB Tutorial rates: 110 € (academic) or 60 € (student)

Description

The objective of this tutorial is to introduce the fundamental concepts behind projection-based approaches and illustrate their application on some exemplar studies using the R package mixOmics.

Multivariate projection approaches are useful exploratory tools to get a first understanding of large and complex data sets. These approaches are extremely efficient on large data sets, and can also answer complex questions. Such approaches include Principal Component Analysis (PCA, Jolliffe 2002) and other variants, Partial Least Squares regression (PLS, Wold 2001), PLS-Discriminant Analysis and Canonical Correlation Analysis (CCA, Hotelling 1936). These approaches reduce the dimension of the data by projecting them into a smaller subspace. Recent developments proposed the so-called ‘sparse’ approaches, which include Lasso penalisations to allow variable selection (Tibshirani 2001).

PCA is the oldest and most popular multivariate technique, but often little is known about how it is solved and what its limitations are. More sophisticated approaches like PLS and CCA have recently been extended to deal with high-dimensional data (sparse PLS, or regularized CCA) and have been shown to give biologically meaningful results in many studies. Contrary to PCA, PLS and CCA enable the integration of two types of data sets.
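
As a small illustration of integrating two data sets, the sketch below runs classical CCA with cancor from the base R stats package on toy data. This is the classical method, which needs more samples than variables; it is not the regularized or sparse variant implemented in mixOmics, and the data are hypothetical.

```r
# Toy data: two data sets measured on the same 30 samples
set.seed(5)
n <- 30
X <- matrix(rnorm(n * 4), nrow = n)
Y <- matrix(rnorm(n * 3), nrow = n)
Y[, 1] <- X[, 1] + rnorm(n, sd = 0.3)   # build in a shared signal

cc <- cancor(X, Y)

# Canonical variates: projections of each centered data set
variates_X <- scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef
variates_Y <- scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef

# The first pair of variates is the most correlated pair of projections
first_cor <- cor(variates_X[, 1], variates_Y[, 1])
print(cc$cor)
print(first_cor)
```

Each data set is projected onto a few components, as in PCA, but the components are chosen to be correlated across the two data sets rather than to explain the variance of a single one.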

Since 2009, we have implemented many multivariate approaches and their sparse variants in the R package mixOmics to be used by the statistical and bioinformatics community. Full tutorials are given on our website: http://perso.math.univ-toulouse.fr/mixomics/

In this tutorial, we will focus on the application of these approaches to medium and high throughput biological data (transcriptomics, metabolomics, proteomics data) using PCA, CCA, PLS, PLS-DA and the variants that the mixOmics team and collaborators have developed.

 

Presenters

The presenters are all key developers of mixOmics.

-Dr Kim-Anh Lê Cao (The University of Queensland Diamantina Institute, Brisbane, Australia). Kim-Anh is a biostatistician and researcher at the University of Queensland, Brisbane, Australia. Her institute has a particular focus on severe and chronic diseases such as cancer and diseases involving the immune system, including arthritis, chronic infections, and diabetes. Together with the mixOmics team, Kim-Anh continues to develop methodologies to analyse complex biological studies.

-Dr Sébastien Déjean (Institut de Mathématiques de Toulouse, Université de Toulouse, France). Sébastien is a statistician and research engineer at the Université de Toulouse. Through his research support activities, he contributes to various projects, particularly in the fields of high throughput biology and information retrieval systems.

-Dr Ignacio González (Institut de Mathématiques de Toulouse, Université de Toulouse, Institut National de la Recherche Agronomique, France). Ignacio is working at the plateforme de bioinformatique et biostatistique de Toulouse. Ignacio has been working in several wet laboratories (INSERM, INRA, CNRS, INSA) where he provided statistical support. He has considerable experience in analyzing a vast range of biological data.

Target Audience

Postgraduate students, postdoctoral fellows and researchers with basic statistical knowledge who need to:

-explore large data sets

-use graphical techniques to better visualize data

-apply multivariate projection methodologies to large data sets.

Prerequisite and requirements

We expect the audience to have a good working knowledge in R. Attendees are requested to bring their own laptops, having installed the software RStudio http://www.rstudio.com/ and the R package mixOmics.