[Closed] Self-paced online course Oct 31st – Nov 27 2022

The next iteration of the course will be in September 2023 for a likely duration of 6-8 weeks (it will be advertised 3 months before opening the course). This course is online, but at your own pace, meaning that you need to dedicate enough time (5-8h per week) to fully benefit from the program.

Feedback from the 2022 iteration:

  • You can do it at your own time since the resources provided (Webinars and reading material) are very helpful. Due to working hours I had to watch/read on demand (at my own time)
  • Kim-Anh has done a very good job in the webinars and was generally approachable and helpful. Thank you! The online course material was very good and explained the basics of the program quite well. The integration with the mixOmics online material and sample cases is very helpful.
  • It had the option to attend live webinars (two offered times) or watch recordings. – The possibility to ask questions was available for both live webinars and stack. – The assignments are designed to enhance further learning allowing to use of either own data or provided data at different challenge skills.
  • Course organisers were very responsive to our questions in Slack. Modules flowed nicely and were well organised. Webinars were useful.

This is the second round of our online course ‘mixOmics R Essentials for Biological Data Integration’, which includes 4 weeks of asynchronous learning (with one live summary + Q&A per week), numerous chats on Slack and an additional 3 weeks to complete the assignment. Some feedback from our last round can be found here. Our last survey suggests that most learners spent 5-8h per week on the program.

  • Teaching period dates (asynchronous):
    • Start – Monday, 31st October 2022
    • End – Sunday, 27th November 2022
    • (non marked) Assessment due Sunday, 9th December 2022
    • Peer-review of assessment due Sunday, 16th December 2022
  • Fees vary for
    • Research Higher Degree students enrolled at a University: $495 AUD (incl. GST)
    • Staff and members from Universities & Not-for-profit organisations: $825 AUD (incl. GST)
    • Other industries: $1320 AUD (incl. GST)
    • Discounts of 5% for a group of 3-9 learners and 10% for 10+ learners; this requires a single invoice per group.

(these funds go towards the support of a software developer to maintain the package)

Information about the course and registration: https://study.unimelb.edu.au/find/short-courses/mixomics-r-essentials-for-biological-data-integration/

The number of places is limited, so it is first come, first served (we aim to run this course twice a year).

What if I need an invoice? Contact Student Support at continuing-education[at]unimelb.edu.au

Prerequisites. A good working knowledge of R programming (e.g. handling data frames, performing simple calculations and displaying simple graphical outputs) is essential to fully benefit from the course*. The course is divided into theory (50%) and hands-on practice, with the opportunity to analyse your own data. The exercises and assignments are in R. Participants are encouraged to use RStudio and Rmarkdown (template and R code provided).

*Learners who are not proficient in R do not get the full benefit of the course (based on their own, honest, feedback!)

Our book is out!

We are excited to announce that our book is out, along with several case studies and R scripts available online. Check out this page.

It’s been a very (very) long term project, and a great collaboration with Zoe Welham whose dedication and patience helped shape this project into a readable whole! A huge thank you to Al Abadi, who tirelessly helped updating the package as we developed the content.

Webinar: mixOmics in 50 minutes

This latest seminar was hosted by Australian BioCommons / EMBL-ABR / ARDC in March 2024.

The latest version includes some recent updates (also covered in more detail in the other webinars – check them out!)

The slides are open to the community, but don’t forget to acknowledge the presenter if you are re-using the slides.

Multi-omics data (e.g. transcriptomics, proteomics) collected from the same set of biospecimens or individuals are a powerful way to understand the underlying molecular mechanisms of a biological system.

mixOmics, a popular R package, integrates omics data from a wide range of sources into a single, unified view making it easier to explore and reveal interactions between omics layers. It overcomes many of the challenges of multi-omic data integration arising from data that are complex and large, with few samples (<50) and many molecules (>10,000), and generated using different technologies. 

Prof Kim-Anh Lê Cao, head of the mixOmics team, is delivering this webinar to outline the different methods implemented in mixOmics and how statistical data integration is defined in this context. She will demonstrate how these approaches are applied to analysis of different multi-omics studies and outline the latest methodological developments in this area. From a study of human newborns, to multi-omics microbiomes, and multi-omics in single cells, these examples illustrate how mixOmics is used to perform variable selection and identify a signature of omics markers that characterise a specific phenotype or disease status. 

Who the webinar is for: This webinar is for life scientists, bioinformaticians and anyone with an interest in exploration and integration of multi-omics biological datasets.

Topics covered: omics data statistical integration, introduction to matrix factorisation techniques, applications of DIABLO and MINT frameworks for bulk or single cell assays, extensions.

Any mixOmics-related question can be sent to https://mixomics-users.discourse.group (you will need to log in, but there is no mail traffic associated)

We are moving …. to bioC!

Dear all,

After 9 years hosted on CRAN, we are migrating to Bioconductor! It’s been a great first journey and we are grateful to CRAN for hosting our package. We are now ready for the next adventure.

Why are we moving?

  • It is our aspiration to empower computational and molecular biologists, which aligns with bioC vision.
  • We will be able to link with new experimentClass S4 objects and existing data packages using them in bioC, ranging from multi omics, microbiome and single cell.
  • We will be able to provide better vignettes and examples that will complement our website.

What has changed? What should I do? Should I panic?

So far we have kept disruptions to a minimum, so the function calls and objects are the same. Gradually we will be adding more capabilities, which will greatly improve usability (see above for the S4 class).

We are almost on bioC, but full acceptance is pending the removal of mixOmics from CRAN. We have also fixed a few bugs. To install this new version:

The development version is now accessible on gitHub (feel free to fork / help* / comment on gitHub):

R> # install_github() comes from the devtools package
R> devtools::install_github("mixOmicsTeam/mixOmics")

Or alternatively, once we are on Bioconductor:

R> if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
R> BiocManager::install("mixOmics", version = "3.8")

Then, business as usual!

* We would like to formally acknowledge the help of Lluís Revilla (Centre Esther Koplowitz, Barcelona) for helping us with setting up some testthat checks for our bioC version.

As we enter this new journey, we also thank you for this.
And also for this!

PS: a one-day microbiome workshop is scheduled in chilly Vancouver on November 6.

News 2018, workshops 2018 and DIABLO

Dear all,

The first few months of the year have been busy for us. Thanks to your support, we were ranked second in the Bioinformatics Peer Prize (57 votes, so close behind the winner with 59 votes!). Our entry is listed at this link if you would like to watch a basic introduction to the package.

For those who are new to mixOmics, I also cooked some prezi slides to introduce the broad context of where mixOmics sits, which was presented at the University of Melbourne ResBaz event in February.

We have now scheduled our 2018 workshops:

  • An advanced workshop focusing on omics data integration 7-8 June in the Parisian region. The registration will be in two stages: Expression of Interest due on April 29, followed by registration. The workshop will accommodate 30 participants. More details here. 
  • A 3-day beginner workshop 23-25 July at the University of Melbourne. More details will be populated very soon.

We have pushed the second version of our DIABLO manuscript to bioRxiv. The code is currently on GitHub and will also be rendered on our website soon.

For some little news, you can also follow us on Twitter @mixOmics_team.

 

Kim-Anh for the mixOmics team

A quick video introduction for mixOmics, vote for us!

Dear mixOmics friends, users, and adventurers,

We are reaching out to you to get your unbiased vote 😉 for the Bioinformatics PeerPrize III where we promote our latest publication in PLoS Computational Biology as a software article.

For those not familiar with the package, the little 3min video will give you a brief introduction to the topics of

  • ‘omics data integration in systems biology
  • multivariate dimension reduction techniques
  • mixOmics: what is it?
  • our main integrative methods DIABLO and MINT

This prize is a great opportunity for us to disseminate the toolkit. As you know, developing software and obtaining the resources to do so is not a piece of cake, but we have managed over the years. In 2017 the package was downloaded 29,000 times and is still going strong, thanks to your support and your invaluable feedback.

Vote for us if you like our entry! Votes close on Feb 19. Thank you!

https://bioinformatics-peer-prize-iii.thinkable.org
(you will need to enter your organisation and the reference of a paper you co-authored. They take this seriously!)

More news about what is coming up in 2018 for mixOmics very soon. We wish you all many successful mixOmics analyses in 2018!

6.3.1 on CRAN: bug fixes and latest news

We pushed 6.3.1 following a major bug in 6.3.0 when dealing with missing values (especially with DIABLO). We also fixed another bug related to the one-sided t-test in the tune functions. All good now. Nipals is also faster to run.

A big thank you to the users who give us feedback via our bitbucket issue list; this is very useful to us as we continue to improve the package.

The 3 workshops we ran in October and November 2017 were a success. The first Advanced workshop resulted in many stimulating discussions that will help the development team to move forward.  The two beginner workshops were also a lot of fun. We are particularly pleased to see how the small mixOmics community is growing!

Our paper has finally been published in PLoS Computational Biology as a software article. The main methods are described in the poster below. We are now working on the long-awaited DIABLO manuscript so that it leaves bioRxiv and has a life of its own!

In the next few months these are the changes we are planning ahead:

  • a conversion to bioconductor. Ain’t no fear, it should not affect the function calls. We think it is now the right time to reach the bioconductor community, but that implies a fair amount of implementation on our side. Consequently the methods development will slow down in the coming few months.
  • a mixOmics forum to encourage discussions around the 19 methods we currently have available.

Summary of the mixOmics article in PLoS Comp Biol

Version 6.3.0 and workshop

A new CRAN version is now available. We have considerably improved the computational time for the tune and perf functions! (see example below). We also fixed some reproducibility issues when using parallel computing with a set seed.

The update of the package will require new dependencies: ‘matrixStats’, ‘rARPACK’, ‘gridExtra’

There are still some spots left for the beginner mixOmics workshop in Toulouse, 9-10 Nov. Details here.

 

Enhancements:
————-
– huge gain in computation time for the tune functions tune.splsda and tune.block.splsda. The larger the data, the bigger the gain. Requires new dependencies: ‘matrixStats’, ‘rARPACK’, ‘gridExtra’
– a plot for an object ‘tune.block.splsda’
– the tune.multilevel function was deprecated a while ago and is now removed.

Bug fixes:
———-
– fixed reproducibility problem when using parallel computing in tune.block.splsda (via the ‘cpus’ argument)
– network: correlation with missing values fixed, label names fixed
– fixed perf for block.splsda objects with prediction distances
– some NA issues reported in 6.2.0 fixed (hopefully)

 

The gain in computational time is reported below for our different supervised frameworks. Timings depend on your operating system, but generally the user time is the time spent executing the code, the system time is the time spent on system processes (e.g. opening and closing files), and the elapsed time is the total time since the stopwatch was started.
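As a small illustration of these three timings, here is a minimal base R sketch using system.time(). The matrix X and the svd() call are only stand-ins for a real tune or perf call; they are assumptions made for this example and not part of the package benchmark.

```r
# Time a small, self-contained computation (a stand-in for a tune/perf call).
set.seed(42)
X <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

timing <- system.time({
  sv <- svd(X)  # any moderately expensive step would do here
})

# system.time() returns a 'proc_time' object; these three entries are the
# user, system and elapsed times, in seconds.
print(timing[c("user.self", "sys.self", "elapsed")])
```

The actual values will differ from one machine and one run to the next, so it is usually the relative gain between two versions that is worth comparing.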

6.2.0, 2 postdoc positions and workshops

Dear mixOmics users,

Our new update 6.2.0 is now available on CRAN, alongside the new version of our manuscript.

manuscript & package update:

The mixOmics manuscript introducing the supervised and integrative frameworks (PLS-DA, DIABLO block.plsda and MINT) has been updated, along with all the R / Sweave case studies. The manuscript and code are available at this link. The case studies are also published on our website (sPLSDA:SRBCT, Case study: TCGA and Case study: MINT).

The manuscript describes in more detail the different prediction distances (see also the supplemental material) and the interpretation of the AUROC for our supervised methods.

The constraint argument was removed from all our methods, due to a risk of overfitting.

New features:

– The constraint argument (version 6.1.0 – 6.1.3) was removed in the functions perf and tune for all supervised objects because of a risk of overfitting

Enhancements:

– AUROC added for MINT objects mint.plsda and mint.splsda where the study name needs to be specified, e.g. auroc( .., roc.study = "study4"). See ?auroc

– choice.ncomp output added on all perf and tune functions for all supervised methods.

– mat.c output for pls and plsda objects (matrix of coefficients from the regression of X / residual matrices X on the X-variates).

Bug fixes (thank you to the users who notified us on bitbucket):

– fixed bug when using predict, perf or tune with the error msg: "Error in predict.spls(spls.res, X.test[, nzv]) : 'newdata' must include all the variables of 'object$X'"

Workshops:

We advertised two workshops at this link. The advanced workshop 23-24 Oct 2017 is fully subscribed. This is our first MAW (mixOmics advanced workshop), but there will be more planned in 2018. We still have a few spots left for the classic workshop on the 9-10 Nov 2017 in Toulouse, contact us for more information (priority will be given to students and early career researchers).

Two senior postdoc positions (2 year and 3 year) still open!

The Australian mixOmics team, now based at the University of Melbourne, is recruiting two senior postdocs in the fields of computational biology or statistics: 1 full-time 2-year position to work with the Stemformatics team on exciting omics integration problems (‘omics and single cell omics) to improve stem cell classification, and 1 full-time 3-year position for innovative multivariate methods development for ‘omics time course, microbiome and P-integration. Contact us for more information.

Website update:

With the invaluable help of the bioinformatics masters students Danielle Davenport and Zoe Welham, we are currently revamping the website to ensure all code runs correctly. Thank you to those who sent us feedback!

Update 6.1.3 on CRAN, postdoc position, manuscript and upcoming workshops

Dear mixOmics users,

We have been quiet for a while, but we have some good news! A CRAN update, a manuscript in bioRxiv, a 3-year postdoc position open to be part of the mixOmics core team, and three workshops planned for the French autumn!

The 6.1.3 update is now on CRAN. We fixed a few bugs (see list below), and we also have a new plotIndiv argument ‘background’ to visualise the prediction area for a PLS-DA and sPLS-DA model (max 2 components). This is a powerful plot to visualise the effect of the different prediction methods. Why does a prediction method matter for the performance of the discriminant analysis models? See elements of information below.

Example of prediction area plot for the SRBCT data with a PLS-DA model, see ?srbct

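All you need is to call the background.predict function and overlay the results with plotIndiv. For example:

library(mixOmics)  # provides plsda, background.predict and plotIndiv

# liver toxicity data: gene expression (X) and the fourth treatment column as the class (Y)
data(liver.toxicity)
X = liver.toxicity$gene
Y = as.factor(liver.toxicity$treatment[, 4])

# PLS-DA model with two components
plsda.liver = plsda(X, Y, ncomp = 2)

# calculating background for the two first components, and the mahalanobis distance
background = background.predict(plsda.liver, comp.predicted = 2, dist = "mahalanobis.dist")

# overlay the prediction areas on the sample plot
plotIndiv(plsda.liver, background = background, legend = TRUE)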

We also added the new functions get.confusion_matrix and get.BER to calculate a confusion matrix based on class prediction of test samples and their real class, and calculate their Balanced Error Rate, see ?get.BER. Example of outputs (for a DIABLO analysis on the breast cancer TCGA multi omics study):

Example from our DIABLO pipeline available at https://mixomics.org/wp-content/uploads/2012/03/mixOmicsRscripts.zip
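To make the two quantities concrete, here is a minimal base R sketch of a confusion matrix and a Balanced Error Rate computed by hand. The class labels are made up for illustration, and the code is not the implementation behind get.confusion_matrix or get.BER; it only assumes the usual definition of the BER as the average of the per-class error rates.

```r
# Toy true and predicted classes (made-up values, for illustration only).
truth     <- factor(c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C"))
predicted <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "A", "B"),
                    levels = levels(truth))

# Rows are the true classes, columns the predicted classes.
confusion <- table(truth = truth, predicted = predicted)

# Per-class error rate: misclassified samples of a class / size of that class.
class_error <- 1 - diag(confusion) / rowSums(confusion)

# Balanced Error Rate: the unweighted mean of the per-class error rates.
ber <- mean(class_error)

# Overall error rate, for comparison: it weights classes by their size.
overall_error <- 1 - sum(diag(confusion)) / sum(confusion)

print(confusion)
print(c(BER = ber, overall = overall_error))
```

Because each class contributes equally to the BER regardless of its size, the BER and the overall error rate generally differ when the classes are unbalanced, as in this toy example.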

 

We have submitted a new version of our mixOmics manuscript to bioRxiv! The manuscript is available at this link and has been a top tweeted story in #bioinformatics. It mostly summarises the latest mixOmics frameworks for Discriminant Analysis (sPLS-DA, DIABLO and MINT), with extensive R and Sweave code here; give it a go! The supplemental material details these methods thoroughly.

It almost sounds like the end of a first mixOmics era: Florian, our very talented and dedicated core developer, debugger and developer of MINT, has moved on to another postdoctoral position at the University of Queensland, and Kim-Anh is starting her new group as a Senior Lecturer at the University of Melbourne (UoM), at the Centre for Systems Genomics. Do not fear, this means there will be a new round of developments, notably in the microbiome and metagenomics field, as we are opening a new 3-year senior postdoctoral position in Computational Biostatistics at UoM (with the opportunity to teach at the School of Mathematics and Statistics). More details at this link.

Seventeen multivariate methods currently implemented in mixOmics! Can you recognise your favourite?

Three workshops are coming up, between Sept – Nov 2017 in France. The first edition of MAW’17 is the advanced mixOmics workshop to introduce our new frameworks (published and in development: DIABLO, MINT, SNPOmics, timeOmics, mixMC and extension of integration) to our advanced users. The workshop is free, but you will need to cover your own travel and accommodation costs. Toulouse, 23-24 Oct 2017. Send us an email and we can send you the details. The two other workshops will be our normal beginner mixOmics workshops, in September (Lille) and in early November (Toulouse). More details on our website soon.

 

Other enhancements and bug fixes:

Enhancements:
————-
1 – perf.sgccda (for DIABLO) now implements a constraint model (see details in ?perf)
2 – legend = TRUE option in circosPlot and plotDiablo
Bug fixes:
———-
– tune.splsda had a bug when assessing the ‘choice.ncomp’ based on one-sided t-test of the error rate when the error rate was constant.
– sparse PCA deflation algorithm fixed
– added mixOmics:: for pls functions to avoid clash with other packages

 

Why does a prediction distance matter? (full story in our manuscript)

The supervised multivariate methods in mixOmics can be applied on an external test set to predict the outcome of new samples with the predict function (predict), or to assess the performance of the statistical model (perf). The predict function calculates prediction scores for each new sample, or predicted coordinates, which are equivalent to the latent component scores in the training set.

Prediction distances. Our supervised models work with dummy indicator matrices Y to indicate the class membership of each sample, and result in a prediction score for each outcome category k, k = 1, . . . , K. Therefore, the scores across all K classes need to be combined to obtain the final prediction of a given test sample using a prediction distance. We propose distances such as ‘maximum distance’, ‘Mahalanobis distance’ and ‘Centroids distance’, as detailed in our supplemental information and in ?predict. Those distances can give different predictions, which will be reflected in the assessed performance of the model.
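The idea behind two of these distances can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the code used by predict: the training scores, the test coordinates and the dummy scores below are all made-up toy values, and the object names are hypothetical.

```r
# Toy latent-component scores for a training set (2 components, 3 classes).
train_scores <- rbind(c(-2.0,  0.1), c(-1.8, -0.2),  # class A
                      c( 0.1,  2.1), c(-0.1,  1.9),  # class B
                      c( 2.0,  0.0), c( 2.2, -0.1))  # class C
train_class <- factor(c("A", "A", "B", "B", "C", "C"))

# Predicted coordinates of one test sample in the same latent space (made-up).
test_score <- c(1.5, 0.4)

# Predicted dummy scores of that test sample, one per class (made-up).
test_dummy <- c(A = 0.05, B = 0.30, C = 0.65)

# 'Maximum distance': pick the class with the largest predicted score.
pred_max <- names(which.max(test_dummy))

# 'Centroids distance': pick the class whose training centroid is closest
# (Euclidean distance) to the predicted coordinates of the test sample.
centroids <- apply(train_scores, 2, function(col) tapply(col, train_class, mean))
dist_to_centroid <- sqrt(rowSums(sweep(centroids, 2, test_score)^2))
pred_centroid <- names(which.min(dist_to_centroid))

print(c(max.dist = pred_max, centroids.dist = pred_centroid))
```

In this toy case both rules agree, but since one works on the predicted dummy scores and the other on distances in the latent space, they need not agree for samples that lie between classes.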