
Multi-omics data integration: method and showcase applications

The Lê Cao team and collaborators from the University of British Columbia (Vancouver, Canada) have published their first method for integrating multiple omics data sets measured on the same set of biospecimens or individuals (e.g. transcriptomics, proteomics). The method takes a holistic, systems biology approach by statistically integrating data from multiple biological compartments. Such an approach provides improved biological insights compared with traditional single-omics analyses, because it takes interactions between omics layers into account and extracts multi-omics molecular networks.

The method, called DIABLO, is a hypothesis-free multivariate dimension reduction method. It constructs combinations of variables (e.g. cytokines, transcripts, proteins, metabolites) that are maximally correlated across data types in order to identify a minimal subset of markers: a multi-omics signature. This signature can highlight novel findings and is also the starting point for network modelling.
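To give a feel for the idea of "combinations of variables that agree across data types", here is a minimal sketch in plain Python. It is not the DIABLO algorithm and not the mixOmics API: it is a simplified two-block sparse PLS-style iteration that maximises the covariance between one component per block, with no outcome variable and no design matrix. The function names (`center`, `soft_threshold`, `sparse_pls_pair`, `pearson`), the `keep_x` / `keep_y` parameters and the simulated data are all hypothetical and only serve the illustration.

```python
import math
import random


def center(block):
    """Column-center a samples-by-variables matrix given as a list of rows."""
    n = len(block)
    means = [sum(row[j] for row in block) / n for j in range(len(block[0]))]
    return [[v - m for v, m in zip(row, means)] for row in block]


def soft_threshold(vec, keep):
    """Shrink the `keep` largest-magnitude entries, zero the rest, rescale to unit norm."""
    cutoff = sorted((abs(v) for v in vec), reverse=True)[keep] if keep < len(vec) else 0.0
    shrunk = [math.copysign(max(abs(v) - cutoff, 0.0), v) for v in vec]
    norm = math.sqrt(sum(v * v for v in shrunk)) or 1.0  # guard against an all-zero vector
    return [v / norm for v in shrunk]


def sparse_pls_pair(X, Y, keep_x, keep_y, n_iter=50):
    """Return sparse loadings (u, v) and components (t, s) for two blocks."""
    X, Y = center(X), center(Y)
    n, p, q = len(X), len(X[0]), len(Y[0])
    # Cross-product matrix M = X^T Y (p x q), proportional to the cross-covariance.
    M = [[sum(X[i][a] * Y[i][b] for i in range(n)) for b in range(q)] for a in range(p)]
    v = [1.0 / math.sqrt(q)] * q  # uniform starting loading vector for block Y
    for _ in range(n_iter):
        # Alternate between the two blocks, sparsifying each loading vector.
        u = soft_threshold([sum(M[a][b] * v[b] for b in range(q)) for a in range(p)], keep_x)
        v = soft_threshold([sum(M[a][b] * u[a] for a in range(p)) for b in range(q)], keep_y)
    t = [sum(row[a] * u[a] for a in range(p)) for row in X]  # component of block X
    s = [sum(row[b] * v[b] for b in range(q)) for row in Y]  # component of block Y
    return u, v, t, s


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Simulated toy data (an assumption for illustration): a shared latent signal z
# drives the first two variables of each block; all other variables are noise.
random.seed(1)
n = 40
z = [random.gauss(0, 1) for _ in range(n)]
X = [[(z[i] + random.gauss(0, 0.2)) if j < 2 else random.gauss(0, 1) for j in range(6)]
     for i in range(n)]
Y = [[(z[i] + random.gauss(0, 0.2)) if j < 2 else random.gauss(0, 1) for j in range(5)]
     for i in range(n)]

u, v, t, s = sparse_pls_pair(X, Y, keep_x=2, keep_y=2)
selected_x = [j for j, w in enumerate(u) if w != 0.0]
selected_y = [j for j, w in enumerate(v) if w != 0.0]
print("selected in X:", selected_x)
print("selected in Y:", selected_y)
print("correlation between components:", round(pearson(t, s), 3))
```

The variables with non-zero loadings play the role of the "signature" in this toy setting. For real analyses, the mixOmics R package is the place to go; this sketch only illustrates the principle of sparse, cross-block components.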

More information about DIABLO, implemented in the mixOmics R package: Amrit Singh, Casey P Shannon, Benoît Gautier, Florian Rohart, Michaël Vacher, Scott J Tebbutt and Kim-Anh Lê Cao (2019). DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics. You can also find some technical information in the mixOmics paper (particularly in the supplementary material) and in our tutorials.

While the computational researchers were busy developing their method, they also analysed the data from the #SmallBig study (small sample, big data) with the EPIC (Expanded Program on Immunization) Consortium. EPIC comprises researchers from the Boston Children’s Hospital, the University of British Columbia, the Medical Research Council Unit The Gambia, the Université libre de Bruxelles, the Telethon Kids Institute and University of Western Australia, and the Papua New Guinea Institute for Medical Research. Together they set out to answer the question: What can less than 1 mL of blood tell us about a newborn’s health?

Sample processing of the #SmallBig study (adapted from Lee et al. 2019)

In this study, recently published in Nature Communications, the team developed a technique to collect extremely small volumes of blood (< 1 mL) to comprehensively characterise how biological molecules evolve in newborns. Using cutting-edge computational and statistical methods, including DIABLO, they show that, in contrast to adult biology, which is in a relatively steady state, the first week of human life is highly dynamic and undergoes dramatic changes. Their results were consistently observed in vastly different areas of the world, West Africa (The Gambia) and Australasia (Papua New Guinea), and suggest a purposeful rather than random developmental path.

More information about the SmallBig study: Amy H. Lee, Casey P. Shannon, […] Tobias R. Kollmann (2019). Dynamic molecular changes during the first week of human life follow a robust developmental trajectory. Nature Communications, volume 10, Article number: 1092.

If you are interested in the potential of DIABLO to integrate microbiome data with omics data from the host, here is another study we published. We integrated microbiome, proteomics and metaproteomics data from T1D (type 1 diabetes) individuals.

Design of the multi-omics microbiome study

Identification of a multi-omics signature (from Gavin et al. 2018).

More details about the study: Gavin PG, […], and Hamilton-Williams EE (2018). Intestinal metaproteomics reveals host-microbiota interactions in subjects at risk for type 1 diabetes. Diabetes Care 41(10). We used DIABLO to integrate microbiome, proteomics and metaproteomics data.

New publication with multiple integration

Our paper ‘Novel Multivariate Methods for Integration of Genomics and Proteomics Data: Applications in a Kidney Transplant Rejection Study’ has just been accepted in OMICS: A Journal of Integrative Biology. It results from a collaboration with scientists from PRevention Of Organ Failure (PROOF), University of British Columbia.

It provides a nice case study with the application of PCA, IPCA, sPLS-DA and sGCCA (now implemented in mixOmics with the function wrapper.sgcca()).

Contact us for more details if needed.

Abstract

Multi-omics research is a key ingredient of data-intensive life sciences research, permitting measurement of biological molecules at different functional levels in the same individual. For a complete picture at the biological systems level, appropriate statistical techniques must however be developed to integrate different ‘omics’ data sets (e.g., genomics and proteomics). We report here multivariate projection-based analysis approaches to genomics and proteomics data sets, using the case study of and applications to observations in kidney transplant patients who experienced an acute rejection event (n = 20) versus non-rejecting controls (n = 20). In these data sets, we show how these novel methodologies might serve as promising tools for dimension reduction and selection of relevant features for different analytical frameworks. Unsupervised analyses highlighted the importance of post-transplant time-of-rejection, while supervised analyses identified gene and protein signatures that together predicted rejection status with little time effect. The selected genes are part of biological pathways that are representative of immune responses. Gene enrichment profiles revealed increases in innate immune responses and neutrophil activities and a depletion of T lymphocyte-related processes in rejection samples as compared to controls. In all, this article offers candidate biomarkers for future detection and monitoring of acute kidney transplant rejection, as well as ways forward for methodological advances to better harness multi-omics data sets.