Microbial network inference for longitudinal microbiome studies with LUPINE

Our latest method for inferring microbial networks across time, based on partial least squares (PLS), is now available as a preprint!

  • LUPINE is a PLS-based method that combines dimension reduction and partial correlations to infer associations between taxa.
  • LUPINE takes into account information across time points.
  • LUPINE has been designed for relatively small sample sizes and a small number of time points.

Microbial network inference for longitudinal microbiome studies with LUPINE. Saritha Kodikara, Kim-Anh Lê Cao. bioRxiv 2024.05.08.593086; 

View this 48-minute video of Dr Saritha Kodikara presenting LUPINE:

After the abstract, you will also find answers to the most common questions about LUPINE.

Abstract. The microbiome is a complex ecosystem of interdependent taxa that has traditionally been studied through cross-sectional studies. However, longitudinal microbiome studies are becoming increasingly popular. These studies enable researchers to infer taxa associations to better understand coexistence, competition, and collaboration between microbes across time. Traditional metrics for association analysis, such as correlation, are limited by the characteristics of microbiome data (sparse, compositional, multivariate). Several network inference methods have been proposed, but they remain largely unexplored in a longitudinal setting.

We introduce LUPINE (LongitUdinal modelling with Partial least squares regression for NEtwork inference), a novel approach that leverages conditional independence and low-dimensional data representation. This method is specifically designed to handle scenarios with small sample sizes and a small number of time points. LUPINE is the first method of its kind to infer microbial networks across time while considering information from all past time points. We validate LUPINE and its variant, LUPINE_single (for single time point analysis), on simulated data and three case studies, where we highlight LUPINE’s ability to identify relevant taxa in each study context, across different experimental designs (mouse and human studies, with or without interventions, as short or long time courses). To detect changes in the networks across time, between groups or in response to external disturbances, we used different metrics to compare the inferred networks.

LUPINE is a simple yet innovative network inference methodology that is suitable for, but not limited to, analysing longitudinal microbiome data. The R code and data are publicly available for readers interested in applying these new methods to their studies.

FAQ:

Q: Do you build the network from the covariance matrix or from the inverse covariance matrix? And what are you running the linear regression on?

A: The network is built from partial correlations, so it is closely related to the inverse covariance matrix. But instead of estimating the inverse covariance matrix, we calculate partial correlations through linear regression. To estimate the partial correlation between taxon a and taxon b, we regress their counts on a low-dimensional representation of the other taxa (excluding taxa a and b). This is then repeated for all pairs (we have an efficient way to do this computationally).
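The regression idea above can be illustrated with a minimal sketch. It assumes a samples-by-taxa matrix of already transformed (continuous) abundances, uses the first principal component of the other taxa as the one-dimensional representation, fits ordinary least squares (without the library-size offset used for counts), and takes the standard residual-based definition of partial correlation. The function names are hypothetical, and the naive double loop is deliberately simple; it is not the efficient computation used in LUPINE.

```python
import numpy as np


def pc1_scores(Z):
    """First principal component scores of the column-centred matrix Z."""
    Zc = Z - Z.mean(axis=0)
    # SVD: Zc = U S Vt, so the first PC scores are U[:, 0] * S[0]
    U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
    return U[:, 0] * S[0]


def residuals(y, t):
    """Residuals of y after ordinary least squares on an intercept and t."""
    design = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def partial_corr_pc1(X):
    """Partial correlation of each pair (a, b), conditioned on PC1 of the other taxa.

    X: samples-by-taxa matrix (needs at least 3 taxa). Returns a symmetric
    matrix with ones on the diagonal.
    """
    n, p = X.shape
    P = np.eye(p)
    for a in range(p):
        for b in range(a + 1, p):
            # Low-dimensional summary of all taxa except a and b
            others = [j for j in range(p) if j not in (a, b)]
            t = pc1_scores(X[:, others])
            # Remove what the summary explains, then correlate what is left
            ra = residuals(X[:, a], t)
            rb = residuals(X[:, b], t)
            P[a, b] = P[b, a] = np.corrcoef(ra, rb)[0, 1]
    return P
```

For example, `partial_corr_pc1(np.random.default_rng(0).normal(size=(30, 5)))` returns a 5 × 5 matrix of partial correlations. Conditioning on a single component keeps each regression to two parameters, which is what makes this feasible when there are fewer samples than taxa.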

Q: You’ve got the correlation structure between the taxa, which you are trying to infer, and you have also got the longitudinal correlation structure that you built into your simulation. How does that affect your computation of covariances?

A: We do not take the longitudinal correlation into account at all in LUPINE, but this is something we are considering for our future research.

Q: You reduce the data to one dimension. How much variance can be explained by the first component in your computation?

A: It depends on the data, but in the data we analysed, if we consider only the single time point scheme with PCA, the first component explained about 25% of the total variance. We could add more components to the regression, but that may overfit the model. This is why we only select the first component, which explains much of the variance (for PCA, single time point) or covariance (for PLS, multiple time points).
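The two quantities mentioned in this answer can be computed as in the sketch below. The proportion of variance explained by the first principal component is the squared first singular value of the centred data divided by the sum of all squared singular values. For the multiple time point case, the sketch assumes that X holds data from past time points and Y the current time point; this pairing and the function names are illustrative assumptions, not LUPINE's exact set-up. The first PLS weight vector is then the dominant left singular vector of Xcᵀ Yc, the unit direction in X whose scores have maximal covariance with Y.

```python
import numpy as np


def pc1_variance_explained(X):
    """Proportion of total variance captured by the first principal component."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    return s[0] ** 2 / np.sum(s ** 2)


def pls1_component(X, Y):
    """First PLS component of X with respect to Y (covariance maximisation).

    The weight vector w is the dominant left singular vector of Xc^T Yc, i.e.
    the unit vector maximising the squared covariance ||Yc^T Xc w||^2.
    Returns the component scores Xc w and the weight vector w.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    w = U[:, 0]
    return Xc @ w, w
```

A value of `pc1_variance_explained(X)` around 0.25 would correspond to the roughly 25% reported above for the single time point scheme; the PLS component plays the same summarising role when past time points are available.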

Q: Do you think that this approach would work on single-cell data, for example to look at gene co-expression in longitudinal data across time points?

A: It will not work with current single-cell technologies, because in LUPINE we need the same individuals/samples/cells to be measured across time to infer the associations.

Q: When you do the linear regression, do you regress directly on the counts with all the zeros and the sparsity that you mentioned?

A: Yes, the method was originally developed for count data. We regress on the count data, but we also include library size as an offset to account for differences in library size between samples. The method also works with centred log-ratio (CLR) values, which I used to analyse the third case study.
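For readers who want to try the CLR route, a minimal sketch of the centred log-ratio transform is shown below. Each sample's log values are centred by their own mean, which removes the dependence on library size. Because counts contain zeros, a pseudocount is added before taking logs; the value 0.5 is an assumption here, not a setting taken from LUPINE.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of a samples-by-taxa count matrix.

    A pseudocount (assumed value) is added so that zero counts have a finite log.
    Each row of the result sums to zero.
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)
```

Without zeros and with `pseudocount=0.0`, multiplying a sample's counts by a constant (for example, a deeper sequencing run) leaves its CLR values unchanged, which is why no separate library-size offset is needed on this scale.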

Q: Do you apply your method to the two groups combined or separately?

A: I model each group separately as we assume that each group has a unique network.

Q: You’re building the networks based on the partial correlations. What about the actual network representation, do you binarize it?

A: Yes, I binarize the network based on a correlation test.
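The answer does not specify which correlation test is used, so the sketch below swaps in a common choice: the Fisher z-test for a partial correlation, with n samples and k conditioning variables (k = 1 when conditioning on a single component). The significance level of 0.05 and the absence of a multiple-testing correction are also assumptions, and the function names are hypothetical.

```python
import math


def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def partial_corr_pvalue(r, n, k):
    """Two-sided Fisher z-test p-value for a partial correlation r.

    n: number of samples; k: number of conditioning variables.
    """
    if n - k - 3 <= 0:
        raise ValueError("need n - k - 3 > 0 for the Fisher z-test")
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return 2.0 * normal_sf(abs(z))


def binarize(P, n, k=1, alpha=0.05):
    """Adjacency matrix with an edge wherever the partial correlation is significant."""
    p = len(P)
    A = [[0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1, p):
            if partial_corr_pvalue(P[a][b], n, k) < alpha:
                A[a][b] = A[b][a] = 1
    return A
```

With 30 samples, a partial correlation of 0.8 becomes an edge, while values of 0.05 or -0.1 do not. In practice, a multiple-testing correction across all pairs would usually be worth considering.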

This entry was posted in Methods, News, Publications.