Our latest PLS-based method for inferring microbial networks across time is now available as a preprint!
- LUPINE is a PLS-based method that combines dimension reduction and partial correlations to infer associations between taxa.
- LUPINE takes into account information across time points
- LUPINE has been designed for relatively small sample sizes and a small number of time points
Microbial network inference for longitudinal microbiome studies with LUPINE. Saritha Kodikara, Kim-Anh Lê Cao. bioRxiv 2024.05.08.593086;
Watch this 50-minute video of Dr Saritha Kodikara presenting her method, LUPINE:
A second video, presented by Prof Kim-Anh Lê Cao, sets LUPINE in the context of longitudinal microbiome data analysis and elaborates on the types of analytical objectives covered in Kodikara et al. (2022) Statistical challenges in longitudinal microbiome data analysis, Briefings in Bioinformatics.
Below you will also find the most common questions related to LUPINE.
FAQ:
Q: Do you build up the network from the covariance matrix or from the inverse covariance matrix? And what are you doing linear regression on?
A: The network is built on partial correlations, so it is similar to one built from the inverse covariance matrix. But instead of estimating the inverse covariance matrix, we calculate partial correlations through linear regression. To estimate the partial correlation between taxon a and taxon b, we regress their counts on a low-dimensional representation of the other taxa (excluding taxa a and b). This is then repeated for all pairs (we have an efficient way to do this computationally).
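The regression idea in this answer can be sketched on toy data. This is a minimal illustration, not the LUPINE implementation: it assumes continuous values, plain PCA for the one-dimensional representation, and ordinary least squares without a library size offset. The function names (`first_component_scores`, `residuals`, `partial_correlation`) are hypothetical.

```python
import numpy as np

def first_component_scores(X):
    """Sample scores on the first principal component of X (samples x taxa)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, 0] * s[0]

def residuals(y, u):
    """Residuals of an ordinary least squares fit of y on an intercept and u."""
    design = np.column_stack([np.ones_like(u), u])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

def partial_correlation(X, a, b):
    """Correlate taxa a and b after removing the first component of the other taxa."""
    others = np.delete(X, [a, b], axis=1)
    u = first_component_scores(others)
    return float(np.corrcoef(residuals(X[:, a], u), residuals(X[:, b], u))[0, 1])

# Toy data (an assumption for illustration): every taxon follows one shared
# latent driver, so taxa 0 and 1 are correlated only through that driver.
rng = np.random.default_rng(0)
n_samples, n_taxa = 200, 10
latent = rng.normal(size=n_samples)
X = latent[:, None] + 0.5 * rng.normal(size=(n_samples, n_taxa))

marginal = float(np.corrcoef(X[:, 0], X[:, 1])[0, 1])
partial = partial_correlation(X, 0, 1)
```

In this toy setting the marginal correlation between taxa 0 and 1 is large, while the partial correlation shrinks towards zero once the shared component of the other taxa is regressed out.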
Q: You reduce the dimension of the data into one dimension. How much variance can be explained by the 1st component in your computation?
A: It depends on the data. In the data we analysed, if we consider only the single time point scheme with PCA, the first component explained about 25% of the total variance. We could add more components to the regression, but that may overfit the regression model. This is why we select only the first component, which explains the largest share of the variance (for PCA, single time point) or covariance (for PLS, multiple time points).
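To check how much variance the first component explains on your own data, one option is to compute the proportions from the singular values of the centred matrix. The sketch below covers the PCA (single time point) case only and uses simulated values; `explained_variance_ratio` is a hypothetical helper name, not a LUPINE function.

```python
import numpy as np

def explained_variance_ratio(X):
    """Proportion of total variance carried by each principal component of X."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    return s**2 / np.sum(s**2)

# Simulated matrix (samples x taxa), for illustration only
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))
X[:, 0] *= 3.0  # give one taxon a much larger variance

ratios = explained_variance_ratio(X)
first_component_share = float(ratios[0])
```

Here `first_component_share` plays the role of the "about 25%" figure quoted above for the authors' data.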
Q: Do you think this approach would work on single-cell data, for example to look at gene co-expression in longitudinal data across time points?
A: It will not work with present single-cell technologies, because LUPINE needs the same individuals/samples/cells across time to infer the associations.
Q: When you do the linear regression, do you regress directly on the counts with all the zeros and the sparsity that you mentioned?
A: Yes, the method was originally developed for count data. We regress on the count data, but we also include library size as an offset to account for different library sizes. The method also works with centered log-ratio (CLR) values, which I used to analyse the third case study.
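For readers unfamiliar with the log-ratio transform mentioned in this answer, here is a minimal sketch of it for a single sample. The pseudocount used to handle zeros is an assumption made for this example; the answer above does not say how zeros were treated before the transform.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's counts.

    `pseudocount` is added to every count so that zeros can be logged;
    this choice is an assumption of the sketch.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

sample = [120, 0, 35, 5]  # toy counts for four taxa
values = clr(sample)
```

CLR values sum to zero within a sample, and without a pseudocount they are unchanged when all counts are multiplied by a constant, which is why the transform removes the effect of library size.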
Q: Do you apply your method for the two groups combined or separately?
A: I model each group separately as we assume that each group has a unique network.
Q: You’re building the networks based on the partial correlations. What about the actual network representation: do you binarize it?
A: Yes, I binarize the network based on a correlation test.
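As an illustration of binarizing with a test, the sketch below keeps an edge when a two-sided test of zero partial correlation is significant. The answer only says "a correlation test"; the Fisher z approximation, the 0.05 threshold, and the names `fisher_z_pvalue` and `binarize` are assumptions of this example, and no multiple-testing correction is applied.

```python
import math

def fisher_z_pvalue(r, n, n_conditioned=1):
    """Two-sided p-value for H0: partial correlation = 0 (Fisher z approximation).

    n is the number of samples; n_conditioned is the number of variables
    conditioned on (1 when a single component is regressed out).
    """
    df = n - n_conditioned - 3
    if df <= 0:
        raise ValueError("not enough samples for the test")
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(df)
    return math.erfc(abs(z) / math.sqrt(2.0))

def binarize(pcor, n, alpha=0.05, n_conditioned=1):
    """Return a 0/1 adjacency matrix with an edge wherever the test rejects H0."""
    p = len(pcor)
    adj = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            if fisher_z_pvalue(pcor[i][j], n, n_conditioned) < alpha:
                adj[i][j] = adj[j][i] = 1
    return adj

# Toy partial correlation matrix for three taxa measured on 30 samples
pcor = [
    [1.0, 0.6, 0.05],
    [0.6, 1.0, -0.1],
    [0.05, -0.1, 1.0],
]
adjacency = binarize(pcor, n=30)
```

With these toy values only the pair with partial correlation 0.6 is kept as an edge.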