plotVar() – Correlation Circle Plot
The plotVar() function can be used to visualise and assess the correlation of each variable with a selected pair of latent components. The two components are used as the axes, and each variable's coordinates are its correlations with those components.
This method can be used in single- and multi-omics contexts (it works for rCCA, (s)PLS, (s)PLS-DA and (s)PCA). The variables are assumed to be centered and scaled. All the resulting vectors lie inside a unit circle (radius of 1), and each vector's position corresponds to its correlation with the components. Therefore, variables that are more strongly associated with the components have vectors that extend further from the origin.
This plot can also be used to assess the correlation between the original variables themselves. Variable vectors that lie close to one another are highly positively correlated. Vectors separated by an acute angle (less than 90°) indicate a positive correlation, an obtuse angle (greater than 90°) indicates a negative correlation, and a right angle indicates no correlation. In other words, the correlation between two features is approximated by the cosine of the angle between their vectors (drawn from the origin). This approximation is most reliable for variables whose vectors lie close to the circle, that is, variables that are well represented by the two plotted components.
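As a rough illustration of this geometry, the base R sketch below computes correlation-circle coordinates from a PCA by correlating each variable with the first two component scores. It does not use mixOmics; the built-in mtcars data stands in for an omics matrix, and the helper cos_angle() is hypothetical. It also shows why the cosine is only an approximation: summed over all components, the products of the coordinates recover the correlation exactly, whereas two components only give an estimate.

```r
# Base R sketch (no mixOmics): correlation-circle coordinates from a PCA.
# Assumption: mtcars stands in for an omics data matrix.
X <- as.matrix(mtcars)

# Centre and scale the variables, then run PCA
pca_fit <- prcomp(X, center = TRUE, scale. = TRUE)

# Coordinates = correlation of each variable with the PC1 and PC2 scores
coords  <- cor(X, pca_fit$x[, 1:2])     # rows = variables, columns = PC1, PC2
vec_len <- sqrt(rowSums(coords^2))      # arrow length; never exceeds 1

# Hypothetical helper: cosine of the angle between two vectors from the origin
cos_angle <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

approx_cor <- cos_angle(coords["mpg", ], coords["wt", ])  # read off the 2D plot
true_cor   <- cor(X[, "mpg"], X[, "wt"])                  # actual correlation

# Using all components, the dot product of the coordinates is exact
full_coords <- cor(X, pca_fit$x)
exact_cor   <- sum(full_coords["mpg", ] * full_coords["wt", ])

print(round(c(cosine_2d = approx_cor, dot_all = exact_cor, correlation = true_cor), 3))
```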
The comp parameter controls which components are used as axes. It defaults to comp = c(1,2) for 2D plots and comp = c(1:3) for 3D plots. The first component listed is used as the horizontal axis and the second as the vertical axis.
A circle of radius 1 is drawn by default (as no vector should be longer than 1). In addition, an inner circle can be drawn to aid the visual assessment of correlations. Its radius is set by the rad.in parameter, which defaults to rad.in = 0.5.
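A minimal base R sketch of the two reference circles is given below, assuming a hypothetical helper circle_xy() (mixOmics draws these circles itself). The outer circle has radius 1, and the dashed inner circle uses the same 0.5 value as the rad.in default.

```r
# Base R sketch: outer unit circle plus a dashed inner reference circle.
# Assumption: circle_xy() is a hypothetical helper, not a mixOmics function.
circle_xy <- function(radius, n = 200) {
  theta <- seq(0, 2 * pi, length.out = n)
  cbind(x = radius * cos(theta), y = radius * sin(theta))
}

rad_in   <- 0.5                 # mirrors the rad.in = 0.5 default
outer_xy <- circle_xy(1)        # boundary that no variable vector can exceed
inner_xy <- circle_xy(rad_in)   # visual guide for moderate correlations

plot(outer_xy, type = "l", asp = 1, xlab = "Component 1", ylab = "Component 2")
lines(inner_xy, lty = 2)            # dashed inner circle
abline(h = 0, v = 0, col = "grey")  # axes through the origin
```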
In high-dimensional contexts, the plot can become very cluttered (and unreadable) when every single variable is plotted. The cutoff parameter addresses this: any variable whose vector length is less than the cutoff is not shown. It defaults to cutoff = 0, meaning all variables are shown.
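The filtering rule described above can be sketched in base R as follows. apply_cutoff() is a hypothetical helper, not part of mixOmics, and mtcars stands in for real data: a variable is kept only when the length of its vector reaches the cutoff.

```r
# Base R sketch of a length-based cutoff on correlation-circle coordinates.
# Assumption: apply_cutoff() is hypothetical; mixOmics applies its own filter.
apply_cutoff <- function(coords, cutoff = 0) {
  vec_len <- sqrt(rowSums(coords^2))        # distance of each arrow tip from the origin
  coords[vec_len >= cutoff, , drop = FALSE] # drop vectors shorter than the cutoff
}

X      <- as.matrix(mtcars)
scores <- prcomp(X, center = TRUE, scale. = TRUE)$x[, 1:2]
coords <- cor(X, scores)

nrow(apply_cutoff(coords, cutoff = 0))    # cutoff = 0 keeps every variable
nrow(apply_cutoff(coords, cutoff = 0.5))  # only vectors of length >= 0.5 remain
```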
The style parameter controls which plotting package generates the plot. It defaults to style = 'ggplot2'. Importantly, this parameter allows a 3-dimensional plot to be generated by setting style = '3d'.
plotVar() in Single Omics
PCA is the simplest use case for this function. The vectors show how each variable is correlated with each of the principal components (an indirect measure of its contribution to them). Note that in other methods centering and scaling happen automatically; this is not the case with PCA, where they must be requested through the relevant parameters (center and scale). Figure 1 depicts the projection of the variables onto the space spanned by the first two components (components 1 and 2), as well as the space spanned by components 1 and 3.
library(mixOmics) # load the package
data(nutrimouse) # load the nutrimouse data set
X <- nutrimouse$lipid # extract lipid concentration data
# run PCA on centered and scaled data, keeping three components
scale.pca.lipid <- pca(X, ncomp = 3, center = TRUE, scale = TRUE)
# plot the correlation circle plot from the PCA projected onto the
# first two components
plotVar(scale.pca.lipid, title = 'Nutrimouse lipid, Components 1 and 2')
# plot the correlation circle plot from the PCA projected onto the
# first and third components
plotVar(scale.pca.lipid, comp = c(1,3), title = 'Nutrimouse lipid, Components 1 and 3')
FIGURE 1: Correlation circle plots from the PCA applied to the nutrimouse lipid data, showing the correlation structure between lipids in the space spanned by PC1 and PC2, or in the space spanned by PC1 and PC3.
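To see why the scaling step matters for PCA, the base R sketch below compares prcomp() with and without scaling on simulated data in which one variable has a far larger variance than the others. The data are invented for illustration; in mixOmics the same choice is made through the center and scale arguments of pca(), as in the call for Figure 1.

```r
# Base R sketch: effect of scaling on PCA (simulated data, not nutrimouse).
set.seed(1)
n <- 100
X <- cbind(a = rnorm(n, sd = 100),  # very large variance
           b = rnorm(n, sd = 1),
           c = rnorm(n, sd = 1))

unscaled <- prcomp(X, center = TRUE, scale. = FALSE)
scaled   <- prcomp(X, center = TRUE, scale. = TRUE)

# Without scaling, PC1 is almost entirely variable "a"
round(abs(unscaled$rotation[, 1]), 3)

# After scaling, no single variable dominates the explained variance
round(scaled$sdev^2 / sum(scaled$sdev^2), 3)  # proportion of variance per PC
```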
Figure 2 shows how this looks using the 3-dimensional functionality of this function. The sphere is the equivalent of the inner circle in the plots of Figure 1.
plotVar(scale.pca.lipid, style = '3d') # plot a basic 3D correlation sphere plot
FIGURE 2: Correlation sphere plots from the PCA applied to nutrimouse lipid data.
plotVar() in Multi Omics
When dealing with an integration framework, plotVar() works in much the same way. Variables from both datasets are represented on the same set of axes, which allows correlations between variables from the different datasets to be assessed. Figure 3 contains a basic call of the function for the PLS method, as well as a second plot that uses the cutoff parameter (to show how it can improve clarity).
When plotting a given variable, its coordinates are determined by its correlation with the components of the dataset it comes from. The points are also coloured by dataset, as can be seen in the legend.
Y <- nutrimouse$gene # extract the gene expression data
pls.nutri <- pls(X, Y) # run the PLS method
# plot the correlation circle plot from the PLS without a cutoff
plotVar(pls.nutri, title = "PLS Correlations - no cutoff")
# plot the correlation circle plot from the PLS with a cutoff of 0.5
plotVar(pls.nutri, title = "PLS Correlations - cutoff of 0.5", cutoff = 0.5, legend = TRUE)
FIGURE 3: Correlation circle plots of the features after the PLS method was applied to the nutrimouse lipid and gene data. The only difference between the two plots is the use of the cutoff parameter, which reduces visual clutter.
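The idea that each variable is placed using the components of its own dataset can be sketched in base R. pls_sketch() is a hypothetical, simplified two-block PLS (SVD of the cross-product matrix, with each block deflated on its own variate) run on simulated data. It is not the algorithm or default mode used by mixOmics::pls(), but the deflation keeps each block's variates orthogonal, so every vector stays inside the unit circle.

```r
# Base R sketch of two-block correlation-circle coordinates.
# Assumptions: simulated data; pls_sketch() is hypothetical and NOT mixOmics::pls().
set.seed(42)
n <- 50
latent <- rnorm(n)
X <- scale(cbind(x1 = latent + rnorm(n), x2 = latent + rnorm(n), x3 = rnorm(n)))
Y <- scale(cbind(y1 = latent + rnorm(n), y2 = rnorm(n)))

pls_sketch <- function(X, Y, ncomp = 2) {
  Xd <- X
  Yd <- Y
  tX <- matrix(0, nrow(X), ncomp)
  tY <- matrix(0, nrow(Y), ncomp)
  for (h in seq_len(ncomp)) {
    s   <- svd(crossprod(Xd, Yd))  # SVD of t(Xd) %*% Yd
    t_h <- Xd %*% s$u[, 1]         # X variate for component h
    u_h <- Yd %*% s$v[, 1]         # Y variate for component h
    # Deflate each block on its own variate so later variates are orthogonal
    Xd <- Xd - t_h %*% crossprod(t_h, Xd) / sum(t_h^2)
    Yd <- Yd - u_h %*% crossprod(u_h, Yd) / sum(u_h^2)
    tX[, h] <- t_h
    tY[, h] <- u_h
  }
  list(variates_X = tX, variates_Y = tY)
}

fit <- pls_sketch(X, Y)

# Each block's variables are correlated with that block's own variates
coords <- rbind(cor(X, fit$variates_X), cor(Y, fit$variates_Y))
block  <- rep(c("X", "Y"), c(ncol(X), ncol(Y)))  # dataset label, used for colouring
data.frame(block, round(coords, 2))
```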
Refer to the following case studies for a more in-depth look at generating and interpreting the output of the plotVar() function: