Clustered Image Maps (CIM) are a form of heatmap created to represent either the values of a single dataset (e.g. gene expression values) or the similarity between the features of two integrated datasets. In mixOmics, hierarchical clustering is applied to the rows and columns of the real-valued matrix simultaneously. The result is a 2D coloured grid in which each cell’s colour is based on either the values of a single matrix or the values of the similarity matrix when integrating two datasets. Dendrograms along the axes depict how the rows and columns cluster under the hierarchical clustering method.
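The idea of clustering rows and columns at the same time can be sketched with base R alone. The example below is only an illustration on a made-up matrix (the names mat, row.tree and col.tree are hypothetical) and does not reproduce the internals of cim().

```r
# Toy matrix: 6 samples (rows) by 4 features (columns); values are random
set.seed(1)
mat <- matrix(rnorm(24), nrow = 6,
              dimnames = list(paste0("sample", 1:6), paste0("feature", 1:4)))

# Cluster rows and columns independently with the same settings
row.tree <- hclust(dist(mat), method = "complete")
col.tree <- hclust(dist(t(mat)), method = "complete")

# Reorder the matrix by the leaf order of each tree
mat.ordered <- mat[row.tree$order, col.tree$order]

# Base R heatmap: coloured grid with a dendrogram on each axis
heatmap(mat, Rowv = as.dendrogram(row.tree), Colv = as.dendrogram(col.tree),
        scale = "none")
```

Reordering only permutes rows and columns, so every cell keeps its value; the dendrograms simply place similar rows and similar columns next to each other.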
cim() is a great complementary tool to the correlation circle plot (plotVar()) and relevance networks (network()). The correlation between subsets of variables from each dataset can be observed.
library(mixOmics)
data(nutrimouse)
The mapping parameter controls which association matrix is plotted and is only relevant when dealing with integration methods (two datasets). By default, the combined association matrix is used (mapping = "XY"). This means that each cell in the heatmap represents the correlation between a pair of features, one from each dataset. If mapping = "X" or mapping = "Y" is used, each cell represents the raw expression data from the input X or Y dataset respectively.
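As a short illustration of the three settings, the sketch below fits a PLS model with the lipids as X and the genes as Y (the same pairing used elsewhere on this page) and draws one CIM per mapping value. It assumes mixOmics is installed and that the plotting window is large enough for the gene data.

```r
library(mixOmics)
data(nutrimouse)

# PLS with lipids as X and genes as Y
pls.nutri <- pls(nutrimouse$lipid, nutrimouse$gene)

cim(pls.nutri, mapping = "XY")  # default: lipid-gene association matrix
cim(pls.nutri, mapping = "X")   # lipid data only (samples by lipids)
cim(pls.nutri, mapping = "Y")   # gene data only (samples by genes)
```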
The comp parameter can also only be used in an integrative context. It controls which latent components are used to determine the similarity between the features. Using fewer components reduces the computation time, but also reduces the “resolution” of the heatmap (as there are fewer components to discriminate features). Figure 1 depicts the reduction in resolution. This defaults to comp = 1:object$ncomp, meaning that all the components generated by the integrative method will be used.
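# fit PLS with lipids as X and genes as Y (the same model is fitted again further below)
pls.nutri <- pls(nutrimouse$lipid, nutrimouse$gene)
# show CIM calculated on only the first component produced by the PLS method
cim(pls.nutri, comp = 1)
# show CIM calculated on all components from the PLS method
cim(pls.nutri)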
FIGURE 1: Clustered Image Maps from the PLS applied to the nutrimouse lipid and gene data. The left CIM is calculated on only one latent component while the right uses all components yielded by the PLS method.
When visualising high-dimensional datasets, the cells are sometimes so small that interpretation is near impossible. Hence, a zoom tool was implemented, accessed by setting zoom = TRUE. Left clicking twice with the zoom tool sets the top left and bottom right corners of the region of interest. This opens a new window with a zoomed-in version of that region, allowing finer assessment of the produced CIM.
The dist.method parameter controls the distance measure used when clustering the rows and columns. The distance measure quantifies how dissimilar two rows (or columns) are, and therefore determines which clusters are combined as the hierarchical tree is generated. This defaults to dist.method = "euclidean". Possible options include the Pearson correlation ("correlation") and all those supported by the R dist() function (e.g. "euclidean", "manhattan", etc.).
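To see why the choice of distance matters, the base R sketch below compares Euclidean distance with a correlation-based distance on made-up data. It assumes the correlation distance is taken as 1 minus the Pearson correlation, which is a common convention; whether cim() uses exactly this form should be checked in the package source.

```r
set.seed(42)
mat <- matrix(rnorm(40), nrow = 5,
              dimnames = list(paste0("row", 1:5), NULL))

# row2 has the same profile as row1 but ten times the magnitude
mat["row2", ] <- 10 * mat["row1", ]

d.euc <- dist(mat, method = "euclidean")
d.cor <- as.dist(1 - cor(t(mat), method = "pearson"))  # assumed convention

as.matrix(d.euc)["row1", "row2"]  # clearly non-zero: driven by magnitude
as.matrix(d.cor)["row1", "row2"]  # effectively 0: the profiles are identical
```

Euclidean distance separates rows that differ in scale, while the correlation-based distance groups rows with the same shape regardless of scale.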
The clust.method parameter controls the way in which the hierarchical tree is generated, i.e. the agglomeration method. It accepts the values used in the R hclust() function (e.g. "ward", "single", etc.). This defaults to clust.method = "complete".
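The effect of the agglomeration method can be illustrated with base R on five made-up points on a line. This is only a sketch of how linkage changes the tree, using hclust() directly rather than cim().

```r
# Four nearby points and one distant point
pts <- matrix(c(0, 1, 2, 3, 10), ncol = 1,
              dimnames = list(letters[1:5], NULL))
d <- dist(pts)

tree.complete <- hclust(d, method = "complete")
tree.single   <- hclust(d, method = "single")

# Height of the final merge, which joins point e to the other four
max(tree.complete$height)  # 10: farthest pair between the two clusters (a and e)
max(tree.single$height)    # 7: closest pair between the two clusters (d and e)
```

In the mixOmics versions I am familiar with, the help page for cim() describes dist.method and clust.method as character vectors of length two (rows first, then columns), for example clust.method = c("complete", "complete"); it is worth checking ?cim for the installed version.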
The resulting CIM can be exported to an external image file (written to the working directory) using the save parameter. A selection of file types is available, including ‘jpeg’, ‘tiff’, ‘png’ and ‘pdf’. Provide the file name as a string via the name.save parameter. An example of this can be seen below.
# a file PLS_CIM_image.jpeg will be saved to the working directory
cim.res <- cim(pls.nutri, save = 'jpeg', name.save = 'PLS_CIM_image')
As mentioned above, CIMs can be used to visualise the clustering of features within a dataset when using single omics methods. Each cell corresponds to the raw data from the original dataset. Figure 2 depicts this use case.
This type of CIM provides useful information on how the features and/or samples of the dataset are related. This can be seen in the dendrograms along the top and left edges of the figure. Sections of the heatmap which have fairly homogeneous colours (e.g. the top right corner in Figure 2) indicate that those features (i.e. C16.1n.7, C14.0, C16.1n.9, C18.1n.7 and C18.1n.9) are a primary factor in the clustering of the corresponding samples (i.e. 21, 36, 31 and 32).
spca.nutri <- spca(nutrimouse$lipid) # undergo the sPCA method
# produce the CIM, labeling axes appropriately
cim(spca.nutri, xlab = "Lipids", ylab = "Samples")
FIGURE 2: Clustered Image Maps of the nutrimouse lipid data.
Note that both of the code chunks below would produce the same CIM as Figure 2. While the rows and columns may be in a different order, the hierarchical tree would be exactly the same, as would each cell for a given feature and sample. This is due to the heatmap depicting the raw data. More specifically, the cim() function uses the X component of the spca, ipca and pls objects to generate the CIM. This is equivalent to nutrimouse$lipid in all these cases.
ipca.nutri <- ipca(nutrimouse$lipid) # undergo the IPCA method
cim(ipca.nutri) # produce the CIM
pls.nutri <- pls(nutrimouse$lipid, nutrimouse$gene) # undergo the PLS method
# produce the CIM of purely the lipid data used in the PLS method
cim(pls.nutri, mapping = "X")
When using cim() in an integrative context, the resulting heatmap represents something different. Rather than colouring cells based on the raw data, cell colours represent the correlation between a given pair of features, one from each dataset.
This type of CIM represents the correlation structure extracted from the two datasets. The correlation of each original feature pair is determined by their respective correlations with the components from the integrative method (in Figure 3, this is PLS). Blocks of homogeneous colour depict subsets of features from each dataset which are correlated, and suggest a potential causal relationship.
# produce the CIM of the PLS method, labeling axes appropriately
cim(pls.nutri, xlab = "Genes", ylab = "Lipids")
FIGURE 3: Clustered Image Maps from the PLS applied to the nutrimouse lipid and gene data
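One way to read the description above is that each feature is first correlated with the latent components, and the association between a lipid and a gene is then the inner product of their two correlation vectors. The sketch below illustrates that idea using the X-variates of the PLS model. It is an assumption about the general approach rather than a reproduction of what cim() computes internally, and the names components, cor.lipid, cor.gene and assoc are illustrative only.

```r
library(mixOmics)
data(nutrimouse)

pls.nutri <- pls(nutrimouse$lipid, nutrimouse$gene, ncomp = 2)

# Latent components of the model (one column per component)
components <- pls.nutri$variates$X

# Correlation of every original feature with every component
cor.lipid <- cor(nutrimouse$lipid, components)  # lipids x components
cor.gene  <- cor(nutrimouse$gene, components)   # genes x components

# Assumed association: inner product of the two correlation vectors
assoc <- cor.lipid %*% t(cor.gene)              # lipids x genes
dim(assoc)
```

Under this reading, a lipid and a gene receive a strongly positive value when both correlate with the same components in the same direction, and a strongly negative value when they correlate in opposite directions.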
A common error encountered when using this function produces the following message in the RStudio console:
Error in cim plot : figure margins too large
This is caused by the plotting window being too small in RStudio. It is advised to make RStudio cover the whole screen and then resize the panes so that the ‘Plots’ pane is as large as possible. This should resolve the error.
If this fails, opening a separate graphics window with the X11() function may also resolve this. If it is still an issue, refer to the section on the save and name.save parameters above, which allow the CIM to be written directly to a file.
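Another option, sketched below, is to draw the CIM into a file-based graphics device with generous dimensions, so that the plot no longer depends on the size of the RStudio pane. This uses the base R pdf() device rather than the save parameter; the file name and page size are arbitrary choices for illustration.

```r
library(mixOmics)
data(nutrimouse)

pls.nutri <- pls(nutrimouse$lipid, nutrimouse$gene)

# Draw into a large PDF page instead of the RStudio Plots pane
pdf("PLS_CIM_large.pdf", width = 12, height = 10)  # hypothetical file name
cim(pls.nutri)
dev.off()
```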
The cimDiablo() function is a clustered image map specifically implemented to represent the multi-omics molecular signature expression for each sample. It is very similar to a classical hierarchical clustering. It accepts virtually all of the same parameters as the standard cim() function.
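A minimal sketch of cimDiablo() on the nutrimouse data is given below. It assumes a DIABLO model fitted with block.splsda(), using the gene and lipid data as blocks and diet as the outcome; the keepX values are arbitrary and chosen only to keep the plot small, not tuned.

```r
library(mixOmics)
data(nutrimouse)

# Two blocks measured on the same samples, with diet as the outcome
blocks <- list(gene = nutrimouse$gene, lipid = nutrimouse$lipid)

# keepX: number of features selected per block on each component (arbitrary here)
diablo.nutri <- block.splsda(X = blocks, Y = nutrimouse$diet, ncomp = 2,
                             keepX = list(gene = c(10, 10), lipid = c(15, 15)))

# Heatmap of the selected features across samples
cimDiablo(diablo.nutri)
```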