cim() – Clustered Image Maps
Clustered Image Maps (CIM) are a form of heatmap created to represent either:
- Expression values within a single dataset (Weinstein et al., 1994, 1997; Eisen et al., 1998). This usage is appropriate for (s)PCA, (s)IPCA and (s)PLS-DA contexts within mixOmics.
- Pearson correlation coefficients between two matched datasets (Scherf et al., 2000). This usage is appropriate for (s)PLS and (r)CCA contexts.
Hierarchical clustering is applied simultaneously to the rows and columns of the matrix being displayed (the data matrix, or the real-valued similarity matrix in an integrative context). The result is shown as a 2D coloured grid in which each cell's colour is based on either the values of a single matrix (e.g. gene expression values) or the values of the similarity matrix when integrating two datasets. Dendrograms along the axes show how the rows and columns cluster under the chosen hierarchical clustering method.
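The core idea can be sketched in a few lines of base R. This is an illustration only, not the code cim() runs internally: rows and columns are clustered independently with dist() and hclust(), and the matrix is then reordered by the leaf order of both trees. The toy matrix and its names are hypothetical, and stats::heatmap() (left commented out) stands in for the plotting step.

```r
# Sketch of the idea behind a CIM using base R only (not mixOmics internals):
# cluster rows and columns separately, then reorder the matrix by both trees.
set.seed(42)
mat <- matrix(rnorm(8 * 5), nrow = 8,
              dimnames = list(paste0("s", 1:8), paste0("f", 1:5)))

# Hierarchical clustering of the rows (samples) ...
row.tree <- hclust(dist(mat, method = "euclidean"), method = "complete")
# ... and of the columns (features), obtained by transposing the matrix
col.tree <- hclust(dist(t(mat), method = "euclidean"), method = "complete")

# The leaf order of each dendrogram gives the display order of the heatmap
ordered <- mat[row.tree$order, col.tree$order]

# To draw the coloured grid with both dendrograms, uncomment:
# heatmap(mat, Rowv = as.dendrogram(row.tree), Colv = as.dendrogram(col.tree),
#         scale = "none")
```

Reordering only permutes rows and columns, so every cell keeps its value for a given sample and feature; only its position in the grid changes.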
cim() is a useful complement to the correlation circle plot (plotVar()) and relevance networks (network()), as it shows the correlation between subsets of variables from each dataset.
library(mixOmics)
data(nutrimouse)
cim() Parameters
mapping
This parameter controls which association matrix is plotted and is only relevant for integration methods (two datasets). By default, the combined association matrix is used (mapping = "XY"), so each cell in the heatmap represents the correlation between a pair of features, one from each dataset. If mapping = "X" or mapping = "Y" is used, each cell represents the raw expression data from the input X or Y dataset, respectively.
comp
This parameter is also only used in an integrative context. It controls which latent components are used to determine the similarity between features. Using fewer components reduces the computational cost (and therefore run time), but it also reduces the “resolution” of the heatmap, as there are fewer components with which to discriminate features. Figure 1 illustrates this reduction in resolution. The default is comp = 1:object$ncomp, meaning that all the components generated by the integrative method are used.
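# fit the PLS model used in the examples below (X = lipids, Y = genes)
pls.nutri <- pls(nutrimouse$lipid, nutrimouse$gene)
# show CIM calculated on only the first dimension produced by the PLS method
cim(pls.nutri, comp = 1)
# show CIM calculated on all dimensions from the PLS method
cim(pls.nutri)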
FIGURE 1: Clustered Image Maps from the PLS applied to the nutrimouse lipid and gene data. The left CIM is calculated on only one latent component while the right uses all components yielded by the PLS method.
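A short derivation helps explain the loss of resolution. It assumes, which this page does not state explicitly, that the similarity between an X feature $x_j$ and a Y feature $y_k$ is the sum, over the selected components $t_h$, of the products of each feature's correlation with that component:

$$
M_{jk} = \sum_{h \in \text{comp}} \operatorname{cor}(x_j, t_h)\,\operatorname{cor}(y_k, t_h)
$$

With comp = 1 this collapses to an outer product,

$$
M = \mathbf{a}\,\mathbf{b}^\top, \qquad a_j = \operatorname{cor}(x_j, t_1), \quad b_k = \operatorname{cor}(y_k, t_1),
$$

so $\operatorname{rank}(M) \le 1$: every row of $M$ is a scaled copy of $\mathbf{b}^\top$, and the X features can only be told apart by the single coefficient $a_j$. With $H$ components, $\operatorname{rank}(M) \le H$, which leaves room for richer block structure in the heatmap.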
zoom
When visualising high-dimensional datasets, the cells are sometimes so small that interpretation is nearly impossible. A zoom tool is therefore available by setting zoom = TRUE. Left-clicking twice with the zoom tool sets the top-left and bottom-right corners of the region of interest. A new window then opens with a zoomed-in version of that region, allowing a finer assessment of the CIM.
dist.method
This controls the distance measure used when clustering the rows and columns. The distance measure determines which clusters are merged as the hierarchical tree is built, i.e. how “similar” two clusters are considered to be. It defaults to dist.method = "euclidean". Possible options include the Pearson correlation ("correlation") and all those supported by the R dist() function (e.g. "euclidean", "manhattan", etc.).
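 
The choice of distance can change which rows end up together. The base-R sketch below contrasts Euclidean distance with a correlation-based distance defined as 1 minus the Pearson correlation; that definition is a common convention assumed here, not confirmed as the exact formula cim() uses. The profiles are hypothetical toy data.

```r
set.seed(1)
base <- rnorm(10)
# 'a' and 'b' share the same shape (b is a rescaled, shifted copy of a);
# 'c' is unrelated noise
profiles <- rbind(a = base, b = 3 * base + 5, c = rnorm(10))

# Euclidean distance reacts to differences in magnitude and offset
d.euc <- as.matrix(dist(profiles, method = "euclidean"))

# Correlation-based distance (assumed here as 1 - Pearson r between rows)
# ignores magnitude and offset and only compares the shape of the profiles
d.cor <- 1 - cor(t(profiles))

round(d.euc, 2)
round(d.cor, 2)
```

Under Euclidean distance, a is closer to the unrelated profile c than to its rescaled copy b; under the correlation-based distance, a and b are at distance zero.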
clust.method
This controls how the hierarchical tree is built, i.e. the agglomeration method. It accepts the values used by the R hclust() function (e.g. "ward", "single", etc.) and defaults to clust.method = "complete".
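The base-R sketch below, on hypothetical random points, shows that the same distance matrix yields different trees under different agglomeration rules. Note that recent versions of hclust() rename "ward" to "ward.D" (with a message) and also offer "ward.D2".

```r
set.seed(2)
pts <- matrix(rnorm(12 * 2), ncol = 2)
d <- dist(pts)

# Same distances, different agglomeration (linkage) rules
single   <- hclust(d, method = "single")    # nearest pair between clusters
complete <- hclust(d, method = "complete")  # farthest pair between clusters
average  <- hclust(d, method = "average")   # mean over all pairs

# The trees can differ in shape and in the order of their leaves
rbind(single = single$order, complete = complete$order, average = average$order)

# Height of the final merge under each rule
c(single = max(single$height), complete = max(complete$height))
```

With complete linkage, the final merge height always equals the largest pairwise distance, whereas single linkage merges at the smallest gap between clusters. This is why single linkage tends to produce elongated, chained clusters.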
save and name.save
The resulting CIM can be exported to an image file in the working directory using the save parameter. Several file types are available: ‘jpeg’, ‘tiff’, ‘png’ and ‘pdf’. Supply a file name through the name.save parameter. The extension is added automatically, as shown in the example below.
# a file PLS_CIM_image.jpeg will be saved to the working directory
cim.res <- cim(pls.nutri, save = 'jpeg', name.save = 'PLS_CIM_image')
cim() in Single Omics
As mentioned above, CIMs can be used in single-omics methods to visualise how the features within a dataset cluster. Each cell corresponds to the raw data from the original dataset. Figure 2 shows this use case.
This type of CIM shows how the features and/or samples of the dataset are related, as seen in the dendrograms along the top and left edges of the figure. Sections of the heatmap with fairly homogeneous colours (e.g. the top right corner in Figure 2) indicate that those features (here C16.1n.7, C14.0, C16.1n.9, C18.1n.7 and C18.1n.9) are a primary factor in the clustering of the corresponding samples (here 21, 36, 31 and 32).
spca.nutri <- spca(nutrimouse$lipid) # fit a sparse PCA on the lipid data
# produce the CIM, labelling axes appropriately
cim(spca.nutri, xlab = "Lipids", ylab = "Samples")
FIGURE 2: Clustered Image Maps of the nutrimouse lipid data.
Note that both code chunks below would produce the same CIM as Figure 2. While the rows and columns may appear in a different order, the hierarchical trees would be identical, as would each cell for a given feature and sample. This is because the heatmap depicts the raw data: the cim() function uses the X component of the spca, ipca and pls objects to generate the CIM, and in all these cases this is nutrimouse$lipid.
ipca.nutri <- ipca(nutrimouse$lipid) # fit an IPCA on the lipid data
cim(ipca.nutri) # produce the CIM
pls.nutri <- pls(nutrimouse$lipid, nutrimouse$gene) # fit a PLS with lipids as X and genes as Y
# produce the CIM of purely the lipid data used in the PLS method
cim(pls.nutri, mapping = "X")
cim() in Multi Omics
When cim() is used in an integrative context, the resulting heatmap represents something different. Rather than colouring cells by the raw data, cell colours represent the correlation between a given pair of features, one from each dataset.
This type of CIM represents the correlation structure extracted from the two datasets. The value for each pair of original features is derived from each feature's correlation with the components of the integrative method (PLS in Figure 3). Blocks of homogeneous colour show subsets of features from each dataset that are correlated. Such blocks may point to a biological relationship worth investigating further, although correlation alone does not establish causation.
# produce the CIM of the PLS method, labeling axes appropriately
cim(pls.nutri, xlab = "Genes", ylab = "Lipids")
FIGURE 3: Clustered Image Maps from the PLS applied to the nutrimouse lipid and gene data
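The idea of deriving a feature-pair similarity from correlations with components can be sketched in base R. This is a simplified illustration under stated assumptions, not mixOmics code: the data are hypothetical, and the single component t1 is a stand-in taken as the first principal component of X. A real PLS would derive its components from both X and Y, and the exact component mixOmics uses is not confirmed here.

```r
set.seed(3)
n <- 30
latent <- rnorm(n)                      # hypothetical shared signal
X <- cbind(x1 = latent + rnorm(n, sd = 0.3),
           x2 = -latent + rnorm(n, sd = 0.3),
           x3 = rnorm(n))               # unrelated feature
Y <- cbind(y1 = latent + rnorm(n, sd = 0.3),
           y2 = rnorm(n))               # unrelated feature

# Stand-in component (assumption): first left singular vector of scaled X
t1 <- svd(scale(X))$u[, 1, drop = FALSE]

# Similarity of X feature j and Y feature k = product of their correlations
# with the component; with a matrix of several components, the matrix
# product sums these products over components
sim <- cor(X, t1) %*% t(cor(Y, t1))
round(sim, 2)
```

The result is a features-by-features matrix (3 X features by 2 Y features). The pairs sharing the latent signal receive large positive or negative values, and these values would form the homogeneous blocks in the CIM.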
Additional Notes
A Common Error
A common error when using this function produces the following message in the RStudio console:
Error in cim plot : figure margins too large
This is caused by the plotting window being too small in RStudio. Maximise the RStudio window, then adjust the pane layout so that the ‘Plots’ pane is as large as possible. This should resolve the error.
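If this fails, opening a separate graphics window with the X11() function (or dev.new(), which works across platforms) may also resolve it. If the problem persists, refer to the save and name.save parameter section above to write the CIM directly to a file.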
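Another workaround is to draw into an off-screen file device whose size does not depend on the RStudio pane. The sketch below uses base R's pdf() and heatmap() as a stand-in for cim() on a hypothetical matrix with long feature names. It is an assumption, worth checking with ?cim, that cim() draws on the currently open device in the same way (cim() also appears to accept its own margins argument).

```r
set.seed(4)
mat <- matrix(rnorm(20 * 10), nrow = 20,
              dimnames = list(paste0("sample_", 1:20),
                              paste0("long_feature_name_", 1:10)))

out.file <- file.path(tempdir(), "cim_like_heatmap.pdf")
pdf(out.file, width = 10, height = 10)            # page size in inches
heatmap(mat, scale = "none", margins = c(12, 8))  # extra room for column/row labels
invisible(dev.off())                              # close the device to write the file
```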
cimDiablo() Variant
The cimDiablo() function is a clustered image map implemented specifically to represent the expression of the multi-omics molecular signature for each sample. It is very similar to a classical hierarchical clustering and accepts almost all the same parameters as the standard cim() function.
Case Studies
Refer to the following case studies for a more in-depth look at generating and interpreting the output of the cim() function:
- rCCA – Nutrimouse
- sPLS – Liver Toxicity
- sPLS-DA – SRBCT
- Multilevel – Vac18
- N-Integration – TCGA (this case study uses the cimDiablo() function)
- P-Integration – Stem cells