Within the tidyverse
heatmaps can be generated via
ggplot2::geom_tile()
but is sometimes hard to reach the
versatility and beauty of a genuine heatmap function.
tidyheatmaps
provides a tidyverse-style interface to the
powerful heatmap package pheatmap by @raivokolde and enables the
generation of complex heatmaps from tidy data with minimal code.
tidyheatmap()
requires tidy data in long format, see tidyverse.
As an example we will use the gene expression data set
data_exprs
. In the tidyverse lingo the columns of a data
frame are called variables. The variable
expression
contains the numeric values to be color-coded in
the heatmap. Furthermore we will use the variables sample
for heatmap columns and external_gene_name
for heatmap
rows.
glimpse(data_exprs)
#> Rows: 800
#> Columns: 9
#> $ ensembl_gene_id <chr> "ENSMUSG00000033576", "ENSMUSG00000033576", "ENSMUS…
#> $ external_gene_name <chr> "Apol6", "Apol6", "Apol6", "Apol6", "Apol6", "Apol6…
#> $ sample <chr> "Hin_1", "Hin_2", "Hin_3", "Hin_4", "Hin_5", "Ein_1…
#> $ expression <dbl> 2.203755, 2.203755, 2.660558, 2.649534, 3.442740, 5…
#> $ group <chr> "Hin", "Hin", "Hin", "Hin", "Hin", "Ein", "Ein", "E…
#> $ sample_type <chr> "input", "input", "input", "input", "input", "input…
#> $ condition <chr> "healthy", "healthy", "healthy", "healthy", "health…
#> $ is_immune_gene <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no…
#> $ direction <chr> "up", "up", "up", "up", "up", "up", "up", "up", "up…
The basic layout of the heatmap relies on the parameters
rows
, columns
and values
. You can
think of them like aesthetics in ggplot2::ggplot()
, similar
to something like
aes(x = columns, y = rows, fill = values)
.
With the parameter scale
you can activate data scaling
for "row"
or "column"
. By default data scaling
is turned off scale = "none"
.
Rows and columns in the heatmap will appear in the same order as in
the tidy data frame used as input. For example, to order rows and
columns alphabetically, just use the dplyr::arrange()
.
You can customize the number of colors color_legend_n
in
the color legend. The default is 15.
tidyheatmap(data_exprs,
rows = external_gene_name,
columns = sample,
values = expression,
scale = "row",
color_legend_n = 5
)
You can also define the minimum and maximum values of the color
legend. Values smaller then the color_legend_min
will have
the lowest color, values bigger than the color_legend_max
will get the highest color.
tidyheatmap(data_exprs,
rows = external_gene_name,
columns = sample,
values = expression,
scale = "row",
color_legend_min = -1,
color_legend_max = 1
)
Of course, you can also change the colors
themselves.
The number of colors
you provide does not have to match
color_legend_n
. The color legend is automatically adjusted
to have color_legend_n
colors (the default is
15).
Annotations can be added for both rows
and
columns
via annotation_row
and
annotation_col
, respectively. Just specify the
corresponding variables in the tidy data frame. If you want more then
one variable for annotation just combine them by
c(var1, var2, var3)
.
tidyheatmap(data_exprs,
rows = external_gene_name,
columns = sample,
values = expression,
scale = "row",
annotation_col = c(sample_type, condition, group),
annotation_row = c(is_immune_gene, direction)
)
You can provide a list of named vectors to take control over the
annotations colors annotation_colors
.
ann_colors <- list(
condition = c(EAE = "#BD79B4", healthy = "#F5CEF2"),
group = c(Ein = "#C14236", Eip = "#E28946", Hin = "#4978AB", Hip = "#98BB85"),
sample_type = c(input = "#BDBDBD", IP = "#7D7D7D"),
direction = c(down = "#5071DC", up = "#C34B6B"),
is_immune_gene = c(yes = "#B69340", no = "#FFFFFF")
)
tidyheatmap(data_exprs,
rows = external_gene_name,
columns = sample,
values = expression,
scale = "row",
annotation_col = c(sample_type, condition, group),
annotation_row = c(is_immune_gene, direction),
annotation_colors = ann_colors
)
Gaps can be added by specifying data frame variables that should be
used to generate the gaps. Only one variable can be chosen for each
gaps_row
and gaps_col
.
You can provide absolute cell dimensions (in points) via the
cellwidth
and cellheight
parameters.
tidyheatmap(data_exprs,
rows = external_gene_name,
columns = sample,
values = expression,
scale = "row",
annotation_col = c(sample_type, condition, group),
annotation_row = c(is_immune_gene, direction),
annotation_colors = ann_colors,
gaps_row = direction,
gaps_col = group,
cellwidth = 7,
cellheight = 7
)
To highlight a selection of row or column labels while hiding the
rest, you can use show_selected_row_labels
or
show_selected_col_labels
, respectively.
tidyheatmap(data_exprs,
rows = external_gene_name,
columns = sample,
values = expression,
scale = "row",
annotation_col = c(sample_type, condition, group),
annotation_row = c(is_immune_gene, direction),
annotation_colors = ann_colors,
gaps_row = direction,
gaps_col = group,
cellwidth = 7,
cellheight = 7,
show_selected_row_labels = c("Apol6","Bsn","Vgf","Fam96b","Bag1","Aip"),
)
You can use the parameter filename
to write the heatmap
to file.
pheatmap
provides even more features like clustering, dendrograms, text within
cells, et cetera. Additional available parameters can be found
in the documentation of tidyheatmap()
.