Making heatmaps with R for microbiome analysis

plot of chunk annHeatmap2.3

Arianne Albert is the Biostatistician for the Women’s Health Research Institute at the British Columbia Women’s Hospital and Health Centre. She earned a PhD from the University of British Columbia under the tutelage of Dolph Schluter before branching off into health research. Arianne currently has her fingers in all kinds of research pies, including work on the vaginal microbiome. She still has a soft spot for sticklebacks though. You can find her on LinkedIn and Google Scholar.

Heatmaps are incredibly useful for the visual display of microarray data or data from high-trhoughput sequencing studies such as microbiome analysis. Basically, they are false colour images where cells in the matrix with high relative values are coloured differently from those with low relative values. Heatmaps can range from very simple blocks of colour with lists along 2 sides, or they can include information about hierarchical clustering, and/or values of other covariates of interest. Fortunately, R provides lots of options for constructing and annotating heatmaps.

R preliminaries

Load libraries

The first step is to make sure you’ve got the right libraries loaded. I find that the heatmap function in the basic stats package (loaded by default) is quite useful for many applications. However, more complicated annotations require either gplots or Heatplus (from Bioconductor).

library(gplots)  # for heatmap.2

# to install packages from Bioconductor:
source("http://bioconductor.org/biocLite.R")
biocLite("Heatplus")  # annHeatmap or annHeatmap2
library(Heatplus)

# load the vegan package for hierachical clustering if you want to use distance functions not specified in dist.
library(vegan)

# load the RColorBrewer package for better colour options
library(RColorBrewer) 

Get and Prepare Dataset

Next, we need some data! For this example, I’m going to use some microbiome data available for download from Dryad, originally from the paper:

Boucias DG, Cai Y, Sun Y, Lietze V, Sen R, Raychoudhury R, Scharf ME. 2013. The hindgut-lumen prokaryotic microbiota of the lignocellulose-degrading termite Reticulitermes flavipes and its responses to dietary lignocellulose composition. Molecular Ecology 22(7): 1836–1853. doi:10.1111/mec.12230; Dryad data package doi:10.5061/dryad.b922k.

This file is not formatted in a way that R will understand, so it requires some alteration before importing. The file has samples across the top and genera along the rows as well as some lovely, but superfluous, summary annotation. I got rid of extra rows and columns and transposed the rest in Excel. Now the 238 genera are along the top with the 12 samples as the rows.

Load your data! I’m assuuming you know how to do this in R …

all.data <- read.csv("genera example.csv")  # load the data
dim(all.data)
[1]  12 239

Here are the first 3 rows and 4 columns to give you an idea of what the data should look like:

all.data[1:3, 1:4]
    sample Acetivibrio Acetobacter Achromobacter 
1     S7           0           5             0
2     S8           1           0             1
3     S9           0           3             1

We’ll have to strip off the sample ids and convert them to row names so that the data matrix contains only sequence count data.

row.names(all.data) <- all.data$sample
all.data <- all.data[, -1]

Now, to make a heatmap with microbiome sequencing data, we ought to first transform the raw counts of reads to proportions within a sample:

data.prop <- all.data/rowSums(all.data)
data.prop[1:3, 1:3]
    Acetivibrio Acetobacter Achromobacter
S7   0.000e+00   0.0004289     0.000e+00
S8   9.625e-05   0.0000000     9.625e-05
S9   0.000e+00   0.0002759     9.195e-05

Make heatmaps

Generally, the default colours for heatmaps are pretty appalling. I personally like this palette, but play around to see what suits you.

# colorRampPalette is in the RColorBrewer package.  This creates a colour palette that shades from light yellow to red in RGB space with 100 unique colours
scaleyellowred <- colorRampPalette(c("lightyellow", "red"), space = "rgb")(100)

The most basic of heatmaps:

heatmap(as.matrix(data.prop), Rowv = NA, Colv = NA, col = scaleyellowred)

plot of chunk basic heatmap

It’s pretty clear that this plot is inadequate in many ways. For one, the genus labels are all squished along the bottom and impossible to read. One solution to this problem is to remove genera that are exceedingly rare from this figure. Let’s try removing genera whose relative read abundance is less than 1% of at least 1 sample.

# determine the maximum relative abundance for each column
maxab <- apply(data.prop, 2, max)
head(maxab)
Acetivibrio     Acetobacter   Achromobacter Acidaminococcus 
 1.088e-04       5.972e-04       5.441e-04       8.578e-05 

# remove the genera with less than 1% as their maximum relative abundance
n1 <- names(which(maxab < 0.01))
data.prop.1 <- data.prop[, -which(names(data.prop) %in% n1)]

This leaves us with 23 genera suggesting that most of the taxa sampled occur at very low relative abundances. The heatmap with the new data looks like this, with a bit of extra fiddling to get all of the labels displayed.

# the margins command sets the width of the white space around the plot. The first element is the bottom margin and the second is the right margin
heatmap(as.matrix(data.prop.1), Rowv = NA, Colv = NA, col = scaleyellowred, margins = c(10, 2))

plot of chunk basic heatmap 1 per

Much better! Now let’s add a dendrogram for the samples. The heatmap function will do this for you, but I prefer to make my own using the vegan package as it has more options for distance metrics. Also, this means that you can do hierarchical clustering using the full dataset, but only display the more abundant taxa in the heatmap.

# calculate the Bray-Curtis dissimilarity matrix on the full dataset:
data.dist <- vegdist(data.prop, method = "bray")

# Do average linkage hierarchical clustering. Other options are 'complete' or 'single'. You'll need to choose the one that best fits the needs of your situation and your data.
row.clus <- hclust(data.dist, "aver")

# make the heatmap with Rowv = as.dendrogram(row.clus)
heatmap(as.matrix(data.prop.1), Rowv = as.dendrogram(row.clus), Colv = NA, col = scaleyellowred, margins = c(10, 3))

plot of chunk make distance matrix and clusters

You can also add a column dendrogram to cluster the genera that occur more often together. Note that this one must be done on the same dataset that is used in the Heatmap (i.e. reduced number of genera).

# you have to transpose the dataset to get the genera as rows
data.dist.g <- vegdist(t(data.prop.1), method = "bray")
col.clus <- hclust(data.dist.g, "aver")

# make the heatmap with Rowv = as.dendrogram(row.clus)
heatmap(as.matrix(data.prop.1), Rowv = as.dendrogram(row.clus), Colv = as.dendrogram(col.clus), col = scaleyellowred, margins = c(10, 3))

plot of chunk make distance matrix and clusters for genera

For some applications, the above heatmaps would be sufficient, however, there are many times where it would be nice to indicate on the heatmap values for other important variables, such as experimental groups, or values of covariates. Also, it would be nice to have a legend for the colour scale. heatmap.2 in the gplots package can do both of these things.

First, let’s make up some annotation for our dataset (I would like to be 100% clear that this is totally fictional data and not in the original publication in any way):

# There are 12 samples in the dataset, so let's assign them randomly to one of two fictional categories
var1 <- round(runif(n = 12, min = 1, max = 2))  # this randomly samples from a uniform distribution and rounds the result to an integer value

# change the 1s and 2s to the names of colours:
var1 <- replace(var1, which(var1 == 1), "deepskyblue")
var1 <- replace(var1, which(var1 == 2), "magenta")

# if this was real data, you would need the order of the values to be in the same order as the sequencing data e.g.
cbind(row.names(data.prop), var1)

var1
 [1,] S7  magenta
 [2,] S8  magenta
 [3,] S9  deepskyblue
 [4,] S10 deepskyblue
 [5,] S11 deepskyblue
 [6,] S12 magenta
 [7,] S13 deepskyblue
 [8,] S14 deepskyblue
 [9,] S15 deepskyblue
[10,] S16 deepskyblue
[11,] S17 deepskyblue
[12,] S18 deepskyblue

Next, we can make a more complicated heatmap.

heatmap.2(as.matrix(data.prop.1),Rowv = as.dendrogram(row.clus), Colv = as.dendrogram(col.clus), col = scaleyellowred, RowSideColors = var1, # this puts in the annotation for the samples margins = c(10, 3)) 

plot of chunk heatmap with annotation

You’ll note that some of the defaults for heatmap.2 are kind of funny looking. The green tracing in each column can be turned off with trace = "none", and the histogram in the scale legend can be removed with density.info = "none". You can add plot labels (e.g. main, xlab and ylab), as well as change the relative sizing of the plot elements (lhei and lwid). There are lots of other options which are surprisingly well described in the help for heatmap.2.

heatmap.2(as.matrix(data.prop.1), Rowv = as.dendrogram(row.clus), Colv = as.dendrogram(col.clus), col = scaleyellowred, RowSideColors = var1, margins = c(11, 5), trace = "none", density.info = "none", xlab = "genera", ylab = "Samples", main = "Heatmap example", lhei = c(2, 8)) # this makes the colour-key legend a little thinner

plot of chunk heatmap with annotation2

Finally, the least intuitive function is annHeatmap2 from the Heatplus package. However, it allows for annotation with multiple variables at once, as well as an option for colouring the dendrogram into clusters. All of the options within the function are written as lists, which can take some getting used to if you don’t use them regularly.

# the annHeatmap2 function needs to be wrapped in the plot function in order to display the results
plot(annHeatmap2(as.matrix(data.prop.1),

# the colours have to be fiddled with a little more than with the other functions. With 50 breaks in the heatmap densities, there needs to be 51 colours in the scale palette. Error messages from the plotting should help to determine how many colours are needed for a given number of breaks
col = colorRampPalette(c("lightyellow", "red"), space = "rgb")(51), breaks = 50,

# dendrogram controls how dendrograms are made and can be calculatated witin the function. The internal calculation can be overridden and externally calculated dendrograms can be displayed.
dendrogram = list(Row = list(dendro = as.dendrogram(row.clus)), Col = list(dendro = as.dendrogram(col.clus))), legend = 3, # this puts the colour-scale legend on the plot. The number indicates the side on which to plot it (1 = bottom, 2 = left, 3 = top, 4 = right)
labels = list(Col = list(nrow = 12)) # gives more space for the Genus names
))

plot of chunk annHeatmap2

Annotations are added with the ann sublist which takes a data frame as its input. Unlike the other functions, it can take multiple variables for annotation including continuous variables.

# make a data frame of variables for annotation
ann.dat <- data.frame(var1 = c(rep("cat1", 4), rep("cat2", 8)), var2 = rnorm(12,  mean = 50, sd = 20))

plot(annHeatmap2(as.matrix(data.prop.1), col = colorRampPalette(c("lightyellow", "red"), space = "rgb")(51), breaks = 50, dendrogram = list(Row = list(dendro = as.dendrogram(row.clus)), Col = list(dendro = as.dendrogram(col.clus))), legend = 3, labels = list(Col = list(nrow = 12)), ann = list(Row = list(data = ann.dat))))

plot of chunk annHeatmap2.2

Finally, clusters in the dedrogram(s) can be highlighted using different colours with the cluster sublist:

# make a data frame of variables for annotation
ann.dat <- data.frame(var1 = c(rep("cat1", 4), rep("cat2", 8)), var2 = rnorm(12, mean = 50, sd = 20))

plot(annHeatmap2(as.matrix(data.prop.1),
col = colorRampPalette(c("lightyellow", "red"), space = "rgb")(51),
breaks = 50,
dendrogram = list(Row = list(dendro = as.dendrogram(row.clus)), Col = list(dendro = as.dendrogram(col.clus))),
legend = 3, 
labels = list(Col = list(nrow = 12)), 
ann = list(Row = list(data = ann.dat)),
cluster = list(Row = list(cuth = 0.25, col = brewer.pal(3, "Set2"))) # cuth gives the height at which the dedrogram should be cut to form clusters, and col specifies the colours for the clusters
))

plot of chunk annHeatmap2.3

This is just a taste of what these functions can do. Experimenting with changing the default values and seeing what happens is half the fun with R.

RedditDiggMendeleyPocketShare and Enjoy

About Jeremy Yoder

Jeremy Yoder is a postdoctoral associate in the Department of Plant Biology at the University of Minnesota. He also blogs at Denim and Tweed and Nothing in Biology Makes Sense!, and tweets under the handle @jbyoder.
This entry was posted in howto, microbiology, R, software and tagged , , . Bookmark the permalink.
  • Soraya

    Hi Author

    I am trying to set up a heat map with my own data. However my clade overlap with the row names and I do not know how to move it out.

    I would like to ask you some clues about how to proceed to fix this problem?
    Thanks in advance

  • Arianne Albert

    Hi Soraya,

    Which function are you using to make your heatmap? They all have different ways of specifying the amount of space for the various plot elements.

  • ciara keating

    Hello Author,

    My name is Ciara Keating. I’m a PhD candidate in the Microbial Ecology Lab here in the National University of Ireland, Galway. Firstly, I’d like to thank you for this guide on heatmaps, as I have been teaching myself and found your guide the best around for what I needed to do.
    There’s something I’ve been struggling with however and I am annoyed with myself as it is the simplest of things! I have managed to do everything else and really know how to do this quite well and have made some really nice heatmaps for my data. I was wondering if you could answer my question?
    I load in the data as you describe with the samples as rows and the species as columns, and the data looks how it does in your example. It’s when I go to strip the sample ID’s, the counts come up perfect, however from this point on the sample names are gone so in your example window you have the counts per species per sample whereas I get the counts per species and just the numbered rows. I’ve tried multiple ways and this keeps happening, so in the end I proceeded through the next steps as for the heatmap it just numbers my samples 1-14 instead of the names I had given them.
    If you could help, it would be greatly appreciated and also all help would be acknowledged and referenced in my publications etc.

  • ciara keating

    Dear Author,

    Firstly, I’d like to thank you for your guide on heatmaps, as I have been teaching myself and found your guide the best around for what I needed to do. There’s something I’ve been struggling with though and I am annoyed with myself as it is the simplest of things! I have managed to do everything else and really know how to do this quite well and have made some really nice heatmaps for my data.
    I load in the data as you describe with the samples as rows and the species as columns, and the data looks how it does in your example. It’s when I go to strip the sample ID’s, the counts come up perfect, however from this point on the sample names are gone so in your example window you have the counts per species per sample whereas I get the counts per species and just the numbered rows. I’ve tried multiple ways and this keeps happening, so in the end I proceeded as for the heatmap it just numbers my samples 1-14 instead of the names I had given them.

    • Arianne Albert

      Hi Ciara. Thanks for your kind words! I’m glad that my post has helped you.

      I think what’s happening is that you’ve missed a small but crucial step in the data setup. Before you strip the sample IDs, you need to set them as row names e.g.:

      row.names(all.data) <- all.data$sample

      Hope this helps! You can add row names to a data frame or matrix at any point, and you can also replace row names with the row.names command. Hope this helps!

      • ciara keating

        Hi Author,

        Yes I’m afraid I am definitely doing that but I do lose them after this point of stripping the ID’s for the matrix;

        1) row.names(all.data) <- all.data$sample

        2) all.data <- all.data[, -1]

        So that when I input this

        3) data.prop <- all.data/rowSums(all.data)

        4) data.prop[1:3, 1:3]

        I can no longer see the sample names.

        When I input row.names(all.data) <- all.data$sample nothing changes in my heat map.

        Is there any other step I could be missing or a mistake I have in the file I am importing? Thanks for all the help!

        Ciara

        • Arianne Albert

          Hi Ciara,

          It looks like some error with the row.names part. What happens if you input row.names(data.prop)? Does it give the sample names, or row numbers? You could try adding them directly to data.prop. If you add them to all.data after you’ve made data.prop then they won’t be in data.prop. Can you give me an example of the output from row.names(data.prop)?

          Thanks!

          • ciara keating

            Hi Arianne,

            Apologies for the delay for some reason there was a server error on the website so I couldn’t reply.

            Here is a copy of my output below;

            > Family dim(all.data)

            [1] 14 41

            > Family [1:3, 1:4]

            Sample desulfuromonadaceae alteromonadaceae fibrobacteraceae

            1 Seed DNA 3 1 0

            2 Seed cDNA 8 2 0

            3 R2 PMT DNA 9 4 0

            > row.names(all.data) all.data data.prop data.prop[1:3, 1:3]

            sbr1093..candidate.division. gn06..candidate.division. tm7..candidate.division.

            1 4.974382e-05 0.0001989753 4.974382e-05

            2 0.000000e+00 0.0003027734 0.000000e+00

            3 7.159773e-04 0.0001101504 5.507518e-05

            > row.names(data.prop)

            [1] “1” “2” “3” “4” “5” “6” “7” “8” “9” “10” “11” “12” “13” “14”

            How do I add them into all.data either is just trying to get that function right as it doesn’t seem to revert back to sample ID’s for me? Won’t it need to be in data.prop since that’s the function used for the heat map?

            Thank you!

          • Arianne Albert

            Hi Ciara,

            I think I spotted a typo. Your name for the sample name is ‘Sample’ with an uppercase S, whereas your command for row.names was: row.names(all.data) <- all.data$sample which uses a lowercase s. R is case-sensitive and so would not have assigned anything to your row names…

            Hope this helps!

          • ciara keating

            Hi Arianne,

            I changed it there but it was no use I’m afraid. I’m sure I can modify the names on the pdf afterwards as long as it’s calculating the heat map and the abundance correctly.

            > Family dim(all.data)

            [1] 14 40

            > Family [1:3, 1:4]

            sample desulfuromonadaceae alteromonadaceae fibrobacteraceae

            1 Seed DNA 3 1 0

            2 Seed cDNA 8 2 0

            3 R2 PMT DNA 9 4 0

            > row.names(all.data) all.data data.prop data.prop[1:3, 1:3]

            gn06..candidate.division. tm7..candidate.division. actinobacteria

            1 0.0001989852 4.974629e-05 0.03392697

            2 0.0003027734 0.000000e+00 0.02833959

            3 0.0001102293 5.511464e-05 0.02579365

            > row.names(data.prop)

            [1] “1” “2” “3” “4” “5” “6” “7” “8” “9” “10” “11” “12” “13” “14”

            > row.names(all.data) row.names(data.prop)

            [1] “1” “2” “3” “4” “5” “6” “7” “8” “9” “10” “11” “12” “13” “14”

            I attached a copy of how it looks in Excel and the csv file looks like this when opened in text edit. Maybe the mistake is with my file?

            Sample,desulfuromonadaceae,alteromonadaceae,fibrobacteraceae,burkholderiaceae,rhodocyclaceae,gracilibacteraceae,cytophagaceae,legionellaceae,hyphomicrobiaceae,xanthobacteraceae,holophagaceae,op11 (candidate division) ,flammeovirgaceae,rhodobacteraceae,marinilabiaceae,actinosynnemataceae,thermodesulfobiaceae,pasteurellaceae,caldicellulosiruptoraceae,caulobacteraceae,peptococcaceae,flexibacteraceae,dermatophilaceae,bradyrhizobiaceae,ignavibacteriaceae,gn06 (candidate division) ,gn02 (candidate

          • ciara keating

            Hi Arianne,

            Actually I have the percentage abundance calculated in Excel already so maybe I could skip some of the steps and then just do the Bray resemblance and then make the heat maps? Without stripping the ID’s?

          • Arianne Albert

            You could do that, but for the Bray-Curtis dissimilarity you’d need to have only the proportions in the matrix. I’m still not sure why you’re row name assignment isn’t working. Does it give you any kind of error message? One last try would be to set the row names when you import the csv file: read.csv(“/Users/ciarakeating2/Documents/Family.csv”, sep=”,”, row.names = 1) # if your sample names are in the first column.

          • ciara keating

            Hi Arianne,

            I tired that there but to no avail! When I input this <Family [1:3, 1:4] after making the heat map the sample ID's are there so I really don't know what's going on with it. Is there a function within the heat map function to tell it what the row names are? I tried what you've previously said again and they still came up as numbers..

          • ciara keating

            Hi Arianne,

            I also tried this for a simple heat map and the sample ID’s still didn’t come up I will work with the Excel file and sample names as maybe this is the problem?

            row.names(Family) <- Family$Name
            Family <- Family[,2:41]
            Family_matrix <- data.matrix(Family)
            Family_heatmap <- heatmap(Family_matrix, Rowv=NA, Colv=NA, col = cm.colors(256), scale="column", margins=c(5,10))

          • Arianne Albert

            In both the heatmap and heatmap.2 functions you can specify labRow (or labCol if you’ve got your samples as columns) as a vector of names, the default is to use row names, but that’s clearly not working in your case.

          • ciara keating

            I’m not sure why but I cleared the R console and remade the Family file… and yey it is working thank you so much for your patience! And this article is brilliant! :)

  • Vijay

    Hi Author,

    This is an awesome article. I was breaking my heads to get this heatmap and this has helped me out big time.

    But I am facing an issue. I am encountering an error which says function heatmap.2 not found. I have installed the gplots package. Could you please advise me on this.

    • Arianne Albert

      Hi Vijay,

      Is it possible that you’ve forgotten to load the package? Try library(gplots) and see if it runs then.

      • Vijay

        Thanks for your response. I have got it now!! But I am seeing that there are negative values in the colour scale!! Is there a way to fix this??

        • Arianne Albert

          In heatmap.2 you can specify symkey. The description says: “Boolean indicating whether the color key should be made symmetric about 0. Defaults to TRUE if the data includes negative values, and to FALSE otherwise.”

          Hope this helps!

  • Nadine

    Which version of are were you using? I´m struggeling to find the right version in which all the packages are running…The ones I tried until now are 3.01, 2.15.3 and 2.15.0. Unfortunately I can´t find any information concerning the R version needed. Thanks a lot for any help in advance.

    • Arianne Albert

      Hi Nadine,

      When I wrote the post I was using R 3.0.1. I’ve since upgraded to 3.0.2 and, as far as I know, these all still work with that version. Which package are you having a hard time with in particular?

      • Nadine

        Hey Arianne, thanks a lot for your fast reply. Actually, heatmap.2, so the package gplots was not running under 3.0.1, (not then I installed 2.15.3, where the heatmap.2 command was working perfectly, but thereafter the package Heatplus wouldn´t work anymore. I really have no idea, why the different R versions keep telling me that one or the other command is not available (for R version 3.0.1). Besides that your article was perfect for what I wanted to plot, I was really glad I found it. Now I would just like to get Heatplus running under one of my versions. Maybe you have an idea? Thanks again for your effort. Best Nadine

        • Arianne Albert

          I’m really baffled as to why it’s not working for you. I just tested it again, and it all worked fine for me in R 3.0.2, and I updated all of my packages and biocLite and it still works. Perhaps you need to update Bioconductor? Can you give me an example of the error message you’re getting?

  • Cara

    Hello,

    If it is not too late to ask a question here, I’d like to see if I could get some help on this. This blog is a fantastics guide, but I have been getting quite few errors – most were graphics errors that went a way when I reinstalled Heatplus from bioconductor (I say this incase anyone else had these problems), but now I am getting an error that says:Error: is.list(val) is not TRUE

    So I must have an issue with how list is used I guess? Here is my code, I did not add in the ann function.

    plot(annHeatmap2(as.matrix(ext), col=colorRampPalette(brewer.pal(9,”Blues”))(51), 50, dendrogram=list(Row=list(dendro=as.dendrogram(row.clus)), Col=list(dendro=as.dendrogram(col.clus))), legend=3, labels=list(Col=list(nrow=10)), cluster=list(Row=list(cuth=0.25, col=brewer.pal(3, “Set2″)))))

    Any suggestions are much appreciated, I have been looking up error messages for days! Thank you!

    Oh and if I can ask one more thing that I have been wondering – is there anyway to get all the labels on the same side as the dendrogram (for both column and rows)? And is it possible to rotate the dendrogram?

    • http://www.denimandtweed.com/ Jeremy Yoder

      Approve—
      Sent from Mailbox for iPhone

    • Arianne Albert

      I found your error Cara. You need to specify breaks = 50 and not just 50 or it doesn’t understand what you want. It is a function that doesn’t use the position of the arguments to know what values they should be.

      I’m not sure about putting the labels on the same side. annHeatmap2 is a little hard to work with. I think it should be possible in heatmap.2.

      You can rotate the dendrograms by swapping which one is for rows and which one is for columns (at least this works for me). You have to transpose the matrix of data in the heatmap command though.

      • Cara

        Thank you Arianne, I don’t know how I missed that! I guess I just got so wrapped up in the error messages. I got the colors on the dendrogram working the way I want and the legend is a long the top and looks good. That is a good point about just swapping rows and columns, I will play around with that.
        I wanted to follow up with one more question though if you don’t mind? I am unclear about the color legend for the heatmap. In heatmap.2 it uses values from the matrix, but then in annHeatmap2 it goes form 0-4. Does this directly relate to the values in the table or is it purely relative?
        Thank you for taking the time to respond to my questions.

  • Rodrigo De la Iglesia

    Dear Arianne: Thanks a lot for your post. I don’t know if you still look at the comments, but you had make my work more easy.
    However, I had a question didn’t found the answer at least in the Heatplus documentation. How do the package calculates the scale for the heatmap colours?? I mean, the numbers in the scale are difference in “abundance”? times of abundance?
    For example, in your last figure, Lactococcus in S16 is 4 times more abundant than in let’s say S13?
    I do understand that, unless you change it, the function standardize by row to do the heatmap, but not sure about the scale and is breaking my mind.
    Hope you can help me with this.
    All the best

    • Arianne Albert

      Hi Rodrigo. In Heatplus the colour scale for the legend is in log base 10. So a value of 4 = log(4, 10) = 0.602. If you look at the other heatmap functions you’ll see that it’s equivalent. The different intensities of the colours refer to the relative abundance within a sample and not across samples. In hindsight, I probably could have chosen a more informative colour scale for these data.
      Hope this helps!

  • kevin

    Dear Author:
    where is the “genera example.csv” in the command
    all.data <- read.csv("genera example.csv")?

    • Arianne Albert

      Hi Kevin. You can download the data from Dryad (there’s a link) and then it needs some cleaning before you can use it (as described in the post). I named mine ‘genera example.csv’, but you could name yours anything you like.

  • 3jay

    Hi Ariane

    Thank you for this wonderful blog. I am following your blog–step by step using my own dataset on genus counts. My 9 x 575 matrix (i.e 9 time points and 575 genera). The first couple of step went great. However very early on, when I get to transforming the raw counts of reads to proportions within sample–I am receiving this error:

    > data.prop <- all.data/rowSums(all.data)

    Error in rowSums(all.data) :

    'x' must be an array of at least two dimensions

    I tried googling this error, but this only compounded my confusion (I am new to R and trying to learn), If you can help me, I would be exceptionally grateful
    Thank you,
    Respectfully
    vijay (vijayantharam@@gmail.com)

    • Arianne Albert

      Hi Vijay. It’s pretty hard to answer your question without knowing what your data frame looks like. The error message suggests that there is something wrong with ‘all.data’. For general R questions, you might find Stack Overflow (http://stackoverflow.com/) helpful.

      • 3jay

        Hi Arianne
        Thank you so much for your response. I was able to get things to work but now I have ran into another problem. My data is just like yours except I have time points (day 0 , day 5, day 10 ,etc, etc) for each row. When I use “heatmap” everything looks great, but when I use “heatmap.2″ to get a legend, it starts putting a row dendrogram which I do not want. I just want a column dendrogram. I dont think it would make sense to cluster time points of collection. I never specified anything to cluster the rows–but heatmap.2 insists on putting a row dendrogram there along with the column dendrogram (which I want). As a consequence, my time series is day 5, day 0, day 14, day 84, day 24—pretty much all of out order.

        Is there anyway you can help me? Thank you so much
        Respectfully
        Vijay Antharam
        vijayantharam@gmail.com

        • Arianne Albert

          Did you try Rowv = NULL?

          • 3jay

            Hi Arianne

            Thank you for reply, I tried doing that and I am still getting an error message–please see below

            > heatmap.2(as.matrix(data.prop.1), Rowv = NULL, Colv = as.dendrogram(col.clus), col = scaleyellowred, margins = c(11,5), trace = “none”, density.info = “none”, xlab = “Family”, ylab = “Time Point”, main = “Subject 101″, lhei = c(1,7))

            Error in image.default(1:nc, 1:nr, x, xlim = 0.5 + c(0, nc), ylim = 0.5 + :

            dimensions of z are not length(x)(-1) times length(y)(-1)

            In addition: Warning message:

            In heatmap.2(as.matrix(data.prop.1), Rowv = NULL, Colv = as.dendrogram(col.clus), :

            Discrepancy: Rowv is FALSE, while dendrogram is `both’. Omitting row dendogram.

            I guess, as a beginner, I am having trouble with heatmap.2 functionality, as it is constantly clustering the rows which I want to suppress. Do you know a way around this? . Is there a way to generate a legend using the heatmap function? Thank you so much for your help

            Vijay
            vijayantharam@gmail.com

          • Arianne Albert

            for the warning message, you can add dendrogram=”col” to the call to tell it you only want the column dendrogram. It seems redundant, but works. For the error message, try: either Rowv = FALSE (instead of NULL), or Rowv=1:nrow(data.prop.1)

          • 3jay

            Hi Arianne
            I got it to work, it was not easy, but you helped me along the way. Now the time points (rows) are all in order and the genera (columns) are clustered with a nice dendrogram shown.

            What can I say? Other than thank you so very much–for the last two days over several late night hours, I was studying this page immensely.

            I need to get used to the margins and read up on them in R though, it is throwing me an error when I try to adjust anything— I cannot really change the dimensions of my heat map legend without it throwing an error, or changing fonts—I am studying and trying to learn that.

            But for now, I have a heatmap and it looks very nice thanks to your kind and wonderful help!

            respectfully
            Vijay Antharam
            vijayantharam@gmail.com

          • 3jay

            Hi Arianne

            I have now run into another error as I was trying to produce 25 heatmaps using the code you helped me with 4 months ago.

            I have followed the code and procedure exactly as before but now I am getting this error:

            Error in heatmap.2(as.matrix(mat_data), Rowv = FALSE, Colv = as.dendrogram(col.clus), :

            Colv dendrogram doesn’t match size of x

            I am not sure what is happening, I do not want to cluster the rows which contain my time points, I got this work beautifully before, but now after a 4 month hiatus, I am running into some more problems

            thank you so much

            respectfully

            Vijay

            vijayantharam@gmail.com

          • Arianne Albert

            Hi Vijay. I’m not sure why this isn’t working for you. From the error message it seems like maybe col.clus was made on a different matrix than mat.dat. Check your code carefully for that. Otherwise I can also suggest that you change Rowv = NULL from FALSE, but I don’t think that will solve your current problem.

  • Christoffer Bugge Harder

    If I may ask a question hopelessly late to the party: When I try to shamelessly copy your excellent trick of removing genera whose maximal abundance is less than 1%, I end up with all genera removed (no columns). What am I doing wrong? Here is the code from my R console – everything appears to be working fine until I try the

    data.prop.1 OTU_table_ss_0_sort_prop OTU_table_ss_0_sort_prop[1:3, 1:3]
    Bacteroides_vulgatus Prevotella_copri Bacteroides_sp.
    Den_01 0.0001473672 4.121975e-01 0.48542765
    Den_02 0.3892535283 1.020235e-04 0.07094032
    Den_03 0.2277957264 1.133594e-05 0.06929660 #FINE
    > scaleyellowred heatmap(as.matrix(OTU_table_ss_0_sort_prop), Rowv=NA, Colv=NA, col=scaleyellowred)
    #FINE

    > OTU_table_ss_0_sort_prop OTU_table_ss_0_sort_prop[1:3, 1:3]
    Bacteroides_vulgatus Prevotella_copri Bacteroides_sp.
    Den_01 0.0001473672 4.121975e-01 0.48542765
    Den_02 0.3892535283 1.020235e-04 0.07094032
    Den_03 0.2277957264 1.133594e-05 0.06929660

    #FINE
    > # determine the maximum relative abundance for each column
    > maxab head(maxab)
    Bacteroides_vulgatus Prevotella_copri Bacteroides_sp. Bacteroides_uniformis
    0.3892535 0.5760925 0.4937029 0.2728334
    Subdoligranulum_variabile Escherichia_coli
    0.1603696 0.6235334
    > n1 <- names(which(maxab head(n1)
    [1] “Clostridium_sp.” “Clostridium_citroniae” “Bifidobacterium_ruminantium” “Moryella_indoligenes”
    [5] “Ruminococcus_flavefaciens” “Eubacterium_ventriosum”

    #FINE UNTIL HERE!
    > OTU_table_ss_0_sort_prop_removed head(OTU_table_ss_0_sort_prop_removed)

    Den_01
    Den_02
    Den_03
    Den_04
    Den_05
    Den_06

    #WTF…………..

    • Cara Fiore

      Hello,

      I have come back to this blog because you have been so helpful before. I was wondering if you know how to make the plot a little larger – the columns and column names seem so scrunched and I’d like to spread it out (so looking at the last figure on this blog I’d like to expand it left/right (not up/down)? I tried margins=c() but perhaps I don’t have it in the right place because I got an error? Any help is greatly appreciated!

      Cara

      • Arianne Albert

        For which heatmap command? annheatmap2 won’t understand the margins command I’m afraid. try looking up the help for heatmapLayout, or try making your plot area wider.

    • Laure

      I put colnames instead of names and it worked.
      mat_data_01 <- mat_data[ , -which(colnames(mat_data) %in% n1)] # remove the genera with less than 1% as their maximum relative abundance

    • Arianne Albert

      Hi Christoffer. Try Laure’s trick below. the name function doesn’t always work, but the colnames function seems more reliable. Let me know if that doesn’t work and we’ll try some more troubleshooting.

  • Jimmy

    Hi Arianne,
    Thanks for this post, the resource has been really helpful.

    From what I understand, the actual heatmap in your example is made from relative abundance, but the clustering of samples and taxa is based on dissimilarity. Is there anyway to calculate the heatmap on dissimilarity (using the data.dist matrix)?

    • Arianne Albert

      You know, I actually have no idea. Let me know if your experiment works…

  • Susheel Busi

    Hi Arianne,

    Firstly, thank you very much for this thorough and very helpful post for people like me who are not accustomed to R. Secondly, I have a question with regards to grouping the samples into say, “Wild-type” and “Treatment” groups. In your example, you state a random sampling out of 12 samples, where min=1 and max=2. What are these min. and max values and how are they established? Also, is there any way I can specifically specify that certain rows be grouped together. For example, if I wanted row 1-6 as ‘wild-type’ and row 7-12 be grouped as ‘treatment’, how would I write the code for the same?

    My apologies if I’m asking you to do my work for me, but it’s a little daunting trying to figure it out from all the tutorials available online.

    Thank you again for your kind help.

    Sincerely,

    Susheel