Migration Circos plots in R

We’ve all seen them – colorful, and I daresay, pretty darn informative. Circos plots are fun visualizations of large data-sets. I’ve seen them used in two contexts in comparative genomics – to represent structural variants in homologous chromosome segments in species alignments, and to perhaps represent gene-gene interactions. But there are scores of other interesting applications of these plots to scientific data, a comprehensive list (and yet growing) can be seen here.

circos

Circos plot of source-sink migration dynamics between populations “3” and “4” here, with 8 other populations. Width of migration curves indicates amount of migration.

A relatively recent publication by Abel and Sander (2014) in Science on using Circos plots to represent migration prompted me to explore the migest package in R. Scores of studies in molecular ecology and population genetics utilize methods to estimate ancestral or contemporary migration routes between populations, such as MIGRATE-N, IM/IMa2, IMMANC, BayesAss, etc. I am yet to see a migration visualization that comprehensively describes complex migration routes between populations in pop-gen studies. So putting these two together, I thought I’d use migration estimates from one of the above tools, and represent it as a Circos plot. I am going to skip a few steps in creating the data files required to make these plots, but I refer you to some excellent documentation by Guy Abel.

You will need:

File eg. df1.txt – Contains information about the location, RGB colors picked out for each region, and order they should appear in the circle.

order rgb region
1 255,0,0 1
2 210,150,12 2
3 125,175,0 3
4 117,0,255 4
5 160,0,125 5
6 29,100,255 6
7 0,255,233 7
8 73,255,100 8
9 100,146,125 9
10 255,219,0 10

File eg. df2.txt – Contains migration estimates (here I have them scaled to average number of individuals per generation). I used MIGRATE-n for these estimates.

orig dest m
4 1 36.67
3 2 491.33
4 2 1970
4 3 14
3 4 16.67
3 5 835.33
4 5 106
3 6 252.67
4 6 555.33
3 7 558
4 7 1296.67
3 8 122
4 8 392.67
3 9 391.33
4 9 1967.33
3 10 39.33
4 10 920.67

Okay! Now onto reading data, and getting started with plotting.

library(“migest”)
library("plyr")
df1<-read.table("df1.txt",header=TRUE,stringsAsFactors=FALSE,skip=1,sep="")
names(df1)<-c("order","rgb","region")
df2<-read.table(“df2.txt”,header=TRUE)

Now that our files are read, let’s do some housekeeping to the tables.

df1$xmin<-0
df1$xmax<-sum(df2$m)
n<-nrow(df1)
df1<-cbind(df1, matrix(as.numeric(unlist(strsplit(df1$rgb,","))),nrow=n,byrow=TRUE))
names(df1)[ncol(df1)-2:0]<-c("r","g","b")
df1$rcol<-rgb(df1$r,df1$g,df1$b,max=255)
df1$rcol<-rgb(df1$r,df1$g,df1$b,alpha=200,max=255)

We’re now ready to plot! First up – we create the segments around the circle with parameters using the “df1” table. You shouldn’t have to change any of this code.

library("circlize")
par(mar=rep(0,4))
circos.clear()
circos.par(cell.padding=c(0,0,0,0),
track.margin=c(0,0.15),start.degree=90,gap.degree=4)
circos.initialize(factors=df1$region,
xlim=cbind(df1$xmin, df1$xmax))
circos.trackPlotRegion(ylim = c(0, 1),
factors = df1$region, track.height=0.1,
panel.fun = function(x, y) {
name = get.cell.meta.data("sector.index")
i = get.cell.meta.data("sector.numeric.index")
xlim = get.cell.meta.data("xlim")
ylim = get.cell.meta.data("ylim")
theta = circlize(mean(xlim), 1.3)[1, 1] %% 360
dd <- ifelse(theta < 90 || theta > 270,
"clockwise", "reverse.clockwise")
aa = c(1, 0.5)
if(theta < 90 || theta > 270)  aa =c(0, 0.5)
circos.text(x=mean(xlim), y=1.7,
labels=name, facing = dd, cex=0.6,  adj = aa)
circos.rect(xleft=xlim[1], ybottom=ylim[1],
xright=xlim[2], ytop=ylim[2],
col = df1$rcol[i], border=df1$rcol[i])
circos.rect(xleft=xlim[1], ybottom=ylim[1],
xright=xlim[2]-rowSums(m)[i], ytop=ylim[1]+0.3,
col = "white", border = "white")
circos.rect(xleft=xlim[1], ybottom=0.3,
xright=xlim[2], ytop=0.32, col = "white",
border = "white")
circos.axis(labels.cex=0.6, direction =
"outside", major.at=seq(from=0,to=floor(df1$xmax)[i],by=5),
minor.ticks=1,
labels.away.percentage = 0.15)})

And we have segments! Now all that’s left to do is plot the migration routes between segments (here populations). Again, you shouldn’t have to change any of this code.

df1$sum1 <- colSums(df2$m)
df1$sum2 <- numeric(n)
for(k in 1:nrow(df2)){
i<-match(df2$orig[k],df1$region)
j<-match(df2$dest[k],df1$region)
circos.link(sector.index1=df1$region[i], point1=c(df1$sum1[i],
df1$sum1[i] + abs(df2$m[k]/1000)),
sector.index2=df1$region[j], point2=c(df1$sum2[j],
df1$sum2[j] + abs(df2$m[k]/1000)), col = df1$lcol[i])
df1$sum1[i] = df1$sum1[i] + abs(df2$m[k]/1000)
df1$sum2[j] = df1$sum2[j] + abs(df2$m[k]/1000)}

And behold! I am just in awe of how pretty this plot is – my purple population (4) obviously has a lot more migration than the green one (3). Sort of tells my whole story in one neat figure! As I said before, this is just a bare minimum example. There are plenty of options that can be tweaked to suit your own figure, but it’s still a start. R on!

Share

About Arun Sethuraman

I am a computational biologist, and I build statistical models and tools for population genetics. I am particularly interested in studying the dynamics of structured populations, genetic admixture, and ancestral demography.

This entry was posted in bioinformatics, genomics, howto, R, software and tagged , . Bookmark the permalink.
  • Natalia Bayona

    Hi Arun, great post!

    I’m trying to follow the code, with the example data and my own data. But when df1$xmax is defined

    > df1$xmax 180) { :

    absent value TRUE/FALSE is neccesary.

    I know I defined theta before:

    theta = circlize(mean(xlim), 1.3)[1, 1] %% 360

    16dd <- ifelse(theta 270,

    17″clockwise”, “reverse.clockwise”)

    18aa = c(1, 0.5)

    19if(theta 270) aa =c(0, 0.5)

    So, I am not completely sure, what is happening with my data. I already review Nikola and Guy guide, but they don’t explain carefulle the plyr package steps to obtain the full table. Anyone else is having the same problem with the example data? I will appreciate any feed back.

    Cheers!

    Natalia Bayona

    • Arun Sethuraman

      Hi Natalia,

      Good catch with the first error – since the “df2” file that I specified only has one column of migration rates, it’s sufficient to set df1$xmax to the sum(df2$m). I’ve changed it now. I’m not sure what’s going on in your second error – my guess is that you’ve pasted it with the line numbers? To avoid this, click on the “view source” icon on the top right corner of each code snippet, and then copy-paste into your R window. Let me know if you still have the same error.

      Thanks!

      • Natalia Bayona

        Hi Arun,
        I never figured out what happened. But when I fixed df2, everything works fine. Thanks!

  • Brittany Rife

    This is exactly what I have been looking for! However, only 1 of my df1 pre-specified regions shows up in the plot before I even get to plotting the migrations and then R freezes. Do you happen to know if this a compatibility issues with the newer R versions?

  • Prashant Dhingra

    I get this error – > circos.trackPlotRegion(ylim = c(0, 1),factors = df1$region, track.height=0.1,panel.fun = function(x, y) {

    + name = get.cell …. [TRUNCATED]

    Note: 1 point is out of plotting region in sector ‘2’, track ‘1’.

    Error in is.data.frame(x) : object ‘m’ not found

    • Arun Sethuraman

      Hmm. Not entirely sure what this error means – could you please email me more details of what you tried? arun@temple.edu

      • Buceo Universitario Michoacano

        Hi Prashant Dhingra. I have the same error, can you resolve it? please can you tell me how? Thanks in advance.

        • Arun Sethuraman

          Hi Buceo,
          I am yet to resolve this issue – I’ll get back to you soon with a solution. Thanks for your patience!
          Arun

  • Buceo Universitario Michoacano

    Hi

    I’m trying to follow the code with the example data and I get an error, when I do the “We’re now ready to plot! First up – we create the segments around the
    circle with parameters using the “df1″ table. You shouldn’t have to
    change any of this code.”

    The message is:

    Note: 1 point is out of plotting region in sector ‘2’, track ‘1’.
    Error in is.data.frame(x) : object ‘m’ not found

    It seems the same message that Prashant Dhingra two months ago. Can you help me?

    • James Austin

      I got the same error.