Admixture maps in R for Dummies

Before we get started, I’d like to point everyone to an excellent tutorial here by Kim Gilbert on making maps in R. I have been grappling with overlaying admixture plots, and migration routes on top of maps recently, and thought I’d put together a quick tutorial on that. What you will need:

  1. GPS coordinates
    Eg. gps.data
    ID,Lat,Lon
    1,33.72,-94.40
    2,34.1682185,-111.930907
    3,37.2718745,-119.2704153
    

  2. Admixture proportions – these are routinely printed out as the “Q” matrix by STRUCTURE, and in a file suffixed with a “Q” by ADMIXTURE. I added a column with sample sizes to this, to proportionately resize the admixture pies.
    Eg. admixprops.data
    K1,K2,Num
    0.836601,0.163399,12
    0.565904,0.434096,16
    0.508735,0.491265,18
    0.111114,0.888886,9
    

  3. maps() and plotrix() packages in R

#Initialize packages
library(maps)
library(plotrix)
gps<-read.csv(“gps.data”,header=TRUE) #Read input files
admix<-read.csv(“admixprops.data”,header=TRUE)
map(”state”) #Plot maps
map.axes() #Add axes
points(gps$Lon, gps$Lat,
cex = admix$Num/5, col=’red’, pch=19) #To add just points
for (x in 2:nrow(gps)) { #To add arrows for migration models
arrows(gps$Lon[1],gps$Lat[1],gps$Lon[x],
gps$Lat[x],lty=”dashed”,code=2)
} #To add admixture plots – here I used K = 2.
for (x in 1:nrow(gps)) { floating.pie(gps$Lon[x],gps$Lat[x], #You can modify your loop to reflect this
c(admix$K1[x],admix$K2[x]),radius=admix$Num[x]/8,
col=c("red","blue") }

usa-tut1 usa-tut2

And voila! What took me four lines of code in R, involved days of learning ArcGIS to do the same thing (not to undermine the way cooler things that you can do with ArcGIS). I’ve been playing around with adding a phylogenetic tree to the admixture map as well using the ape package – more to come on this soon!

Share

About Arun Sethuraman

I am a computational biologist, and I build statistical models and tools for population genetics. I am particularly interested in studying the dynamics of structured populations, genetic admixture, and ancestral demography.
This entry was posted in howto, population genetics, R, software, STRUCTURE and tagged , . Bookmark the permalink.
  • Pingback: Stuff online, pointing out the problem edition | Denim & Tweed()

  • Pingback: Geophylogeny plots in R for Dummies | The Molecular Ecologist()

  • Vivek Vaishnav

    Dear Arun, I am thankful to you that you have posted such information for dummies also. Please do me a favor. I tried to follow you many times but each time after very first command to load the gps.data file, following message is being shown:-

    Error: unexpected input in “gps<-read.csv(“"

    I prepared my data with comma-delimited text format (As you given the example) but not getting the reason causing error.
    Help me please.

    • Vicki Villanova

      Hi Vivek! I had the same problem. For some reason the style of quotations marks messes up the code in R. I deleted the quotation marks in my script and typed them back in. If you’re using R studio you’ll immediately see that the file name changes to a green color in the script.

      • Vivek Vaishnav

        Hi Vicki! Ya I tried the same but still facing the problem. I am free to share my input txt files. It all are in the same format given above.
        I could not resolve the problem yet.
        Hey Vicki, thank you for suggesting me!

        • Vicki Villanova

          Vivek, I have my data working (mostly), aside from the migration routes, which I personally don’t need anyways. I can look at it if you want. My e-mail is vvillanova@knights.ucf.edu

    • Hi Vivek and Vicki,

      Thanks for figuring this out – I didn’t realize that WordPress changed the quotes when I pasted my R code.
      I’ll try and look this over the next time around!

      Best,
      Arun

  • Vicki Villanova

    Hi Arun,
    I would also like to thank you for this script! However, I too am running into a glitch when using your script. I get the following error, even when using your data set:
    Error: could not find function “nrows”
    Any help would be greatly appreciated!

    • Oops – I must have typed this wrong. Try “nrow” instead, and not “nrows”.
      I’ve changed this in the post too.

  • Rena

    Hi Arun,

    Thank you for posting such useful code. I modified your code only slightly to plot assignment at K=6 for each individual. I noticed something wrong with the colors when I compared the R-generated plot to a previous handmade plot I did. By playing around with some of the numbers for a specific individual that I knew was wrongly colored (e.g. it should have been a green color corresponding to one population but was instead being colored red), I noticed that whether or not you have assignment proportions equal to 0 affects how R colors the pie chart. If a population assignment value equals 0, R skips over that value (since there’s nothing to plot) but does not skip over the corresponding color in a 1:1 manner. So, the color ends up being wrong. For example, if I have two rows:

    SampleName K1 K2 K3 K4 K5 K6
    Sample_A 0.00100 0.05990 0.03710 0.00150 0.00000 0.9005
    Sample_B 0.0001 0.0001 0.0001 0.0001 0.0001 0.9995

    and colors:
    for (x in 1:nrow(gps)) {floating.pie(gps$X.Longitude[x],
    + gps$Y.Latitude[x], c(admix$K1[x],admix$K2[x],admix$K3[x],admix$K4[x],
    + admix$K5[x],admix$K6[x]),
    + col=c(“orange”,”mediumblue”,”gold”,”skyblue2″,”green4″,”red”))}

    R will color Sample_B correctly (red, the 6th color), but Sample_A will be the wrong color (green, the 5th color, because it skipped over the value of 0 in K5 column). One can easily fix this coloring problem by changing proportions of 0 to 0.000001 so that the coloring of each population is maintained in a 1:1 correspondence.

    I hope that’s clear. I thought this information might be useful for anyone else who wants to modify your code to plot individual assignments.

    Best,
    Rena

    • Hi Rena,

      Thanks so much for pointing this out, and the code snippets! Appreciate it! 🙂

      Best,
      Arun

    • Magni

      Hi Rena!

      Thanks for adding more information (and thanks to arun for providing the script). I’m running the commands and everything seems to work, except I do not get my pie.charts om the map. Not using my own dataset or Arun’s data. Any suggestions to why that is?

      Best,
      Magni

    • Magni

      Nevermind, problem solved. Helps to have the right ID at the top of the columns…

      Magni

  • Pingback: A few good molecular ecologists: six months and 116 posts later | The Molecular Ecologist()