Procrustes Analyses in R

Procrustes transformations (i.e. translating, rotating and rescaling one set of coordinates so that it matches a second set as closely as possible, which allows the two data sets to be compared) have been used extensively in recent literature to assess the similarity of geographical and genetic distributions of species, following the lead of Wang et al. (2010). See Jeremy’s post describing the method and its application to genomic data. I’ve scoured the internet but couldn’t find a worked example of how to make these plots in R (or any other software, for that matter), so here’s a simple tutorial. It assumes you already have principal components (PC1 and PC2) computed from a PCA on SNP data (using your favorite tool, e.g. EIGENSOFT, or even R itself) and geographical coordinates for each individual. The tutorial uses the package “MCMCpack” to compute the Procrustes transformation and the package “maps” to draw the base map. As an example, I am using a chunk of the European data set published in Novembre et al. (2008), which I downloaded from here. Note that this file already contains geographical information (latitude and longitude) and PCs 1 and 2 computed using EIGENSOFT.
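If it helps to see what a Procrustes transformation actually does, here is a minimal base-R sketch of the usual least-squares solution via a singular value decomposition. The function name `procrustes_fit`, the names of its return values, and the toy data are all made up for illustration; this is not meant to reproduce the internals of any package, just the idea of finding the rotation, dilation and translation that best map one matrix onto another.

```r
# Least-squares Procrustes fit of X onto the target Y (same dimensions).
# Hypothetical helper for illustration only.
procrustes_fit <- function(X, Y) {
  x_mean <- colMeans(X)
  y_mean <- colMeans(Y)
  Xc <- sweep(X, 2, x_mean)              # centre both configurations
  Yc <- sweep(Y, 2, y_mean)
  s <- svd(crossprod(Xc, Yc))            # t(Xc) %*% Yc = U D V'
  R <- s$u %*% t(s$v)                    # orthogonal matrix (rotation, possibly with a reflection)
  dilation <- sum(s$d) / sum(Xc^2)       # best-fitting scale factor
  shift <- y_mean - dilation * as.vector(x_mean %*% R)  # translation
  X_new <- dilation * X %*% R +
    matrix(shift, nrow(X), ncol(X), byrow = TRUE)
  list(X.new = X_new, R = R, s = dilation, shift = shift)
}

# Toy data (made up): Y is X rotated by 30 degrees, scaled by 2.5 and shifted
set.seed(1)
X <- matrix(rnorm(40), ncol = 2)
theta <- pi / 6
R0 <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
Y <- 2.5 * X %*% R0 + matrix(c(3, -1), nrow(X), 2, byrow = TRUE)

fit <- procrustes_fit(X, Y)
max(abs(fit$X.new - Y))  # essentially zero: the known transformation is recovered
```

With real data the match is never exact, and the leftover distance between the transformed points and the target is what tells you how similar the two configurations are.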

Procrustes analyses of genetic and geographic coordinates in Europe, sensu Wang et al. (2010).

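# Install once, then load: MCMCpack for procrustes(), maps for the base map
install.packages(c("MCMCpack", "maps"))
library(MCMCpack)
library(maps)

# Tab-delimited file with columns longitude, latitude, PC1, PC2 and alabels
nov <- read.table("PCA.txt", header = TRUE, sep = "\t")
geo <- cbind(nov$longitude, nov$latitude)  # target: geographic coordinates
pcs <- cbind(nov$PC1, nov$PC2)             # matrix to transform: genetic PCs

# Translate, rotate and rescale the PCs onto the geographic coordinates
p <- procrustes(pcs, geo, translation = TRUE, dilation = TRUE)

# One integer colour per label, whether alabels is read as character or factor
cols <- as.integer(factor(nov$alabels))

map(database = "world", regions = c('belgium', 'netherlands',
    'austria', 'denmark', 'portugal', 'italy', 'spain', 'uk', 'germany',
    'france', 'sweden', 'norway', 'finland', 'luxembourg',
    'greece', 'monaco', 'ireland'), xlim = c(-10, 20), ylim = c(35, 60))
map.axes()
# Small labels: transformed genetic coordinates; large labels: sampling locations
text(p$X.new, col = cols, labels = nov$alabels, cex = 0.45)
text(nov$longitude, nov$latitude, col = cols, labels = nov$alabels)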

And voilà! My best reproduction of Figure 1 of Wang et al. (2010). Please note that I only used a portion of the data set in this tutorial. Feel free to play around with other data sets, and let me know how it goes!
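If you want a number to go with the picture, one option is a Procrustes correlation with a permutation test, in the spirit of the similarity statistic that Wang et al. (2010) report alongside their maps. The sketch below is base R only and runs on made-up toy data. The function names `procrustes_cor` and `procrustes_perm` are hypothetical, and I have not checked that this matches their calculation step for step, so treat it as a starting point rather than a reimplementation.

```r
# Procrustes correlation between two configurations with matching rows.
# Both matrices are centred and scaled to unit sum of squares, so the
# statistic lies between 0 (no match) and 1 (perfect match).
procrustes_cor <- function(A, B) {
  A <- scale(A, center = TRUE, scale = FALSE)
  B <- scale(B, center = TRUE, scale = FALSE)
  A <- A / sqrt(sum(A^2))
  B <- B / sqrt(sum(B^2))
  sum(svd(crossprod(A, B))$d)
}

# Permutation test: shuffle the rows of B to break the pairing between
# the two configurations, and see how often chance does as well.
procrustes_perm <- function(A, B, n_perm = 999) {
  observed <- procrustes_cor(A, B)
  permuted <- replicate(
    n_perm,
    procrustes_cor(A, B[sample(nrow(B)), , drop = FALSE])
  )
  p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)
  list(t0 = observed, p_value = p_value)
}

# Toy data (made up): "genetic" coordinates are a rotated, rescaled and
# noisy copy of random "geographic" coordinates
set.seed(42)
geo_toy <- cbind(runif(50, -10, 20), runif(50, 35, 60))
theta <- pi / 9
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
pcs_toy <- 0.01 * geo_toy %*% rot + matrix(rnorm(100, sd = 0.02), 50, 2)

res <- procrustes_perm(pcs_toy, geo_toy)
res
```

To try it on the tutorial data, you would pass the matrix of PCs and the matrix of longitude/latitude in place of `pcs_toy` and `geo_toy`.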

About Arun Sethuraman

I am a computational biologist, and I build statistical models and tools for population genetics. I am particularly interested in studying the dynamics of structured populations, genetic admixture, and ancestral demography.