# PCA of multilocus genotypes in R

An earlier post from Mark Christie showed up on my feed on calculating allele frequencies from genotypic data in R, and I wanted to put together a quick tutorial on making PCA (Principal Components Analysis) plots using genotypes. I used the genotype data published by Tishkoff et al. (2009) for this example, but it should work for any generic genotype format, as long as it’s stored as a table/matrix. As an example, let’s try plotting the first three principal components for the Yoruba, Hadza, Han, and Tamil populations.

```install.packages(“scatterplot3d”)
install.packages(“rgl”)
library(scatterplot3d)
library(rgl)
tishkoff.pca<-prcomp(tishkoff[,4:1000],
center=TRUE,scale.=TRUE)
pcs123<-tishkoff.pca\$x[,1:3]
yoruba<-grep(“Yoruba”,tishkoff\$population.name)
han<-grep(“Han”,tishkoff\$population.name)
tamil<-grep(“Tamil”,tishkoff\$population.name)
p<-scatterplot3d(pcs123[yoruba,1],pcs123[yoruba,2]
,pcs123[yoruba,3],color=c(“red”),
pch=20,xlab=“PC1”,ylab=“PC2”,zlab="PC3",
xlim=c(min(pcs123[,1]),max(pcs123[,1])),
ylim=c(min(pcs123[,2]),max(pcs123[,2])),
zlim=c(min(pcs123[,3]),max(pcs123[,3])))
p\$points3d(pcs123[han,1],pcs123[han,2],
pcs123[han,3],col=c(“blue”),pch=20)