Recent Ancestry of the USA and the 100k Genome Project

Holiday presents for pop-gen enthusiasts come in the form of data, and boatloads of it! The past two weeks saw the announcement of two neat studies that mark major steps toward understanding the genetics of mixed populations.

With a relatively recent migratory history, much of North America is a mixture of peoples. Much of the ancestry analysis of North America has been anecdotal, and a large-scale study of the genetic make-up of the USA had yet to be conducted. In a recent study, Bryc et al. fill in some of these blanks, as the culmination of large-scale genotyping from all those 23andMe stocking stuffers.

Mean European/Native American/Latino ancestry among 23andMe customers across North America. Image courtesy: http://www.cell.com/ajhg/ppt/S0002-9297(14)00476-5.ppt

Important conclusions from the study include: a) greater variation in African ancestry among self-identified African-Americans, primarily Iberian ancestry among self-identified Latino-Americans, and localized (state-by-state) variation in European ancestry across the USA; b) sex bias in ancestral composition, indicative of social contributors to genomic admixture; and c) a stronger correlation between self-identified ancestry and genomic ancestry than detected by previous studies.

The pipeline used in the study (termed “Ancestry Composition”) is detailed in a separate study by Durand et al. In brief, the steps involved are (1) phasing high-density SNP chip genotype data, (2) identifying IBD tracts (Identical By Descent, used here to mean phased genomic regions in which most SNPs are directly derived from the common ancestor), and (3) assigning local ancestry to these IBD tracts using an SVM-based classifier.
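
To make the idea of local ancestry assignment concrete, here is a minimal toy sketch in Python. It is not the Ancestry Composition pipeline: it assumes phasing is already done, uses fixed-size windows in place of IBD tracts, and swaps the SVM for a simple log-likelihood classifier against reference allele frequencies. The reference populations (`pop_A`, `pop_B`), their frequencies, the window size, and all function names are hypothetical and chosen only for illustration.

```python
import math
from itertools import groupby

# Hypothetical frequencies of allele "1" at eight SNPs for two made-up
# reference populations. These are illustrative numbers, not real data.
REF_FREQS = {
    "pop_A": [0.9, 0.8, 0.9, 0.7, 0.9, 0.8, 0.9, 0.8],
    "pop_B": [0.1, 0.2, 0.1, 0.3, 0.1, 0.2, 0.1, 0.2],
}


def window_log_likelihood(alleles, freqs):
    """Log-probability of a haplotype window given per-SNP allele frequencies."""
    return sum(math.log(f if a == 1 else 1.0 - f) for a, f in zip(alleles, freqs))


def classify_windows(haplotype, ref_freqs, window_size):
    """Label each non-overlapping window with the best-fitting reference population.

    Stand-in for the SVM step: picks the population with the highest
    log-likelihood for the window.
    """
    labels = []
    for start in range(0, len(haplotype), window_size):
        end = start + window_size
        window = haplotype[start:end]
        scores = {
            pop: window_log_likelihood(window, freqs[start:end])
            for pop, freqs in ref_freqs.items()
        }
        labels.append(max(scores, key=scores.get))
    return labels


def merge_tracts(labels, window_size, n_snps):
    """Collapse runs of identically labeled windows into (start, end, pop) tracts."""
    tracts, start = [], 0
    for pop, run in groupby(labels):
        end = min(start + window_size * len(list(run)), n_snps)
        tracts.append((start, end, pop))
        start = end
    return tracts


def ancestry_proportions(tracts):
    """Fraction of SNPs covered by tracts assigned to each population."""
    total = sum(end - start for start, end, _ in tracts)
    props = {}
    for start, end, pop in tracts:
        props[pop] = props.get(pop, 0.0) + (end - start) / total
    return props


# One phased haplotype (0/1 alleles) for a made-up admixed individual.
haplotype = [1, 1, 1, 1, 0, 0, 0, 0]
labels = classify_windows(haplotype, REF_FREQS, window_size=2)
tracts = merge_tracts(labels, window_size=2, n_snps=len(haplotype))
proportions = ancestry_proportions(tracts)
print(tracts)
print(proportions)
```

Averaging per-individual proportions like these across many customers is, in spirit, how a map of mean ancestry by state could be assembled, though the real pipeline works on far denser data and a much richer set of reference populations.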

In the authors' own words: Perhaps most importantly, however, our results reveal the impact of centuries of admixture in the US, thereby undermining the use of cultural labels that group individuals into discrete non-overlapping bins in biomedical contexts “which cannot be adequately represented by arbitrary ‘race/color’ categories.”

In other news, the NHS just announced plans to sequence 100,000 human genomes to quantify the dynamics of 110 hereditary disorders, including leukemia and breast, bowel, ovarian, and lung cancers. More data! 2015 definitely looks very promising for applications of genomics in personalized medicine.

References:

Bryc, Katarzyna, et al. “The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States.” The American Journal of Human Genetics (2014). http://dx.doi.org/10.1016/j.ajhg.2014.11.010

Durand, Eric Y., et al. “Ancestry Composition: A Novel, Efficient Pipeline for Ancestry Deconvolution.” bioRxiv (2014): 010512. http://dx.doi.org/10.1101/010512

About Arun Sethuraman

I am a computational biologist, and I build statistical models and tools for population genetics. I am particularly interested in studying the dynamics of structured populations, genetic admixture, and ancestral demography.