What to do with all those pesky mtDNA reads in your NGS experiment

Have you ever noticed how many reads from your high-throughput sequencing project map to the tiny fraction of your genome that is the mitochondrial genome (mtDNA)? Pretty much any NGS experiment (e.g., RNA-seq, DNA-seq, capture-based sequencing) leaves you with ultra-deep coverage of mtDNA. But what do you do with those reads? The most common option is to ignore them. A far less common option is to turn them into a Science paper. But what if you want to do something with those reads and not publish it in Science?

One of the biggest challenges of working with sequencing data that come from mtDNA is calling genotypes. This is because for every single copy of the nuclear genome a cell can have tens to thousands of copies of the mitochondrial genome (i.e., mtDNA is effectively polyploid). And among these many copies you can also have multiple variants at any one locus, a state known as heteroplasmy. Compounding the problem, at any one locus there is usually one variant that is present in the majority of the copies of the mitochondrial genome. These two attributes, polyploidy and highly skewed within-sample allele frequencies, make it difficult for most variant calling algorithms to accurately measure heteroplasmy in NGS experiments. That is, until now.

So, readers of The Molecular Ecologist, do I have the program for you (or for an undergrad or grad student in need of a project in your lab). Jun Ding and colleagues recently published a new method and the accompanying software to measure mitochondrial DNA variation within samples (as a bonus, you get to also measure mtDNA copy number, which is interesting in its own right).
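
To give a feel for the copy number side of things, here is a minimal sketch of the usual coverage-ratio idea: if the nuclear genome is diploid, the number of mtDNA copies per cell is roughly twice the ratio of mean mtDNA depth to mean autosomal depth. This is an illustration under that assumption, not necessarily the exact calculation in the Ding et al. software, and the function name and parameters are made up for this example.

```python
def mtdna_copies_per_cell(mt_mean_depth, autosomal_mean_depth, nuclear_ploidy=2):
    """Rough mtDNA copy number per cell from sequencing depth.

    Assumes reads are sampled in proportion to the number of DNA copies,
    and that the nuclear genome has `nuclear_ploidy` copies per cell.
    """
    if autosomal_mean_depth <= 0:
        raise ValueError("autosomal_mean_depth must be positive")
    return nuclear_ploidy * mt_mean_depth / autosomal_mean_depth


# Hypothetical depths: 2000x over mtDNA, 4x over the autosomes.
estimate = mtdna_copies_per_cell(2000, 4)
print(f"Estimated mtDNA copies per cell: {estimate:.0f}")
```

Real data would need more care (e.g., mapping biases and nuclear copies of mitochondrial sequence), so treat the number as a ballpark.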

Mitochondrial variant-calling pipeline from Ding et al. (2015)

In short, what they do is use a likelihood-based model to call genotypes within each sample at each locus. Using this method and filtering out sites with a minor allele frequency of <4%, they were able to achieve a false discovery rate of 2%! They then used their novel-ish method to look at associations between heteroplasmy and various phenotypes (e.g., BMI, weight, height). They found some robust associations wherein heteroplasmy was positively associated with waist circumference and waist-to-hip ratio, which “indicat[ed] an association with central fat distribution”.
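
To make the idea of a likelihood-based call plus a minor allele frequency filter a bit more concrete, here is a toy sketch. It is not the model from Ding et al.: it simply compares a binomial "the minor-allele reads are all sequencing errors" model against a "the minor allele is really there" model at a single site, then applies the 4% cutoff mentioned above. The function names, the error rate of 0.001, and the log-likelihood-ratio threshold of 5 are all assumptions picked for illustration.

```python
import math


def minor_allele_fraction(counts):
    """Return (major_base, minor_base, minor_fraction) from base -> read count."""
    depth = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if depth == 0 or len(ranked) < 2:
        return None
    (major, _), (minor, minor_n) = ranked[0], ranked[1]
    return major, minor, minor_n / depth


def binom_loglik(k, n, p):
    """Binomial log-likelihood without the constant binomial coefficient."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
    return k * math.log(p) + (n - k) * math.log(1 - p)


def heteroplasmy_llr(minor_n, depth, error_rate=0.001):
    """Log-likelihood ratio: heteroplasmic model vs. sequencing-error-only model."""
    # Under the heteroplasmic model, use the observed fraction (its MLE),
    # floored at the error rate so the ratio is never negative.
    f_hat = max(minor_n / depth, error_rate)
    return binom_loglik(minor_n, depth, f_hat) - binom_loglik(minor_n, depth, error_rate)


def is_heteroplasmic(counts, min_maf=0.04, min_llr=5.0, error_rate=0.001):
    """Call a site heteroplasmic if it passes both the MAF and the LLR cutoffs."""
    result = minor_allele_fraction(counts)
    if result is None:
        return False
    _, minor, maf = result
    depth = sum(counts.values())
    llr = heteroplasmy_llr(counts[minor], depth, error_rate)
    return maf >= min_maf and llr >= min_llr


# Hypothetical pileups at two sites, each with 1000x coverage.
print(is_heteroplasmic({"A": 900, "G": 100, "C": 0, "T": 0}))  # 10% minor allele
print(is_heteroplasmic({"A": 998, "G": 2, "C": 0, "T": 0}))    # 0.2% minor allele
```

With ultra-deep coverage, even a tiny minor allele fraction can look statistically convincing, which is one reason a hard frequency floor like the 4% cutoff is useful alongside the likelihood.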

But that’s not the main contribution of the paper. It’s really all about the software.

Ding, J., Sidore, C., Butler, T. J., Wing, M. K., Qian, Y., Meirelles, O., … Schlessinger, D. (2015). Assessing Mitochondrial DNA Variation and Copy Number in Lymphocytes of ~2,000 Sardinians Using Tailored Sequencing Analysis Tools. PLoS Genetics, 11(7), e1005306. doi:10.1371/journal.pgen.1005306


About Noah Snyder-Mackler

I'm a postdoctoral fellow in the department of Evolutionary Anthropology at Duke University. Broadly, I study non-human primate genetics and genomics. More specifically, I'm interested in the interaction between behavior, genotype, and gene expression in response to social stress.