What’s N50?

This is the first in a series of posts where we explain the N50 (Nx) metric, discuss the problems surrounding it, give solutions to those problems, and suggest an alternative N50 metric for transcriptome assemblies.

Most genome assembly papers include the N50 statistic these days. This measure is often used to describe the “completeness” of a genome assembly (and sometimes other assemblies). But what it essentially does is tell you something about the distribution of contig lengths.

Many people initially struggle to grasp the concept of N50, but we like to picture it like this. Imagine that you line up all the contigs in your assembly in order of their sequence lengths (Fig. 1a): the longest contig first, then the second longest, and so on, with the shortest ones at the end. Then you start adding up the lengths from the beginning: the longest contig + the second longest + the third longest, and so on, until the running sum reaches 50% of your total assembly length. The length of the contig at which you stopped counting is your N50 (Fig. 1b).

1a. Contigs, sorted according to their lengths.

1b. Calculation of N50 using sorted contigs.

Fig. 1. Example of calculating N50 for a set of seven contigs. Here N50 equals 60 kbp.
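The line-up-and-add procedure described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `n50` and the seven contig lengths are our own hypothetical choices (they are not read off Fig. 1), picked so that the result also comes out at 60 kbp.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig")
    # Line the contigs up from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that takes the running sum
        # to at least half of the total assembly length.
        if running >= half_total:
            return length


# Hypothetical contig lengths in kbp (illustrative only).
contigs_kbp = [80, 70, 60, 50, 40, 30, 20]
print(n50(contigs_kbp))  # 60
```

Here the total is 350 kbp, so the halfway mark is 175 kbp. The running sum goes 80, 150, 210, and the contig that pushes it past 175 is the 60 kbp one.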

Posted in genomics

Different ways to have sex, yet still be a weed

Baker (1955) noticed that when mates are lacking, the ability to undergo self-fertilization will greatly enhance colonization success.

Uniparental reproduction seems to be common in colonizing species, whether the colonization is from a continent to an oceanic island, during a biological invasion, or during a range expansion (Pannell et al. 2015).

Weeds, by their very nature, should be emblems of Baker’s Law. They have superior colonization abilities: they’re often found in places they shouldn’t be, and possibly without conspecifics.

Yet there are very few empirical tests of Baker’s Law in weeds, even though Baker developed his idea around weeds. (Maybe it’s not so surprising, since Baker also referenced other organisms, such as mosses and ferns, for which there hasn’t been much empirical work, but see Krueger-Hadfield et al. 2016.)

How do sexual systems of species influence their weediness?

Posted in Coevolution, comparative phylogeography, evolution, natural history, phylogeography, population genetics, selection

Small Molecules, Big Differences

Mary Latimer wrote this post as a final project for Stacy Krueger-Hadfield’s Science Communication course at the University of Alabama at Birmingham. She is a third-year PhD student at UAB studying miRNAs and methionine restriction. Her hobbies include cats, Netflix, and coffee. You can find out more about her research here.

Genes that deal with stress, growth, and even reproduction are tightly regulated by a number of mechanisms. Micro-RNAs (miRs) are one of these mechanisms.

Posted in bioinformatics, blogging, evolution, genomics, natural history, next generation sequencing, RNAseq

Friday Action Item: Try something new

An iris grows in Kitsilano. (jby)

On Fridays while the current administration is in office we’re posting small, concrete things you can do to help make things better. Got a suggestion for an Action Item? E-mail us!

Across much of the continental U.S., climate-changed spring is in the air. The president’s flagship legislative project is imploding. (Though it could really use an assist, if you haven’t given one already.) Has it really been more than two months? (Yes, it has.)

This is as good a time as any to remember that we’re in this for the long haul. One way to make that haul more bearable: vary the pace and the route. So your Action Item for this week is to try something new. Anything, really, to vary your routine and help freshen your energy. Find a new place for your next run, if you want to take my metaphor about pace and route literally. Break in a new analysis method. Fire up a new podcast while you code. (May I suggest this one? Or this one? Maybe this one? Or even this one?) Stop by a new coffee shop for your morning cup, or make a new recipe for dinner. Maybe try a colorful new (to you) reality show that starts a new season this very night, or have a quieter evening with a new book.

Go forth and refresh yourself, however you most like. Tomorrow’s a new day.

Posted in Action Item

Polyploidy in the era of GBS

Ploidy, dear reader, is something that I think about literally all the time. It impacts every facet of my research from the field to the bench to the stats used to analyze data sets. It’s been simultaneously the greatest and the worst aspect underlying the majority of my work thus far.

Anyone who deals with things more complicated than a diploid understands the difficulty. We absolutely have to correctly distinguish individuals with different ploidy levels if we want to accurately genotype and estimate allele frequencies in population genetic studies. Diploidized haploids don’t reflect the true allele frequencies, nor do tetraploids that are treated as diploids.

State-of-the-art ploidy-ing techniques include flow cytometry (FCM), which determines ploidy level by quantifying nuclear DNA content. There are now high-throughput FCM techniques as well as methods for dried tissue. The downside is that these techniques require the right instrumentation and sufficient tissue. Either might be unavailable, particularly when tissue is poorly preserved or limited.

Microsatellites have also been used to determine ploidy. At any one locus, haploids should have one allele, diploids should have up to two alleles, and polyploids can have more. However, ploidy detection depends entirely upon the population allelic richness, the number of loci, and genotyping error.
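To make the allele-counting logic concrete, here is a rough Python sketch. It is not a method from any of the studies mentioned; the function name `minimum_ploidy`, the locus names, and the allele sizes are all hypothetical. It assumes each individual is stored as a dictionary mapping a locus to the list of alleles scored there, and it returns only a lower bound on ploidy: the largest number of distinct alleles seen at any single locus.

```python
def minimum_ploidy(genotype):
    """Lower bound on ploidy: the most distinct alleles seen at any one locus."""
    if not genotype:
        raise ValueError("need at least one locus")
    # A locus with k distinct alleles implies at least k chromosome copies.
    return max(len(set(alleles)) for alleles in genotype.values())


# Hypothetical microsatellite genotype (allele sizes in bp, illustrative only).
individual = {
    "locus_A": [152, 152],       # one distinct allele
    "locus_B": [201, 205],       # two distinct alleles
    "locus_C": [310, 314, 318],  # three distinct alleles
}
print(minimum_ploidy(individual))  # 3
```

The catch is visible in `locus_A`: an individual that happens to be homozygous everywhere looks haploid. That is why the answer depends so heavily on allelic richness, the number of loci, and genotyping error.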

Recently, high-throughput sequencing has joined the ploidier’s toolkit. Sequence data can be used to determine allelic copy numbers and ploidy levels, but often these approaches are for high-coverage data sets (10x to 50x, for example; see Gompert and Mock 2017, who review these recent approaches).

Such high coverage isn’t the norm when GBS is used to assess population-level genetic variation. We sacrifice coverage (i.e., 2x) for lots and lots of individuals. If we go as low as we can go for a GBS study, can we detect individual ploidy levels?

Posted in bioinformatics, evolution, genomics, haploid-diploid, Molecular Ecology, the journal, natural history, plants, speciation

Molting on the molecular level: how blue crabs become soft-shell crabs

Megan Roegner wrote this post as a final project for Stacy Krueger-Hadfield’s Science Communication course at the University of Alabama at Birmingham. Megan spent her early years in Cape Town, South Africa, playing in the tidal pools along the coast and developing a fascination for marine invertebrates. After moving to the United States, Megan attended UAB and is currently working on her PhD in the endocrinology of blue crabs. She hopes to spend her career working to preserve natural habitats and populations of marine invertebrates through better understanding of their interactions with their environments.

During molting season, blue crabs shed their hard outer shells, and, for less than a day, they are left with only a soft shell and a limp body. When hauled in by fishermen, they are immediately sold to eager foodies and restaurants alike.

With its outer shell so soft, the crab can be eaten in its entirety, with no complicated tools needed to claw your way to the meat. Soft-shell crab is so well loved that demand is constantly increasing, and, unsurprisingly, supply is dwindling. Overfishing has led to the collapse of many fisheries and a serious reduction in natural populations.

But could there be better ways to control molting? Can we figure out what’s going on at the molecular level?

Posted in bioinformatics, blogging, conservation, domestication, evolution, genomics, natural history

Hybridization and adaptive radiations

I’ve always been interested in African cichlids, an iconic system in evolutionary biology, and in the origins of their diversity. These cichlids represent an adaptive radiation; they’ve evolved rapidly from a single origin to exploit and speciate into open niches (for a general overview of adaptive radiations, see Losos 2010). In the Great Rift Valley in East Africa, over 2,000 species of cichlids are present. In the Lake Victoria region alone, there are more than 700 species that have evolved in only 150,000 years. The amount of diversity and the rate at which it has been generated are really quite incredible.

The cichlid adaptive radiations have received lots of attention, but it is still somewhat unclear how enough genetic variation could be present to generate the huge numbers of species so rapidly. More generally, how could one species possess the necessary variation to result in such diverse niche use?

Two related but distinct hypotheses concerning hybridization may explain the generation of diversity in African cichlids. The first is that hybridization between two species may seed an adaptive radiation and provide the genetic variation necessary for the subsequent radiation events. That is, this initial hybridization event serves as the source of the entire radiation. The second is the “syngameon hypothesis”, which argues that hybridization between closely related lineages (i.e., incipient species in the adaptive radiation) can generate genotypes that allow previously unoccupied fitness peaks to be reached. Occasional hybridization between radiating lineages can then continue to facilitate the occupation of novel fitness peaks (Seehausen 2004 provides a thorough discussion of these topics).

Posted in adaptation, evolution, genomics, next generation sequencing, population genetics, speciation