Easily aggregate bioinformatic sample output with one tool

Today I’m going to write about one of my favorite bioinformatic tools, MultiQC. If you’ve used it, you know why, and if you haven’t, prepare to be amazed.

Many bioinformatic programs produce output on a per-sample basis. That is, you may be quality-filtering, trimming, blasting, and mapping your sequence reads to a genome separately for each and every sample. And if you have more than 3 samples (who doesn’t?), going through the output for all samples can become quite tedious and time-consuming.

A single paired-end Illumina MiSeq run can yield 386 separate samples, and bigger genome, transcriptome, and amplicon projects can these days aggregate hundreds or thousands of separate samples.

This is where MultiQC comes in. The tool takes all the output files from your favorite quality-screening program and aggregates the results into one simple and pretty report. With FastQC, for example, you normally get a couple of quality plots per sample. MultiQC works its magic (almost) to compile all samples into combined plots.
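In practice you typically run FastQC on your read files and then run `multiqc .` in the directory that holds the results. To give a feel for what "aggregating" means, here is a minimal Python sketch of the idea. It is not how MultiQC itself is implemented. It assumes each sample has a FastQC-style `summary.txt` with tab-separated status, module name, and file name on each line, and the function names and example data are hypothetical.

```python
from collections import defaultdict


def parse_fastqc_summary(text):
    """Parse the text of one summary file into {module: status}.

    Assumes tab-separated lines: status, module name, file name.
    """
    statuses = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, _filename = line.split("\t", 2)
        statuses[module] = status
    return statuses


def aggregate(summaries):
    """Combine per-sample results into {module: {sample: status}}."""
    table = defaultdict(dict)
    for sample, text in summaries.items():
        for module, status in parse_fastqc_summary(text).items():
            table[module][sample] = status
    return dict(table)


def failing_samples(table, module):
    """Return the sorted names of samples that failed a given module."""
    return sorted(
        sample
        for sample, status in table.get(module, {}).items()
        if status == "FAIL"
    )


# Hypothetical summary contents for two made-up samples.
example = {
    "sampleA": (
        "PASS\tBasic Statistics\tsampleA.fastq\n"
        "FAIL\tPer base sequence quality\tsampleA.fastq\n"
    ),
    "sampleB": (
        "PASS\tBasic Statistics\tsampleB.fastq\n"
        "PASS\tPer base sequence quality\tsampleB.fastq\n"
    ),
}

table = aggregate(example)
print(failing_samples(table, "Per base sequence quality"))  # ['sampleA']
```

Flipping the data from "one file per sample" to "one row per check, one column per sample" is what lets you spot the odd sample out at a glance instead of opening hundreds of reports.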


Example plot from running FastQC followed by MultiQC. Each line represents one sample.


Posted in bioinformatics, next generation sequencing, software

Still alive from #Evol2017 – Tuesday highlights

My phone battery bought the farm at the entrance to the Oregon Zoo, so in lieu of photos from the terrific final Super Social, here’s someone else’s image of one of the zoo’s bald eagles, looking concerned. (Flickr: Tamara)

A subset of the Molecular Ecologist team is attending this year’s Evolution meeting in Portland, Oregon. As part of our coverage of the meeting, we’ve been recapping the highlights of each day here on the blog, and occasionally previewing upcoming presentations. You can find all of the TME contributors on Twitter using the sidebar on the right or compiled in a handy Twitter list here, and check in on meeting news using the hashtag #Evol2017. You can also view a still-expanding list of video recordings of presentations at the meeting on the Evolution 2017 YouTube channel.

Posted in conferences, evolution

Live from #Evol2017 – Monday highlights

A subset of the Molecular Ecologist team is attending this year’s Evolution meeting in Portland, Oregon. As part of our coverage of the meeting, we will be recapping the highlights of each day here on the blog, and occasionally previewing upcoming presentations. You can find all of the TME contributors on Twitter using the sidebar on the right or compiled in a handy Twitter list here; follow along with all meeting news using the hashtag #Evol2017.

Jeremy

Joachim Hermisson — Footprints of Adaptive Introgression

Identifies a “volcano” profile of nucleotide diversity flanking an adaptive locus that introgressed from another species (or population, I think), and some variation in the shape of the volcano depending on the fitness effects of linked variants. A formal method to test for these effects is provided in VolcanoFinder, which does not appear to be released yet (or is not visible to Google, anyway), so that’s a publication to watch out for.

Miranda Sinnott-Armstrong — Evolutionary transitions in fruit colors with respect to climate

Cool first results from a big dataset on fruit morphologies and color in plant communities from about a hundred different sites worldwide. It looks like the diversity of fruit color is greater in the tropics than at higher latitudes (though the southern hemisphere looks more “tropical” than the north, which is odd), and the frequency of transitions to particular fruit colors may be correlated with transitions to particular climate regimes.

Ailene MacPherson — Finding disease genes in the face of the Red Queen

Genome-wide association is pretty dependent on variation in the population studied; if a locus isn’t variable, by definition GWAS can’t identify it as associated with a given trait. In the case of disease resistance, we know that negative frequency-dependent selection can create sequential sweeps that remove variation from host and pathogen populations at resistance and virulence loci. MacPherson develops a method of GWAS based on both host and pathogen allele frequencies that can maybe overcome this limitation. (Disclaimer: She’s at UBC, my current institution, and is working with a member of my PhD committee.)

Stacy

Sally Chang – Genomic signatures of asexual and sexual reproduction within the colonial hydrozoan Ectopleura larynx

Anything with clonality is pretty much guaranteed to be a major draw for me! Sally’s is one of the first talks I’ve seen that has used RClone to look at clonality among colonies of the cnidarian Ectopleura. RClone is the fantastic R update to GenClone (the old Excel-like macro that required individual analyses per population!!) led by the work of Sophie Arnaud-Haond’s lab. It’s something that we’ve struggled with … with so many SNPs, you’re not going to get matching genotypes in the way you would with microsatellites. So, where do you draw that line? What’s a clonal lineage? Interesting to see Sally’s work on using called genotypes (20x per our subsequent meeting over coffee trading war stories of working with clonal species) and looking at super close genotypes (likely MLLs), different ones that are clearly the product of sex, and those that are so different … totally unrelated polyps? Excited for her work to come out and more studies using RClone and SNPs!

Maurine Neiman – Genomic consequences of asexuality

Maurine described a ton of work out of their group on the snail Potamopyrgus antipodarum. It’s native to New Zealand lakes and populations vary in the frequency of sexual and asexual reproduction. This snail could be a lovely, modern model for understanding when to have sex and when to maybe be asexual. The coolest thing I found was the fact that radical changes persist for a whole lot longer in asexuals!

Sally Otto – Evolution of sex, evolution of ploidies (SSE Presidential address)

… need I say any more?

Finally, I neglected to add an awesome poster from Sunday night in my wrap up yesterday!

Kazuhiro Bessho presented work on the evolution of energy from haploid gametophytes to diploid sporophytes. He used my favorite red algae as a model. Red algae are thought to have poor fertilization success since the female gamete doesn’t disperse at all and male gametes have no flagella. They found that if the female totally controls the development of the sporophyte, then an ESS exists and females can maximize their fitness. In contrast, if there is increasing paternal control and fewer sporophytes, then parental care may be favored … this is an excellent model and I look forward to talking more to Bessho-San about this in terms of red algal fertilization success that is actually quite high!

Ethan

Leonardo Campagna – Repeated divergent selection on pigmentation genes in a rapid finch radiation

A number of bird speciation studies from the last few years have shown that recently diverged lineages differ in little beyond genes associated with plumage. Campagna’s talk contributed to this growing consensus, showing that in a species complex of South American birds known as the capuchino seedeaters, divergence peaks in genome scans from different pairwise comparisons are primarily associated with coloration.

Marc Tollis – The tuatara genome sheds light on phylogenetics and rates of evolution during the amniote radiation

It turns out the tuatara genome is about as cool as you’d expect it to be.

Posted in blogging, conferences, evolution

Live from #Evol2017 – Sunday Highlights (and a smidge of Saturday too!)

A subset of the Molecular Ecologist team is attending this year’s Evolution meeting in Portland, Oregon. As part of our coverage of the meeting, we will be recapping the highlights of each day here on the blog, and occasionally previewing upcoming presentations. You can find all of the TME contributors on Twitter using the sidebar on the right or compiled in a handy Twitter list here; follow along with all meeting news using the hashtag #Evol2017.

Team meeting at Teote

Stacy

As I’m taking care of the Sunday highlights, I am taking the liberty of inserting my Saturday highlights as well! In a separate post that will be written soon, I’ll regale the TME readers with the story of the stolen field gear …

Saturday

Carlos Spano et al. – Genomic divergence in sea anemones and its implications for species delimitation. They asked the dreaded question of what is a species for a sea anemone … there’s lots of plasticity and very simple morphologies, so what do you use to delimit species in these taxa? As my lab is making a foray into sea anemones soon with the addition of a post-doc (see more in my Sunday highlight), I thought this was an interesting foil to the current work on seaweed species delimitation we’ve been doing. It’s hard in these taxa, that’s certainly an understatement! I found it interesting that some synonymies were based on a single specimen without genomic tools! Another interesting point was raised in the questions on the frequency of asexual reproduction across the Anthothoe complex and what role that has in speciation. The work is still on-going, but what role does asexual reproduction play in divergence?

Sarah Jacobs and David Tank – species delimitation in the grey zone. Sarah stressed over and over again that we need (1) multiple lines of evidence, (2) multiple independent approaches and (3) integrative studies! They’ve been working on Castilleja and had to fall back on Sanger sequencing (which incidentally I found somewhat sad that older techniques are somewhat begrudgingly returned to even if they could be an appropriate tool!). While conducting power analyses, they found that one technique got the species right, the other technique they used, not so much. If you stumble across a Castilleja species, the first thing an expert will ask you is (1) where did you sample it, then (2) what was the ecological habitat and then (3) morphological characteristics. So are some of these conflated with one another?

Sunday

Will Ryan – while it might be shameless to highlight Will’s talk as he is joining my nascent lab in September, his stuff is just too cool! He presented work from his dissertation about an invasive sea anemone, Diadumene lineata. It has temperature-dependent fission. So, it undergoes both sexual and asexual reproduction. A bifurcation of a life cycle into these two reproductive modes must have some pretty awesome eco-evolutionary consequences! Along the east coast of the US, there’s a cline in fission rates that has an underlying genetic component!

Kathryn Turner – invasions in space and time using herbarium specimens. Herbaria truly are little nuggets of data … not just genomic, but also geographic location and a time capsule of info before an invasion … in other words, contemporary samples from sites that might have contributed to the invasion! Super cool to see what happens with the genomic work they’re doing with these samples! And, if you can help Kathryn out by looking for samples, get in touch (@KTInvasion).

Lua Lopez – herbarium data to track adaptation through time! In a time when herbaria and museum collections are losing funding, these studies are all the more important. They’re tracking adaptation in samples from 1820-2010! Funnily enough, the 1820 sample had tons of DNA, but it didn’t matter if you were an old sample or a new sample (i.e., 2010), the herbarium sample DNA is degraded and fragmented! Regardless, herbarium samples are “awesome for temporal studies!”

Jeremy

Alexandra Fraik — Characterizing potential adaptations of Tasmanian devil populations in the face of a transmissible cancer — Analysis of a sequence capture dataset from more than 3,000 Tasmanian devils, using both FST outlier-type methods and genotype-environment associations to identify loci associated with incidence of the transmissible devil facial tumor. A particularly nice feature is that data’s included from before and after the arrival of the DFT in many populations, so there’s a basis for direct tests for selected loci.

Thomas Nelson — The timescales of selection in stickleback: Why rapid adaptation takes millions of years — Nice deep dive into variation in coalescent time across the stickleback genome, in the context of two transitions to freshwater from the same anadromous population. Variants that helped fuel adaptation to freshwater are older than the rest of the genome, and may have been maintained in the ancestral population by past moves into freshwater and back. (I swear, stickleback seem to make the jump to freshwater more often than I’ve crossed the US-Canada border.)

Rob

Paul Decena — Genome size evolution in neotropical salamanders.

I enjoyed this tidy talk by Paul Decena, who started with a simple enough idea: salamanders have huge genomes compared to other vertebrates, we know that cell size is correlated with genome size, so what do the genomes of the smallest salamanders look like? The answer might not surprise you. There is a positive relationship between salamander body size and genome size. However, those tiny genomes are still many times bigger than even the largest mammal genomes, so these miniaturized amphibians have distinct morphological features that allow for the minimum size of certain features that just can’t functionally get any smaller (like eyes!).

Jonathan Puritz — Expressed Exome Capture Sequencing (EecSeq): a method for cost-effective exome sequencing of non-model organisms

A packed room was eager to learn about EecSeq, an exome sequencing approach developed by Jonathan Puritz that skips the generation of transcriptomes/genomes in order to isolate exome data. The trick to this pipeline? As Jonathan said, “I can’t tell you the details because I don’t want someone to steal my idea. But if you come and find me after the talk and pinky swear not to tell, we can talk about it.”

Katie

Katie Wagner and Brian Smith — When are species delimitation methods misleading in the study of macroevolutionary patterns? and When are species delimitation methods necessary for studying macroevolutionary patterns?

These talks were presented back to back in the symposium titled “A debate of conceptual issues surrounding genetic-based species delimitation in the genomic era.” Although they were framed as opposing sides of a debate, the two talks were largely complementary (Katie Wagner prefaced her talk by telling the audience that debate was her least favorite class in high school). Both speakers agreed that species delimitation tools can help us understand genetic patterns on a landscape — including patterns above and below the species level.

Catalina Palacios — We’re one, but we’re not the same: Shallow evolutionary divergence between two Andean hummingbirds

At the end of the day I stumbled into this talk by accident — and I was glad I did! Catalina and colleagues used an impressive suite of analyses to study two beautiful and charismatic hummingbird species from the Andes. Although the two species have notably different iridescent plumage coloration, Catalina found evidence of niche overlap and rampant mitochondrial haplotype sharing between them. Nuclear data, however, supported a distinct species split. She concluded that the two species had high levels of gene flow during the early stages of divergence.

Posted in blogging, conferences, evolution

Live from #Evol2017 – Saturday Highlights

A subset of the Molecular Ecologist team is attending this year’s Evolution meeting in Portland, Oregon. As part of our coverage of the meeting, we will be recapping the highlights of each day here on the blog, and occasionally previewing upcoming presentations. You can find all of the TME contributors on Twitter using the sidebar on the right or compiled in a handy Twitter list here; follow along with all meeting news using the hashtag #Evol2017.

Mt. Hood and the St. Johns Bridge in Portland, during decidedly cooler weather than conference attendees are currently experiencing. Flickr: Mark

To get things going, these were some of our favorite presentations from Saturday, June 24th:

Jeremy

Sarah Fumagalli: Hidden Benefits Aid the Evolution of Altruism in Small Populations of Unrelated Individuals

— Builds a model of altruistic behavior with stochastic selection and explicit tracking of trait-environment covariation, and recovers a bunch of classic evolution-of-altruism results from first principles. What’s really cool, though, is that there’s a possibility that increased altruism can evolve by chance in a small population of unrelated individuals, and kickstart selection for more altruism.

Melissa Wilson Sayres: Teaching undergraduate life sciences majors

— Surveyed biology faculty to zero in on what “core competencies” of bioinformatics should be part of a standard life science curriculum. Notably, lots of folks ranked statistics as important, even though you’d think that’s a basic component of every bio program already. And lots of respondents ranked command-line skills as important. A sample of syllabi found a lot of variation in what competencies are actually covered in bioinformatics courses, though.

Rob

Jeet Sukumaran: Species Delimitation under the Multispecies Coalescent: Conflating Populations with Species in the Grey Zone

— I think that if you begin a talk by reflecting on a recent paper of yours, you are (brazenly?) making a big assumption that the paper has been read by a majority of your audience. In this case, that assumption was completely justified as a full room of other scientists nodded their heads as we were reminded that multispecies coalescent theory is actually concerned with population structure, not whatever species concept you overlay on top of it. A simple message that resonated, just like the corresponding PNAS paper from late last year.

Greg Haenel: Variation of mitochondrial function in hybrid Tree lizards: assessing the role of differential gene expression

— Cases of mitochondrial introgression across a wide variety of systems continue to pop up in session after session, but nailing down the mechanisms that drive this introgression is a trickier problem. Greg Haenel offered a neat system to investigate further: tree lizards that have hybridized across a stark temperature gradient. The expression analyses show differences associated with those hybrid mitochondria, but the work to tease out how these differences potentially relate to interacting mitochondrial and nuclear DNA is just taking off.

Ethan

Rebekah Rogers: Excess of genomic defects in a woolly mammoth on Wrangel island

— Using the high-quality genomes from Wrangel Island and mainland Siberia published by Eleftheria Palkopoulou et al. in 2015, Rogers explored the genomic defects present in an individual from the last surviving mammoth population. In her standing-room-only talk, Rogers kept one foot firmly rooted in population genetic theory (“to tell you what is possible”) while speculating on the biology of some very mutation-burdened mammoths (shiny pelts!).

Harry Greene: Teaching natural history in the Anthropocene: some rules of engagement

— Given as part of an American Society of Naturalists symposium (“Natural history as the inspiration for scientific inquiry: Stories and tools for teaching”), Greene’s talk focused on a question I spend a lot of time thinking about: what concepts of “wilderness” mean to a biologist, especially in our current human-dominated epoch. In Greene’s eyes, this has less to do with traditional notions of what qualifies as “pristine” or “disturbed” and more with how intact ecological and evolutionary processes are. I like this formulation because it highlights the role of humans as participants (versus spectators) in evolution, but still provides a moral compass for addressing conservation questions.

Posted in Uncategorized

When less might be more: The evolution of reduced genomes

The advent of affordable genome sequencing has provided us with a wealth of data. Researchers have sequenced everything from Escherichia coli (4.6 Mbp genome size) to sea urchins (810 Mbp), chimpanzees (3.3 Gbp), and humans (3.2 Gbp). Then there are the massive genomes that have been identified, including that of the rare Japanese flower Paris japonica, with a genome of 149 Gbp. But what does that mean? Maybe it’s more interesting to switch our focus from the large-and-in-charge genomes to those of the small, free-living prokaryotes that have taken the opposite route.

Posted in adaptation, Coevolution, evolution, genomics, microbiology, population genetics, selection

Molecular Ecologists at #Evol2017 —  see you in Portland!

The Portland skyline and Mt. Hood, as seen from the Portland Japanese Garden. (Flickr: Alan)

Evolution 2017 — the joint annual meeting of the American Society of Naturalists, the Society of Systematic Biologists, and the Society for the Study of Evolution — is already underway in Portland, Oregon, and it’s looking like a terrific week of science. The program kicked off today with a symposium in honor of Joe Felsenstein, and continues tomorrow with a day of workshops capped by the traditional public outreach lecture, which will be given by Ann Reid, the Executive Director of the National Center for Science Education. Regular presentation sessions begin bright and early Saturday morning, and carry on through Tuesday.

Posted in community, conferences