Marmots, seasons, and climate change

I love when nostalgia for a project, place, or species intersects with a current interest, as happened this week for me with a paper by Cordes et al. 2020, about the contrasting effects of climate change on the seasonal survival of yellow-bellied marmots in the Colorado Rocky Mountains.


Posted in climate change, ecology, mammals

Simple rules for organizing data in a spreadsheet

Most scientists collect and organize at least some data in spreadsheets, usually Excel or Google Sheets, despite the potential pitfalls of using such products (there are even archives of spreadsheet horror stories). The most commonly bemoaned problem in biology, that of Excel converting some gene names to dates, even caused the HGNC (HUGO Gene Nomenclature Committee) to change the names of at least 27 genes this year to avoid this issue. No matter your feelings about spreadsheets, they are generally the first program students learn to use for creating a database of samples, recording data, or doing simple calculations. Furthermore, for people without extensive coding experience, spreadsheets are the default. Fortunately, by following some simple guidelines, we can avoid most of the hassles, as well as countless hours spent re-formatting data tables for analysis and endless confusion trying to decipher color-codes from 10 years ago.

This paper by Broman & Woo is from 2018, but it came to my attention this week and I have now added it to my canon of “Must read” literature for future students.

Karl W. Broman, lead author

Many of these tips seem obvious, but I’m guessing if you think back, you will recall at least one instance where you (or a co-author) violated each of these tips and in retrospect knew you had erred. These days you are wiser but could probably use a refresher. This paper prevents the re-invention of the wheel during every PhD. I urge you to read the full paper, but here I’m providing the lightly edited (I combined some tips and re-arranged them a bit) CliffsNotes version. These guidelines, if implemented across the lab, also allow for easy hand-off and transfer of data between students and colleagues.

Tip 1 – Be consistent. This applies to categorical variable codes, missing values, variable names, subject identifiers, dates, data layouts, and file names, both within and across spreadsheets. E.g., don’t use both “M” and “male”, and don’t list the day first in some files and the month first in others. This one hits home – I once inherited a database of samples from a former French student who sometimes used the European date format and sometimes the American, both on the sample labels and within the database (they also labeled all variable names in French, but that’s another story!).
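As a minimal sketch of how an inconsistency like “M” versus “male” can be caught before analysis, the Python below counts every distinct spelling in a column and maps each one to a single canonical code. The mapping `SEX_CODES`, the function names, and the example column are all made up for illustration; none of it comes from the paper.

```python
from collections import Counter

# Hypothetical mapping from spellings seen in a sheet to one canonical code.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def normalize_sex(value):
    """Return the canonical code for a sex entry, or raise if it is unknown."""
    key = value.strip().lower()
    if key not in SEX_CODES:
        raise ValueError(f"unrecognized sex code: {value!r}")
    return SEX_CODES[key]


def code_counts(values):
    """Count each distinct spelling so inconsistencies are easy to spot."""
    return Counter(v.strip() for v in values)


# Made-up example column mixing two conventions.
column = ["M", "male", "F", "female ", "M"]
print(code_counts(column))
print([normalize_sex(v) for v in column])
```

Raising an error on unknown codes, rather than guessing, keeps a typo from silently becoming a new category.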

Tip 2 – Choose wisely. When choosing names or codes for variables, think about how your choice or a file format conversion will affect the analyses. E.g., don’t choose names with special characters and use underscores or hyphens instead of spaces. Think about how easy it will be to type out the variable name repeatedly in R code. It’s best to do this before you start collecting data. Also, choose wisely when it comes to how you represent any date variables.
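One possible way to apply this to headers that already exist is sketched below: a small, hypothetical helper (`clean_name` is not from the paper) that lowercases a header and replaces spaces and special characters with underscores. It assumes plain ASCII headers; accented letters would be dropped, so those are better renamed by hand.

```python
import re


def clean_name(name):
    """Turn a free-form column header into a name that is easy to type in code."""
    name = name.strip().lower()
    # Replace each run of spaces or special characters with one underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name)
    # Drop any underscore left at the start or end.
    return name.strip("_")


for header in ["Body Mass (g)", " Capture Date ", "sex"]:
    print(repr(header), "->", clean_name(header))
```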

Tip 3 – No empties allowed. Have a code that indicates a value is missing rather than the cell being intentionally left blank. This is especially important if you are continuing to collect data and are leaving cells blank to fill them in later! It’s also important for sorting data later. If you’re really fancy, you may have one missing code for data that wasn’t collected and another for data that is yet to be collected!
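A rough sketch of how this could be checked automatically: the Python below scans a small table and reports which cells are truly blank and which carry an explicit missing code. The codes “NA” (not collected) and “TBD” (yet to be collected), the function name, and the example data are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical codes: "NA" = not collected, "TBD" = still to be collected.
MISSING_CODES = {"NA", "TBD"}


def audit_missing(rows):
    """Return (blank_cells, coded_missing) as lists of (row_number, column)."""
    blanks, coded = [], []
    for i, row in enumerate(rows, start=2):  # row 1 of the sheet is the header
        for col, val in row.items():
            if val is None or val.strip() == "":
                blanks.append((i, col))
            elif val.strip() in MISSING_CODES:
                coded.append((i, col))
    return blanks, coded


# Made-up sheet: one blank cell (row 3) and two coded missing values (row 4).
sheet = io.StringIO("id,mass_g,sex\nA1,12.5,F\nA2,,M\nA3,NA,TBD\n")
blanks, coded = audit_missing(csv.DictReader(sheet))
print("blank cells:", blanks)
print("coded as missing:", coded)
```

Any entry in the blank list is a cell to revisit, since it is impossible to tell whether it was skipped on purpose.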

Tip 4 – One cell = One item. Each cell should contain only one piece of data, no more. The example given in the paper is position on a 96 well plate (e.g., A11 or B02), but I’ve also run into trouble with coding an individual as “adult_male” or “juvenile_female”. My solution is to keep the column with the “group” designation so I can easily visualize each group, but to add two columns, one for age and one for sex, for ease of sorting. And put ‘extra’ information, like units, into the header, a Notes column, or your ReadMe file (see Tip 6).
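If a sheet already contains combined cells, they can usually be split after the fact. The sketch below, with hypothetical helper names and made-up rows, splits a group code like “adult_male” into age and sex, and a plate position like “B02” into row letter and column number. It assumes every group code contains exactly one underscore and every well is one letter followed by digits.

```python
def split_group(group):
    """Split a combined code such as 'adult_male' into (age, sex)."""
    age, sex = group.split("_", 1)
    return age, sex


def split_well(well):
    """Split a plate position such as 'B02' into (row_letter, column_number)."""
    return well[0], int(well[1:])


# Made-up rows; the original 'group' column is kept and two columns are added.
rows = [
    {"id": "A1", "group": "adult_male"},
    {"id": "A2", "group": "juvenile_female"},
]
for row in rows:
    row["age"], row["sex"] = split_group(row["group"])

print(rows)
print(split_well("A11"), split_well("B02"))
```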

Tip 5 – Rectangles with one header row are gold. This honestly is pretty self-explanatory. See the figures below from the paper and imagine trying to analyze them.

Figure 5 from Broman & Woo 2018. Examples of spreadsheets with nonrectangular layouts.

Additionally, if you have bits and pieces of data scattered around, put them in separate files for ease of analysis later on. I corrected this very mistake today for a project I was just starting.

Tip 6 – Create a Data Dictionary (And Data ReadMe – For more information about ReadMe files, see here and here). Have a separate document of metadata that explains the overarching goal of the project, the data being collected with brief notes about the methods, and an explanation of what each variable in the spreadsheet is. These notes should include the variable name in the spreadsheet, a longer explanation of what the variable means, the measurement units if any, potential categories, etc. The article suggests separating the ReadMe and the data dictionary, but I advocate for having the information about variables in both your data dictionary and your ReadMe file.

Tip 7 – Keep a raw version and back-up your data often. This tip feels obvious, but needs to be said. You should always keep a raw, protected version of your data that has no calculations included in the spreadsheet and contains all of the data. Save a copy and work within the copy. If you then exclude values or do calculations, you can save edited versions and even keep an explanation of the different versions in your ReadMe file, but always keep a ‘clean’ raw version that you don’t touch in case you need to go back. Similarly, save back-ups regularly and in different locations. If you don’t already do this one, stop reading and go do it, then come back.
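One way to make the raw version harder to overwrite by accident is to mark the file read-only and do all work in a copy. The sketch below is only an illustration: the function name and file names are hypothetical, and the demonstration runs in a throwaway temporary folder. File permissions behave somewhat differently on Windows, and this does not replace real back-ups in separate locations.

```python
import os
import shutil
import stat
import tempfile


def protect_and_copy(raw_path, working_path):
    """Copy the raw file's contents to a working file, then mark the raw file read-only."""
    # copyfile copies contents only, so the working copy stays writable.
    shutil.copyfile(raw_path, working_path)
    # Read-only for owner, group, and others.
    os.chmod(raw_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return working_path


# Demonstration in a throwaway directory with made-up file names and data.
folder = tempfile.mkdtemp()
raw = os.path.join(folder, "samples_raw.csv")
with open(raw, "w") as handle:
    handle.write("id,mass_g\nA1,12.5\n")

work = protect_and_copy(raw, os.path.join(folder, "samples_working.csv"))
print("raw file mode:", oct(stat.S_IMODE(os.stat(raw).st_mode)))
print("working copy:", work)
```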

Tip 8 – Do not color-code. I made this mistake a lot early on. Don’t. You will not remember what these highlighted cells represent or why some of the values are blue versus black when you re-open this file a year from now. Also, you can’t sort colored text or highlighted cells and these visualizing aids will usually be lost if you save in a different format or import the data into a different program. Instead, add Notes or a new variable to convey the information.

Now, you are empowered to use (and not abuse) spreadsheets for data collection! Go forth and collect all the data!

Additional Resources

A Guide to Data Management in Ecology & Evolution by the British Ecological Society Guides to Better Science


Karl W. Broman & Kara H. Woo (2018) Data Organization in Spreadsheets, The American Statistician, 72:1, 2-10, DOI: 10.1080/00031305.2017.1375989

Posted in data archiving, howto, methods

Molecular Ecology and Molecular Ecology Resources are recruiting new Associate Editors

Molecular Ecology and Molecular Ecology Resources are looking for new Editorial Board members to join the journals as Associate Editors in the key subject areas below:

  • Eco-immunology/emerging diseases/disease resistance
  • Proteomics/protein evolution
  • Computer programs/statistical approaches
  • Environmental DNA/metabarcoding

Experience with genome assemblies would also be advantageous.  

Nominations and personal applications are welcome, and whilst scientific qualifications are paramount, we would particularly appreciate nominations and applications from suitably qualified researchers in underrepresented groups, including women, ethnic minority scientists, and scientists with disabilities, among others. Please email nominations/applications by October 15th, 2020 to with the following items:

  • Cover letter stating the reasons for your nomination, or if applying for yourself, your interest in the role and familiarity with the journals,
  • Abbreviated CV (Education, Publications, Outreach) if you have it.
Posted in community, Molecular Ecology, the journal, science publishing

Genetic Rescue – Fitness and genomic consequences

As a PhD student studying the effects of genetic diversity overall and immunogenetic diversity specifically on survival and reproductive success in an endangered primate in captive and wild populations, I thought a lot about the potential effects of inbreeding and outbreeding depression. I read literally hundreds of papers on the topic. Inbreeding depression describes the negative fitness effects that can occur in small populations when relatives breed with each other for multiple generations, so that genetic diversity is lost through genetic drift and deleterious alleles are expressed. Outbreeding depression, by contrast, is the negative consequence of interbreeding between two genetically distinct populations, which can lead to a loss of local adaptation. Concerns about outbreeding depression are one of the major theoretical limitations to re-introductions and attempts at ‘genetic rescues’ when small populations and/or endangered species might be suffering from inbreeding depression. However, evidence of outbreeding depression has mostly been limited to plants and captive or laboratory studies. Earlier this year, Dr. Sarah Fitzpatrick and her co-authors documented an extremely cool example of genetic rescue in populations of wild Trinidadian guppies, contradicting the hypothesis about the potential for maladaptive gene flow in population introductions (Fitzpatrick et al. 2020).

Dr. Sarah Fitzpatrick, lead author.
Photo Credit:

Trinidadian guppies. Photo Credit:

After repeatedly sampling two isolated guppy ‘recipient’ populations (Figure 1A, dark blue circles, N < 100 individuals per population) in the Caigual and Taylor rivers in Trinidad, the authors introduced populations of guppies upstream (dashed red circles) of these recipient populations, in previously guppy-free areas. These translocated guppies, from downstream populations (solid red circles), occasionally (or frequently!) migrated downstream into the recipient populations located either ~5 m or ~700 m from the introduction location. For ~8-10 guppy generations after the translocation, the recipient populations have been monitored with mark-recapture to assess population size as well as individual overall genetic diversity, hybrid ancestry, lifespan, and reproductive success. Following the onset of immigration and subsequent gene flow, both recipient populations experienced nearly a 10-fold increase in population size, from fewer than 100 individuals to an estimated 1,000 individuals each (Figure 1B). The hybrid index ranges from 0 (purely native ancestry) to 1 (purely immigrant ancestry), and based on this index, it’s clear that 10 generations after the first wave of immigration, the populations consist almost entirely of admixed individuals (Figure 1C).

Figure 1 – Gene Flow Manipulation Experiments in Trinidad
(A) Map of the Guanapo River drainage. In 2009, guppies were translocated from a downstream high-predation locality (red) into two headwater sites (dashed red) that were upstream of native recipient populations in low-predation environments (dark blue). Unidirectional, downstream gene flow began shortly after the introductions, indicated by black arrows.
(B) Census sizes in Caigual (solid) and Taylor (dashed) following the onset of gene flow from the upstream introduction sites. Gray box indicates the time span in which all captured individuals were genotyped at 12 microsatellite loci.
(C) Temporal patterns of continuous hybrid index assignments throughout the first 17 months of the study (∼four to six guppy generations). Individuals from recipient populations prior to gene flow had a hybrid index = 0, and pure immigrant individuals had a hybrid index = 1. Hybrid indices were assigned using data from 12 microsatellite loci. Red arrows indicate the onset of gene flow.

Contradicting the predictions of outbreeding depression, individuals with intermediate to high (0.5-0.75) hybrid indices had the highest longevity and reproductive success in both locations and across sexes (Figure 2). Interestingly, although hybrids and pure immigrants had similar levels of genetic heterozygosity, hybrids had higher fitness, suggesting that increased genomic diversity alone does not explain the increased fitness and pointing towards a potential maintenance of locally adapted alleles.

Figure 2 – Relationships between Hybrid Index and Fitness
Fitness metrics (longevity and total lifetime reproductive success [LRS]) varied quadratically with hybrid index (0, pure recipient genotype; 1, pure immigrant genotype). Maxima of the quadratic functions are indicated by vertical dashed lines/diamonds; uncertainty in their positions is indicated by (horizontal) 95% confidence bars. Shading around regression lines displays approximate 95% confidence bands obtained through simulation.
(A and B) Longevity differed between males (red) and females (blue). Generally, females lived longer than males, and fish in (A) Caigual lived longer than those in (B) Taylor. In Taylor, male and female longevity had quadratic relationships with hybrid index that differed in magnitude but peaked at similar parameter estimates; this differed by sex in Caigual (A versus B).
(C and D) LRS varied quadratically with hybrid index, and this trend did not differ between males and females. Individuals from Taylor generally had lower LRS than Caigual (C versus D) and were more likely to not reproduce at all, especially those with recipient genotypes (hybrid indices near zero).
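For readers wondering how the maximum of such a quadratic relationship is located, here is a generic worked form. The symbols w (a fitness metric), h (hybrid index), and the coefficients are illustrative notation only, not the paper's, and the published models may include additional terms.

```latex
% Generic quadratic model of a fitness metric w as a function of hybrid index h.
% The symbols w, h, and \beta_i are illustrative notation, not taken from the paper.
\[
  w(h) = \beta_0 + \beta_1 h + \beta_2 h^2, \qquad 0 \le h \le 1
\]
% When \beta_2 < 0 the curve is concave, and setting dw/dh = 0 gives the maximum:
\[
  \frac{dw}{dh} = \beta_1 + 2\beta_2 h = 0
  \quad\Longrightarrow\quad
  h^{*} = -\frac{\beta_1}{2\beta_2}
\]
```

A maximum h* that falls strictly between 0 and 1 corresponds to admixed individuals having the highest predicted fitness.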

Pre-introduction, 95% and 96% of >12,000 genotyped SNPs were monomorphic in the Caigual and Taylor populations, respectively, and average nucleotide diversity was 0.01 in both populations (Figure 4B). After 8-10 generations, only 22% and 24% of SNPs were monomorphic and nucleotide diversity had increased to 0.21 and 0.22. Genome-wide average Fst between source and recipient populations also decreased from 0.29-0.31 to 0.01.

To determine if gene flow swamped locally adaptive variants, the authors identified 146 loci with allele frequencies in the pre-immigrant recipient populations that might indicate candidacy for locally adapted alleles. Post-immigration, although overall genome homogenization increased between immigrant and recipient populations, the authors found evidence for selective maintenance of some of the candidate alleles in the recipient populations in the form of an excess of pre-immigrant ancestry at these loci (Fig 4). Unfortunately, none of these candidate loci matched previously identified loci under selection nor were any gene ontology terms enriched, but they provide interesting potential targets for future investigation.

Figure 4 – Genomic Consequences of Gene Flow
New gene flow caused overall genomic homogenization, but candidate adaptive alleles were maintained at higher than expected frequencies.
(A) PCA plot showing overall population differentiation based on polymorphic SNP loci from the RAD-seq data.
(B) Comparison of nucleotide diversity patterns along linkage group two among pre-gene flow (dark blue) and post-gene flow (light blue) Caigual (solid) and Taylor (dashed) populations and the introduction source (red). Similar patterns were found across all 23 linkage groups.
(C) Distributions of ancestry-polarized deviations in candidate loci versus frequency-matched non-candidates for both populations. In each stream, the allele frequencies of the candidate loci were significantly closer to the headwater ancestral frequency compared to a set of frequency-matched non-candidates.

This study documents the phenomenon of genetic rescue in two multi-generational wild populations, showing that, contrary to expectations, gene flow does not necessarily swamp local adaptation and can actually significantly increase fitness in the form of longevity and reproductive success, subsequently substantially increasing population size. Further, at least some locally adapted loci appear to have been maintained in both Caigual and Taylor, despite a 10-fold difference in the number of immigrants to each population, suggesting a range of gene flow rates might still allow the maintenance of local adaptation, with extremely important and interesting implications for future conservation-based introduction efforts.


Fitzpatrick, S.W., G.S. Bradburd, C.K. Kremer, P.E. Salerno, L.M. Angeloni, W.C. Funk (2020) Genomic and fitness consequences of genetic rescue in wild populations. Current Biology 30: 517-522.e5.

Posted in conservation, genomics, hybridization

The Molecular Ecologist Podcast: What do you look for in a journal?

The Boston Public Library (Flickr: Little Koshka)

A new episode of The Molecular Ecologist Podcast is now out on In this episode, we turn to a question that every academic scientist has to answer at some point: How do you choose a scientific journal to receive your paper? Kelle Freel, Shawn Abrahams, Katie Grogan and Jeremy Yoder chat about what they like in a journal, what they consider when picking a publication venue for a new paper, and the various meanings of an “impact factor.”


Posted in career, howto, TME Podcast

Sparrows and spiders and aggression, oh my!

One of the major goals of evolutionary biology is to link phenotypic variation with specific genetic variation, yet for behavioral phenotypes in non-model species, this task remains daunting and generally elusive. Although behaviors are heritable and clearly acted upon by evolutionary forces, they are generally polygenic, flexibly expressed, and context-dependent. Two recent papers, however, accomplished this very thing, in white-throated sparrows (Zonotrichia albicollis; Merritt et al. 2020) and in a species of jumping spider from southeastern Asia (Portia labiata; Chang et al. 2020)!

Top left: Dr. Jennifer Merritt. Top Right: Dr. Chia-Chen Chang
Bottom left: White-striped and tan-striped morphs of the white-throated sparrow, Photo Credit Jennifer Merritt
Bottom right: White-mustached Portia Jumping Spider, Photo Credit Richard Ong on Project Noah


Posted in association genetics, birds, insects, next generation sequencing, RNAseq

A genomic march of the penguins

It’s undeniable that penguins are a marine representative of the charismatic megafauna group. I have an affinity for stuff we need microscopes to see, BUT I agree that penguins are cute (just LOOK at these National Geographic photos…they’re even in comics). I’m guessing that many of us have also watched “March of the Penguins”, although maybe you also were today years old when you learned the original French version was narrated in first-penguin by the stars of the show themselves in “La Marche de l’Empereur”.

Our hearts all melt a tiny bit when we see a fluffy baby chick waddle around on the ice. But. Have you ever contemplated how many different penguin species there are, where exactly they’re found on the globe and how they ended up where they currently reside? If you’re like me, (and don’t work on anything remotely related to penguins), you might not be well versed in the diversity of these flightless diving birds.


Posted in adaptation, association genetics, bioinformatics, birds, ecology, evolution, genomics, phylogeography

Urban ecology, evolution, and racism

Occasionally, while reading the literature, you stumble across a paper that is so eloquent and beautiful that you are awestruck. Since that happened to me this weekend, today’s post is a call to you to go read the incredible synthesis and call to action written by Schell et al. in Science (2020) – The ecological and evolutionary consequences of systemic racism in urban environments. In this paper, the authors affirm that biologists working in urban environments must consider how racial oppression affects the biological change they study.

Lead author Dr. Christopher Schell.
For more info, visit his webpage –

Evolutionary biologists have increasingly become interested in how the environmental change due to urbanization leads to changes in the phenotypic, genetic, and species make-up of urban ecosystems. Indeed, between 1965 and 1989, only 124 papers with the words “Urban ecology” in the abstract were published according to a quick non-exhaustive search of Web of Science (mean = 5.0 papers per year; performed 8-31-2020). However, from 1990 until 2019, the rate of publication increased exponentially to over 1,000 papers in 2019 alone.


Posted in ecology, evolution

A decade of The Molecular Ecologist

Conveniently, I made a montage-header for The Molecular Ecologist back when we took a stab at crowdfunding in 2016.

I recently took a look through the “Archives by month” drop-down in our right-hand sidebar and discovered that it goes all the way back to July 2010. Which means The Molecular Ecologist had its tenth anniversary this very month — specifically back on July 11, an even decade since Brant Faircloth kicked off the blog with a rundown of essential (Python-centric) bioinformatic tools.

Given that it snuck up on us, and in the middle of the summer, and in the middle of this summer, we don’t have any kind of big event planned. But I didn’t want to let the month close out without marking the occasion. So here’s a rundown of some major events in the history of this fine blog:


Posted in community, housekeeping

Serendipitous history in the microbial making

It’s been over 100 years since the Dutch microbiologist Martinus Willem Beijerinck theorized that microbes could oxidize manganese to generate energy for growth. Last week, the first evidence for this theory was published, and you might be surprised by where these fascinating microbes hail from.


Posted in bioinformatics, ecology, genomics, microbiology, transcriptomics