It was the Ides of March in 2020 when I moved from California to Europe. Thanksgiving marks March 271st. I was still a postdoc in Jonathan Eisen's lab at UC Davis, and my contract would have ended at the end of August 2020. In March 2020, my husband and I were in the process of booking a container to ship our belongings through the Panamá Canal to Europe. He was applying for jobs in Germany, I already had an offer, and we were looking at schools for our children. I was in the middle of analyzing data I had collected during one of my many field trips to Central America in the preceding months when my mom called from Switzerland and told us we had to come now: Switzerland was closing its borders within the next few days, and many European countries had already closed theirs!
Infrastructure to make genetic data widely available for research beyond its initial publication has been a theme of the genomics revolution, from GenBank to the Sequence Read Archive. For molecular ecologists, though, genetic data is only half of our field; the other half is the ecological context in which that data is collected. This month, Molecular Ecology Resources highlights an initiative to bring that ecological context to genetic data archiving: the Genomic Observatories Metadatabase, or GEOME.
Led by Cynthia Riginos at the University of Queensland, Eric Crandall at Penn State, Libby Liggins at Massey University, and Michelle Gaither at the University of Central Florida, the GEOME collaborators present the case for creating yet another data deposition service: although there are a number of established databases for public deposition of genetic and ecological data, no one repository linked both types together. GEOME, which launched in 2017, offers a single metadata framework to link DNA sequence or marker data to sample locality and ecological measurements.
GEOME allows researchers to create records linked to sequence data they've already posted to a public repository or, now, to upload samples to the Sequence Read Archive of the International Nucleotide Sequence Database Collaboration alongside ecological data through a single unified portal. Datasets are then searchable through the GEOME website, which includes multiple levels of search control alongside a useful map visualization, or through a new R package that interacts with the GEOME API.
Life as we knew it came to a screeching halt back in March. That was almost a year ago; how is that possible? Yet at the same time, it feels like several lifetimes have passed…
At a recent editorial meeting, we were talking about TME posts, and I remembered that in the past I've written about fieldwork. I always felt fortunate to be able to travel to far-flung places, but I don't think I truly appreciated how much being out in the field meant to me. In between bouts of existential dread and complete overwhelm over the last nine-odd months, I've realized how much I took for granted. Fieldwork was simply part of the fabric of my life.
Our time in the field entails long days of driving hither and yon, sampling (often in hot, humid weather or freezing rain – we like extremes, I guess), and then processing late into the night. At some point Cher, Céline, or some other guilty-pleasure musician makes an appearance to get us through the slog, whether in the lab or on the road. Sometimes we're processing samples in a nice lab. Other times, we're sitting backwards on a toilet in a Motel 6, using the tank lid as a makeshift bench. We eat too much McDonald's, go back for more, and regret it immediately. Yet these are the times of year I find myself anxiously awaiting, counting down the days on my calendar until we are on the road.
The Molecular Ecologist is seeking two new regular contributors for 2021! Join us in blogging about “ecology, evolution, and everything in between.”
Ideal candidates should have expertise and experience in our core topic, the use of genetic data to understand the past and future of the living world. We're particularly interested in senior graduate students, postdoctoral researchers, and other working scientists who can discuss basic science on a level that engages research biologists, as well as explain fundamental molecular ecology concepts to the general public. The two contributors in the 2021 cohort will receive small stipends for their first year with the blog, in exchange for committing to posting on a monthly basis, helping to manage social media for TME (either our Twitter account or our presence on Facebook), and contributing to the Molecular Ecologist Podcast.
In addition to the direct compensation, blogging for The Molecular Ecologist can be an excellent way to hone familiarity with current molecular ecology research, establish connections within the scientific community, and build a portfolio of science writing for a broader audience. In light of this, we are particularly interested in applications from candidates whose racial, ethnic, sexual, or gender identities are underrepresented in science careers.
To apply, please e-mail Jeremy Yoder at email@example.com with a brief cover letter explaining (1) why you want to write for The Molecular Ecologist and (2) what topics you would write about for the site, along with (3) an appropriate sample of your writing. Applications should be received by the end of the day on 11 December 2020 to ensure consideration.
Who’s in charge of a symbiotic mutualism? You might think the host organism, whose body is the venue for an exchange of nutrients or services with a microbial symbiont, is running the show, able to evict or punish symbionts that don’t play nice. However, there are many examples of hosts making do with symbionts that aren’t particularly good partners, and some evolutionary theory has suggested that competing symbionts can gain the upper hand. Results from an evolutionary experiment recently reported in the journal Science lend support to the host-in-the-driver-seat view, though — bacterial symbionts selected by five generations of hosts evolved to be better mutualists.
Since this is our tenth year of blogging, we thought it was past time to check in with you, our readers. To that end, we've put together a brief survey about how you read The Molecular Ecologist, what kinds of posts you follow us for and what you'd like to see more of, and who you are, in terms of career stage and scientific interests. There's also an open-ended suggestion box, to tell us what we should have asked about but didn't think to.
In total it should take less than ten minutes, and if you've got the time to spare, it'll be very helpful. You can fill in the survey form right here on the blog, or follow this link to the Google Form. Thanks in advance!
I love when nostalgia for a project, place, or species intersects with a current interest, as happened this week for me with a paper by Cordes et al. 2020, about the contrasting effects of climate change on the seasonal survival of yellow-bellied marmots in the Colorado Rocky Mountains.
Most scientists collect and organize at least some data in spreadsheets, usually in Excel or Google Sheets, despite the potential pitfalls of using such products (there are even archives of spreadsheet horror stories). The most commonly bemoaned problem in biology, Excel converting some gene names to dates, even caused the HGNC (HUGO Gene Nomenclature Committee) to change the names of at least 27 genes this year. No matter your feelings about spreadsheets, they are generally the first program students learn to use for creating a database of samples, recording data, or doing simple calculations. Furthermore, for people without extensive coding experience, spreadsheets are the default. Fortunately, by following some simple guidelines, we can avoid most of these hassles. We can also save countless hours spent re-formatting data tables for analysis and spare ourselves endless confusion trying to decipher color codes from 10 years ago.
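A quick pattern check can screen a column of gene symbols for date-like names before the column ever touches Excel. The rule below is an assumption, not a complete description of how Excel parses dates. It is a heuristic that flags a month name or abbreviation followed by a day number, which is the pattern behind symbols such as SEPT1 and MARCH1.

```python
import re

# Heuristic (an assumption, not Excel's full parsing rules): a month name or
# abbreviation followed by a 1-2 digit number is likely to be read as a date.
MONTHS = (
    r"(JAN|JANUARY|FEB|FEBRUARY|MAR|MARCH|APR|APRIL|MAY|JUN|JUNE|JUL|JULY|"
    r"AUG|AUGUST|SEP|SEPT|SEPTEMBER|OCT|OCTOBER|NOV|NOVEMBER|DEC|DECEMBER)"
)
DATE_LIKE = re.compile(rf"^{MONTHS}-?\d{{1,2}}$", re.IGNORECASE)


def date_like_symbols(symbols):
    """Return the symbols that match the month-plus-number pattern."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


print(date_like_symbols(["SEPT1", "MARCH1", "TP53", "DEC1", "BRCA2"]))
```

Anything this flags is worth storing as text, or checking by eye, before you open the file in a spreadsheet program.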
This paper by Broman & Woo is from 2018, but it came to my attention this week and I have now added it to my canon of "Must read" literature for future students.
Many of these tips seem obvious, but if you think back, I'm guessing you will recall at least one instance where you (or a co-author) violated each of them and, in retrospect, knew you had erred. These days you are wiser, but you could probably use a refresher. This paper prevents the re-invention of the wheel during every PhD. I urge you to read the full paper, but here I'm providing the lightly edited CliffsNotes (I combined some tips and rearranged them a bit). These guidelines, if implemented across the lab, also allow for easy hand-off and transfer of data between students and colleagues.
Tip 1 – Be consistent in categorical variable codes, missing values, variable names, subject identifiers, dates, data layouts, and file names, both within and across spreadsheets. E.g., don't use both "M" and "male", and don't list the day first in some files and the month first in others. This one hits home: I once inherited a database of samples from a former French student who sometimes used the European date format and sometimes the American one, both on the sample labels and within the database (they also labeled all variable names in French, but that's another story!).
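If a column already mixes codes, you can enforce consistency with a single mapping onto canonical values. The function should fail loudly on anything it doesn't recognize rather than guess. The column name and codes below are hypothetical, so extend the mapping to match your own data.

```python
# Hypothetical canonical codes for a "sex" column; extend the mapping for your own data.
CANONICAL_SEX = {
    "m": "male",
    "male": "male",
    "f": "female",
    "female": "female",
}


def normalize_sex(value: str) -> str:
    """Map variant spellings onto one canonical code; raise on anything unexpected."""
    key = value.strip().lower()
    if key not in CANONICAL_SEX:
        raise ValueError(f"unrecognized sex code: {value!r}")
    return CANONICAL_SEX[key]


raw = ["M", "male", " F", "Female"]
print([normalize_sex(v) for v in raw])  # ['male', 'male', 'female', 'female']
```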
Tip 2 – Choose wisely. When choosing names or codes for variables, think about how your choice, or a file format conversion, will affect the analyses. E.g., avoid names with special characters, and use underscores or hyphens instead of spaces. Think about how easy it will be to type out the variable name repeatedly in R code. It's best to do this before you start collecting data. Also, choose wisely when it comes to how you represent any date variables; the paper recommends writing dates as YYYY-MM-DD.
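Here is a minimal sketch of both halves of this tip. It turns free-text column labels into code-friendly names, and it converts dates to YYYY-MM-DD using an explicitly stated input format. The function names are illustrative. Notice that the same string gives two different dates depending on which day/month order you assume, which is exactly why the order should be stated, not guessed.

```python
import re
from datetime import datetime


def to_variable_name(label: str) -> str:
    """Turn a free-text column label into a lowercase name using underscores only."""
    return re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_").lower()


def to_iso_date(text: str, fmt: str) -> str:
    """Parse a date with an explicit format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


print(to_variable_name("Body mass (g)"))      # body_mass_g
print(to_iso_date("03/04/2019", "%d/%m/%Y"))  # 2019-04-03 (day first)
print(to_iso_date("03/04/2019", "%m/%d/%Y"))  # 2019-03-04 (month first)
```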
Tip 3 – No empties allowed. Use a code to indicate that a value is missing rather than leaving the cell blank. This is especially important if you are continuing to collect data and are leaving cells blank to fill in later! It's also important for sorting data later. If you're really fancy, you may have one missing-value code for data that wasn't collected and another for data that is yet to be collected!
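The two-code idea can be checked automatically. The sketch below counts real values, "not collected", "still to collect", and truly blank cells in one column, so that any blanks stand out as errors. The codes "NA" and "TBD" are hypothetical choices, so use whatever your data dictionary specifies.

```python
import csv
import io

NOT_COLLECTED = "NA"  # hypothetical code: measurement was never taken
PENDING = "TBD"       # hypothetical code: measurement still to be collected


def summarize_missing(rows, column):
    """Count values, each kind of missing code, and ambiguous blank cells in one column."""
    counts = {"value": 0, "not_collected": 0, "pending": 0, "blank": 0}
    for row in rows:
        value = row[column].strip()
        if value == NOT_COLLECTED:
            counts["not_collected"] += 1
        elif value == PENDING:
            counts["pending"] += 1
        elif value == "":
            counts["blank"] += 1  # a blank is ambiguous and should be fixed
        else:
            counts["value"] += 1
    return counts


data = "id,mass_g\nA01,12.3\nA02,NA\nA03,TBD\nA04,\n"
rows = list(csv.DictReader(io.StringIO(data)))
print(summarize_missing(rows, "mass_g"))
```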
Tip 4 – One cell = One item. Each cell should contain only one piece of data, no more. The example given in the paper is position on a 96-well plate (e.g., A11 or B02), but I've also run into trouble with coding an individual as "adult_male" or "juvenile_female". My solution is to keep the column with the "group" designation so I can easily visualize each group, but to add two columns, one for age and one for sex, for ease of sorting. And put 'extra' information, like units, into the header, a Notes column, or your ReadMe file (see Tip 6).
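Splitting a combined cell into separate columns is easy to script once the pattern is known. The sketch assumes standard 96-well labels (rows A-H, columns 1-12) and group codes written as "age_sex". Both are illustrative conventions, and the function names are hypothetical.

```python
import re

# Rows A-H and columns 1-12 of a standard 96-well plate.
WELL = re.compile(r"^([A-H])(\d{1,2})$")


def split_well(position: str) -> tuple[str, int]:
    """Split a well label like 'B02' into a row letter and a column number."""
    match = WELL.match(position.strip().upper())
    if match is None or not 1 <= int(match.group(2)) <= 12:
        raise ValueError(f"not a 96-well position: {position!r}")
    return match.group(1), int(match.group(2))


def split_group(group: str) -> tuple[str, str]:
    """Split a combined code like 'adult_male' into separate age and sex values."""
    parts = group.split("_")
    if len(parts) != 2:
        raise ValueError(f"expected 'age_sex', got {group!r}")
    return parts[0], parts[1]


print(split_well("A11"), split_well("B02"))  # ('A', 11) ('B', 2)
print(split_group("juvenile_female"))        # ('juvenile', 'female')
```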
Tip 5 – Rectangles with one header row are gold. This is honestly pretty self-explanatory. See the figures in the paper and imagine trying to analyze them.
Additionally, if you have bits and pieces of data scattered around, put them in separate files for ease of analysis later on. I corrected this very mistake today for a project I was just starting.
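A simple check for "one header row plus a clean rectangle" is to confirm three things: every column has a unique, non-empty name, and every record has exactly as many fields as the header. The sketch below does this for a CSV export using only the standard library. It is a minimal check, not a full validator.

```python
import csv
import io


def check_rectangular(text: str) -> list[str]:
    """Return a list of problems; an empty list means one header row plus a clean rectangle."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    if any(not name.strip() for name in header):
        problems.append("header has an empty column name")
    if len(set(header)) != len(header):
        problems.append("header has duplicate column names")
    for record_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"line {record_no}: {len(row)} fields, expected {len(header)}")
    return problems


print(check_rectangular("id,sex\nA01,male\nA02,female\n"))  # []
print(check_rectangular("id,sex\nA01,male,note\n"))          # ['line 2: 3 fields, expected 2']
```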
Tip 6 – Create a Data Dictionary (and a Data ReadMe; for more information about ReadMe files, see here and here). Have a separate metadata document that explains the overarching goal of the project, the data being collected with brief notes about the methods, and what each variable in the spreadsheet is. These notes should include the variable name in the spreadsheet, a longer explanation of what the variable means, the measurement units if any, potential categories, etc. The article suggests keeping the ReadMe and the data dictionary separate, but I advocate for including the information about variables in both your data dictionary and your ReadMe file.
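A data dictionary becomes more useful when it is machine-readable, because you can then check the spreadsheet against it. In this minimal sketch, the dictionary records a description, units, and allowed categories for each hypothetical column. It reports undocumented columns and values outside the allowed categories.

```python
# Hypothetical data dictionary: one entry per spreadsheet column.
DATA_DICTIONARY = {
    "sample_id": {"description": "Unique sample identifier", "units": None, "allowed": None},
    "sex": {"description": "Sex assigned in the field", "units": None,
            "allowed": {"male", "female", "NA"}},
    "mass_g": {"description": "Body mass", "units": "g", "allowed": None},
}


def validate(rows: list[dict[str, str]]) -> list[str]:
    """Report columns missing from the dictionary and values outside allowed categories."""
    problems = []
    columns = {col for row in rows for col in row}
    for col in sorted(columns - set(DATA_DICTIONARY)):
        problems.append(f"column {col!r} is not in the data dictionary")
    for i, row in enumerate(rows, start=1):
        for col, value in row.items():
            allowed = DATA_DICTIONARY.get(col, {}).get("allowed")
            if allowed is not None and value not in allowed:
                problems.append(f"record {i}: {value!r} is not an allowed value of {col!r}")
    return problems


rows = [
    {"sample_id": "A01", "sex": "male", "mass_g": "12.3"},
    {"sample_id": "A02", "sex": "M", "mass_g": "NA", "tissue": "liver"},
]
for problem in validate(rows):
    print(problem)
```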
Tip 7 – Keep a raw version and back-up your data often. This tip feels obvious, but needs to be said. You should always keep a raw, protected version of your data that has no calculations included in the spreadsheet and contains all of the data. Save a copy and work within the copy. If you then exclude values or do calculations, you can save edited versions and even keep an explanation of the different versions in your ReadMe file, but always keep a ‘clean’ raw version that you don’t touch in case you need to go back. Similarly, save back-ups regularly and in different locations. If you don’t already do this one, stop reading and go do it, then come back.
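One way to script this habit is sketched below, with hypothetical file names. It removes write permission from the raw file, makes an editable working copy, and writes a time-stamped backup. It only copies to a local folder. Keeping backups "in different locations" still means syncing that folder to another disk or to cloud storage.

```python
import shutil
import stat
from datetime import datetime
from pathlib import Path


def protect_raw(raw_path: Path) -> None:
    """Remove write permission for everyone so the raw file is not edited by accident."""
    mode = raw_path.stat().st_mode
    raw_path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def working_copy(raw_path: Path, suffix: str = "_working") -> Path:
    """Copy the raw file to an editable working file next to it."""
    dest = raw_path.with_name(raw_path.stem + suffix + raw_path.suffix)
    shutil.copy(raw_path, dest)  # copies the read-only permission bits too...
    dest.chmod(dest.stat().st_mode | stat.S_IWUSR)  # ...so restore owner write access
    return dest


def backup(path: Path, backup_dir: Path) -> Path:
    """Write a time-stamped copy of `path` into `backup_dir`."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    dest = backup_dir / f"{path.stem}_{stamp}{path.suffix}"
    shutil.copy2(path, dest)
    return dest


# Example (hypothetical file names):
# raw = Path("samples_raw.csv")
# protect_raw(raw)
# work = working_copy(raw)
# backup(work, Path("backups"))
```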
Tip 8 – Do not color-code. I made this mistake a lot early on. Don't. You will not remember what these highlighted cells represent or why some of the values are blue versus black when you re-open this file a year from now. Also, colored text and highlighted cells are hard to sort or analyze, and these visual aids will usually be lost if you save in a different format or import the data into a different program. Instead, add Notes or a new variable to convey the information.
Now, you are empowered to use (and not abuse) spreadsheets for data collection! Go forth and collect all the data!
Experience with genome assemblies would also be advantageous.
Nominations and personal applications are welcome, and whilst scientific qualifications are paramount, we would particularly appreciate nominations and applications from suitably qualified researchers in underrepresented groups, including women, ethnic minority scientists, and scientists with disabilities, among others. Please email nominations/applications by October 15th, 2020 to firstname.lastname@example.org with the following items:
Cover letter stating the reasons for your nomination or, if applying for yourself, your interest in the role and familiarity with the journals,
Abbreviated CV (Education, Publications, Outreach) if you have it.
As a PhD student studying the effects of overall genetic diversity, and immunogenetic diversity specifically, on survival and reproductive success in captive and wild populations of an endangered primate, I thought a lot about the potential effects of inbreeding and outbreeding depression. I read literally hundreds of papers on the topic. Inbreeding depression describes the negative fitness effects that can occur in small populations when relatives breed with each other over multiple generations. Genetic diversity is lost through genetic drift, and deleterious recessive alleles are increasingly expressed in homozygous form. Outbreeding depression, by contrast, is the negative consequence of breeding individuals from two genetically distinct populations, which can lead to a loss of local adaptation. Concerns about outbreeding depression are one of the major theoretical limitations on re-introductions and attempted 'genetic rescues' when small populations and/or endangered species might be suffering from inbreeding depression. For the most part, however, evidence of outbreeding depression has been limited to plants and to captive or laboratory studies. Earlier this year, Dr. Sarah Fitzpatrick and her co-authors documented an extremely cool example of genetic rescue in wild populations of Trinidadian guppies, contradicting concerns about the potential for maladaptive gene flow following population introductions (Fitzpatrick et al. 2020).
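A worked formula helps explain why small, isolated populations lose diversity. In an idealized Wright-Fisher population with drift alone (no mutation, migration, or selection), expected heterozygosity after t generations is H_t = H_0 (1 - 1/(2Ne))^t, where Ne is the effective population size. The sketch below evaluates this formula. The Ne values are illustrative assumptions, not estimates from the guppy study, and effective size is usually smaller than census size.

```python
def heterozygosity_after(h0: float, ne: float, generations: int) -> float:
    """Expected heterozygosity after drift alone: H_t = H_0 * (1 - 1/(2*Ne))**t."""
    return h0 * (1 - 1 / (2 * ne)) ** generations


# Fraction of starting heterozygosity retained after 10 generations (illustrative Ne values).
for ne in (25, 50, 500):
    retained = heterozygosity_after(1.0, ne, 10)
    print(f"Ne = {ne:>3}: {retained:.1%} of heterozygosity retained after 10 generations")
```

The smaller the effective population, the faster diversity erodes, and that erosion is the backdrop against which genetic rescue is attempted.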
After repeatedly sampling two isolated guppy 'recipient' populations (Figure 1A, dark blue circles; N < 100 individuals per population) in the Caigual and Taylor rivers in Trinidad, the authors introduced guppies upstream of these recipient populations (dashed red circles), in previously guppy-free areas. These translocated guppies, taken from downstream populations (solid red circles), occasionally (or frequently!) migrated downstream into the recipient populations, which were located either ~5 m or ~700 m from the introduction sites. For ~8-10 guppy generations after the translocation, the recipient populations have been monitored with mark-recapture. This monitoring tracked population size as well as individual genetic diversity, hybrid ancestry, lifespan, and reproductive success. Following the onset of immigration and subsequent gene flow, both recipient populations grew nearly 10-fold, from fewer than 100 individuals to an estimated 1,000 individuals each (Figure 1B). The hybrid index ranges from 0 (entirely native ancestry) to 1 (entirely immigrant ancestry). By 10 generations after the first wave of immigration, the recipient populations consisted almost entirely of admixed individuals (Figure 1C).
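Here is a minimal sketch of how a hybrid index on this 0-to-1 scale can be computed. It is a simple allele-counting estimator that assumes diagnostic loci, where the source and recipient populations carry different alleles. Fitzpatrick et al. may well have used a different (for example, likelihood-based) estimator, so treat this as an illustration of the scale, not their method.

```python
from typing import Optional, Sequence


def hybrid_index(immigrant_allele_counts: Sequence[Optional[int]]) -> float:
    """Fraction of alleles of immigrant origin across diagnostic loci (0 = native, 1 = immigrant).

    Each entry is the number (0, 1, or 2) of immigrant-derived alleles at one diploid
    locus; None marks a locus with missing data, which is skipped.
    """
    called = [g for g in immigrant_allele_counts if g is not None]
    if not called:
        raise ValueError("no genotyped loci")
    if any(g not in (0, 1, 2) for g in called):
        raise ValueError("allele counts must be 0, 1, or 2")
    return sum(called) / (2 * len(called))


print(hybrid_index([0, 0, 0, 0]))           # 0.0  (pure native)
print(hybrid_index([2, 2, 2, 2]))           # 1.0  (pure immigrant)
print(hybrid_index([1, 1, 1, 1]))           # 0.5  (e.g., a first-generation hybrid)
print(hybrid_index([2, 1, None, 2, 1]))     # 0.75 (missing locus skipped)
```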
Contradicting the predictions of outbreeding depression, individuals with intermediate to high (0.5-0.75) hybrid indices had the highest longevity and reproductive success in both locations and across sexes (Figure 2). Interestingly, although hybrids and pure immigrants had similar levels of genetic heterozygosity, hybrids had higher fitness, suggesting that increased genomic diversity alone does not explain the increased fitness and pointing towards a potential maintenance of locally adapted alleles.
Before the introduction, 95% and 96% of >12,000 genotyped SNPs were monomorphic in the Caigual and Taylor populations, respectively, and average nucleotide diversity was 0.01 in both populations (Figure 4b). Eight to ten generations later, only 22% and 24% of SNPs were monomorphic, and nucleotide diversity had increased to 0.21 and 0.22. Genome-wide average Fst between source and recipient populations also decreased, from 0.29-0.31 to 0.01.
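The three summaries in this paragraph can be illustrated on toy data: the fraction of monomorphic SNPs, per-SNP diversity, and Fst between two populations. The sketch uses an unbiased per-SNP expected heterozygosity and a Nei-style Fst = (H_T - H_S) / H_T. These are generic textbook estimators, assumed here for illustration, and not necessarily the ones the authors used. The genotypes are invented, not data from the paper.

```python
from typing import Sequence


def alt_allele_freq(genotypes: Sequence[int]) -> float:
    """Alternate-allele frequency; genotypes are 0/1/2 alternate-allele counts (diploid)."""
    return sum(genotypes) / (2 * len(genotypes))


def is_monomorphic(genotypes: Sequence[int]) -> bool:
    """True if every sampled allele at this SNP is the same."""
    p = alt_allele_freq(genotypes)
    return p == 0.0 or p == 1.0


def site_diversity(genotypes: Sequence[int]) -> float:
    """Unbiased expected heterozygosity at one SNP: n/(n-1) * 2p(1-p), n = sampled alleles."""
    n = 2 * len(genotypes)
    p = alt_allele_freq(genotypes)
    return n / (n - 1) * 2 * p * (1 - p)


def fst_nei(p1: float, p2: float) -> float:
    """Nei-style Fst = (H_T - H_S) / H_T for two populations with allele frequencies p1, p2."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Toy genotype matrix: rows are SNPs, columns are individuals (invented data).
snps = [
    [0, 0, 0, 0],
    [0, 1, 2, 1],
    [2, 2, 2, 2],
]
mono = sum(is_monomorphic(s) for s in snps) / len(snps)
mean_div = sum(site_diversity(s) for s in snps) / len(snps)
print(f"monomorphic: {mono:.0%}, mean per-SNP diversity: {mean_div:.3f}")
print(fst_nei(0.0, 1.0), fst_nei(0.3, 0.3))  # fixed differences vs. identical frequencies
```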
To determine whether gene flow had swamped locally adaptive variants, the authors identified 146 candidate loci whose allele frequencies in the pre-immigration recipient populations suggested they might carry locally adapted alleles. After immigration, the genomes of immigrant and recipient populations became more homogeneous overall. Even so, the authors found evidence for selective maintenance of some candidate alleles in the recipient populations, in the form of an excess of pre-immigration ancestry at these loci (Fig 4). Unfortunately, none of these candidate loci matched loci previously identified as under selection, nor were any gene ontology terms enriched, but they provide interesting targets for future investigation.
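One generic way to ask whether candidate loci retain "an excess of pre-immigration ancestry" is a permutation test. It compares mean native ancestry at the candidates with random sets of the same number of loci. This is a sketch of that general idea, not the authors' method. It assumes per-locus native-ancestry estimates are already available, and it treats loci as exchangeable, which ignores linkage between nearby loci.

```python
import random
from typing import Sequence


def candidate_ancestry_test(native_ancestry: Sequence[float], candidates: Sequence[int],
                            n_perm: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Compare mean native ancestry at candidate loci with random locus sets of the same size.

    Returns (observed mean, one-sided permutation p-value for an excess of native ancestry).
    """
    rng = random.Random(seed)
    k = len(candidates)
    observed = sum(native_ancestry[i] for i in candidates) / k
    loci = range(len(native_ancestry))
    as_extreme = 0
    for _ in range(n_perm):
        draw = rng.sample(loci, k)
        if sum(native_ancestry[i] for i in draw) / k >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Toy example (invented numbers): 195 background loci at 20% native ancestry, 5 candidates at 80%.
ancestry = [0.2] * 195 + [0.8] * 5
obs, p = candidate_ancestry_test(ancestry, candidates=range(195, 200), n_perm=2000)
print(f"observed mean native ancestry {obs:.2f}, permutation p = {p:.4f}")
```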
This study documents genetic rescue in two wild populations over multiple generations. Contrary to expectations, gene flow does not necessarily swamp local adaptation. It can in fact substantially increase fitness, in the form of longevity and reproductive success, and in turn population size. Further, at least some locally adapted loci appear to have been maintained in both Caigual and Taylor, despite a 10-fold difference in the number of immigrants to each population. This suggests that a range of gene flow rates might still allow local adaptation to be maintained, with extremely important and interesting implications for future conservation-based introduction efforts.
Fitzpatrick, S.W., G.S. Bradburd, C.K. Kremer, P.E. Salerno, L.M. Angeloni, W.C. Funk (2020) Genomic and fitness consequences of genetic rescue in wild populations. Current Biology 30: 517-522.e5.