Update, 20 August 2013: Many readers have requested a copy of the Joshua tree data set used as an example in this post, and I’ve finally secured permission from the coauthors of the original study to post it to Dryad. You can download it right here. Happy coding!
One of our most consistently popular posts of the past few months has been Kim Gilbert’s introduction to using geographic data to make maps in R. But once you’ve made a nice map of your collection sites, you’ve only just started to tap the possibilities of spatial data in R. With a suite of packages anchored by dismo, you can use R and openly available climate data to determine the environmental conditions your favorite species requires, by building a species distribution model.
Species distribution models (SDMs) are handy any time you want to extrapolate where a species might be based on where you know it actually is. Maybe you’re trying to figure out where it would be fruitful to do more sampling; maybe you want to know where your favorite critters probably lived back during the last ice age; maybe you want to know what regions will be suitable for your favorite critters after another century of global climate change. The basic idea is:
- Take a list of locations where you know you can find a species and identify the climate (or other environmental conditions) at those locations;
- Build a statistical model (using one or more of several available methods) that differentiates the climate (or other conditions) at the points where your species is found from other points where your species isn’t found; and
- Apply the model to climate (or other conditions) from some other time or place to estimate a probability that your species would be happy there.
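The three steps above can be sketched in base R, without dismo. This is a minimal illustration under loud assumptions: every climate value and record is synthetic toy data generated on the fly (not the Joshua tree data), `climate_at()` is a hypothetical stand-in for extracting values from real gridded climate layers, and the model is an ordinary logistic-regression GLM, chosen here only because it is one of the simplest of the “several available methods.”

```r
# Minimal base-R sketch of the three SDM steps (no dismo required).
# ASSUMPTION: all "climate" values and records are synthetic toy data, not the
# Joshua tree data set; climate_at() is a HYPOTHETICAL stand-in for extracting
# values from real gridded climate layers.
set.seed(1)

# Hypothetical climate surface: temperature falls with latitude and
# precipitation rises toward the east, plus a little random noise.
climate_at <- function(lon, lat) {
  n <- length(lon)
  data.frame(
    temp   = 30 - 0.8 * (lat - 30) + rnorm(n, sd = 1),
    precip = 100 + 5 * (lon + 120) + rnorm(n, sd = 10)
  )
}

# Background points sampled at random across the study region.
n_bg <- 1000
bg <- data.frame(lon = runif(n_bg, -120, -110), lat = runif(n_bg, 30, 40))
bg_clim <- climate_at(bg$lon, bg$lat)

# Step 1: climate at the presence records. Here the toy "species" occurs
# wherever it is warm (temp > 24) and dry (precip < 120).
pres_clim <- bg_clim[bg_clim$temp > 24 & bg_clim$precip < 120, ]

# Step 2: a model separating presence climates (pa = 1) from background
# climates (pa = 0); quadratic terms let the response curves bend.
train <- rbind(
  data.frame(pa = 1, pres_clim),
  data.frame(pa = 0, bg_clim)
)
fit <- glm(pa ~ temp + I(temp^2) + precip + I(precip^2),
           family = binomial, data = train)

# Step 3: apply the model to climate at other places or times.
# With presence-background data this is a relative suitability index,
# not a calibrated probability of occurrence.
new_sites <- data.frame(temp = c(25, 23), precip = c(105, 145))
suit <- predict(fit, newdata = new_sites, type = "response")
print(round(suit, 3))

# The same background sites under a hypothetical uniform 2-degree warming.
future_clim <- transform(bg_clim, temp = temp + 2)
future_suit <- predict(fit, newdata = future_clim, type = "response")
print(summary(future_suit))
```

The first new site sits inside the toy species’ warm, dry window and should score higher than the second, which is far too wet. Because presence points are compared against background points rather than confirmed absences, the output is best read as relative suitability; the vignettes mentioned below cover more flexible methods and how to work with real climate rasters.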
This post provides a little demonstration of what you can do given a reasonably good set of collection sites and a no-longer-cutting-edge laptop. But caveat lector: actually building SDMs for publication-grade analysis requires a lot more work than what I’m presenting here. If you like the possibilities, you should start by reading the “vignette” documents “Species distribution modeling in R” [PDF] (by dismo developers Roger Hijmans and Jane Elith) and “Boosted regression trees for ecological modeling” [PDF] (by Elith and John Leathwick). Those provide lots of detail and further reading lists, including a relatively recent review by Hijmans and Elith. They’re also the starting point for the demonstrations below.
First, you’ll want some distribution data. For this demonstration, I’ll use coordinates from a 2009 SDM study of Joshua trees by Godsoe et al. (fullest possible disclosure: I’m part of et al.). These are more than 5,000 locations where Joshua trees have been spotted (which is relatively easy, since they’re the tallest things that grow in most of the Mojave Desert), compiled from public databases, then ground-truthed and supplemented by the coauthors. All of those points are plotted in the map at the beginning of the post; here’s the code that will set that up: