MoleculaR analyses with R:

R is a powerful data analysis environment with a large number of useful features. Chief among them are: (1) it is open source and freely distributed, meaning you can download and install it on any computer you have access to, (2) it runs on Windows, Mac, and Linux machines, (3) it is continually updated, monitored, and improved upon by a dedicated team of computer scientists, statisticians, and researchers, (4) the underlying algorithms in the written functions are quite adaptable, somewhat forgiving, and remarkably efficient, (5) the syntax of the language is very intuitive (newcomers may scoff at this, so you may have to take my word for it), and (6) there is a vast array of user-contributed packages that allow for many specialized forms of data analysis (two random examples: multivariate analyses [ade4], map drawing [maptools]).

One attribute of R that unfortunately makes people choose other platforms is that it is largely driven from a command-line interface. This means that there is very little pointing and clicking involved. It also means that there is some syntax to learn, which can intimidate those new to coding. I will admit that there certainly is a learning period involved, but the advantages are well worth the effort. With the command-line interface you can perform nearly any analysis that you can think up. For a molecular ecologist this is an invaluable skill, because genetic data often must be analyzed in novel or non-traditional ways. My best advice is simply to dive in! Immersing yourself in the language (i.e., forcing yourself to do all analyses in R) for a short period of time is the quickest way to learn. Why not be productive at the same time? Try to run an ANOVA or t-test on some data you have been waiting to analyze.
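
If you do not have data on hand yet, here is a minimal sketch of what a first t-test and one-way ANOVA might look like. It assumes R's built-in iris data set as a stand-in for your own data, and the object names (two_species, t_result, anova_fit) are only illustrative choices.

```r
# Assumption: the built-in 'iris' data set stands in for your own data.
data(iris)

# Two-sample t-test: compare sepal length between two of the three species.
# droplevels() removes the unused third species from the grouping factor.
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
t_result <- t.test(Sepal.Length ~ Species, data = two_species)
print(t_result)

# One-way ANOVA: compare sepal length across all three species.
anova_fit <- aov(Sepal.Length ~ Species, data = iris)
print(summary(anova_fit))
```

To try this on your own data, you would typically read it in first (for example with read.csv) and then swap your own response and grouping columns into the formula.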

In the next few posts I will provide some simple walkthroughs to (1) import data and calculate allele frequencies from genotype data, (2) calculate Weir and Cockerham’s unbiased Fst, (3) look at some multivariate packages for population genetics and genomics, and (4) explore ways to create publication-quality figures. Keep in mind that there is a very large community of R users, so a quick online search will usually turn up help. Please feel free to recommend other resources or tips in the comments. Below are some vital links for getting started with R:

The main R webpage:  http://www.r-project.org/

What R is all about:  http://www.r-project.org/about.html

The R card: very useful for learning syntax.

R introductory manuals.

R Commander: a point-and-click graphical user interface for R, which simultaneously displays the code in R – great for users intimidated by the command line.

Tinn-R: An essential text editor for Windows users**.

RSeek: A search field that only returns results relevant to R.

Quick-R: Useful walkthroughs for some basic functions/calculations.

The R Journal: useful tidbits for more advanced users.

**(Using a text editor that interfaces with R is a great idea. That way you can save, easily manipulate, and send code to R. If you don’t save your code, you will forget it. It is also a great way to document all of the analytical steps in a study, so that you can go back and easily repeat what you did years from now.)

About Mark Christie

Mark Christie is an assistant professor in the Department of Biological Sciences and Department of Forestry & Natural Resources at Purdue University.