Using GitHub with R and RStudio

github-logo

A few weeks back, The Molecular Ecologist released an article about GitHub and also created an organization where you can fork or simply download code shared by the Molecular Ecology community. A few of you out there may still be skeptical about the benefits of using GitHub. Or you may find it confusing and not worth the bother. You may be thinking to yourself (well, at least, I was guilty of this) that all of your code is backed up on Dropbox, Google Drive, and three external hardrives – so what could possibly go wrong? The short answer is: lots! The longer answer is that there really are some tremendous advantages associated with using Git and GitHub that may not be immediately apparent.

Git is a version control system and allows you to save copies of your code throughout the entire developmental process. Git isn’t the only version control system out there (e.g., SVN), but it is one of the more popular implementations. GitHub allows you to push your code from your local workspace to be hosted online. GitHub, which seamlessly integrates with Git, allows you to 1.) keep copies of all of your code through time, 2.) compare code from various points in time (very useful for debugging), 3.) collaborate with people on the same project in a non-chaos inducing fashion, and 4.) keep copies of your code both locally and online (note that you should still officially back up all of your work). Still not convinced? I suggest you google ‘why should I use version control?’

Below, I show how to use GitHub with Rstudio and also show that it is equally easy to use GitHub with any simple file of code. Thus, the take home message for the day is ‘GitHub is easy and you should use it.’

RStudio is an excellent integrated development environment built specifically for R. It also contains version control for Git and SVN. Below I outline the simple steps to get RStudio working with GitHub.

Set up a GitHub account here.

Download and install Rstudio.

Download and install the platform-specific version of Git (not GitHub), default options   work well.

Configure Git with global commands. I have found this step necessary both times I     ran through this process. Open up the bash version of Git and type the following:

git config --global user.name "your GitHub account name"
git config --global user.email "GitHubEmail@something.com"

Open Rstudio and set the path to Git executable. Go to Tools > Options > Git/SVN

It is important that you find your git.exe file (as shown above). This may be located in any number of places depending on your operating system, but the location of your GIT install is a good first place to look.

Restart RStudio and that is all there is to it! There are some simple guidelines at the RStudio website, which may be helpful. Now that you have successfully installed everything, lets run through a quick example. There are four terms associated with Git that you must learn: repository, commit, push, and pull. A repository equals the location and name for all the files associated with a particular project. The first step is to log into your GitHub account and create a new repository. Make sure you check the box ‘Initialize this repository with a README.’ When you are done, you should be able to view the Repository like below:

Screenshot 2013-11-12 09.36.42 - Copy

Notice the box highlighted in red. That box is really important – remember it as the ‘red box’. Now, open Rstudio and go to Project > Create Project > Version Control > Git and you should see a screen like below:

Screenshot 2013-11-12 09.37.08 - Copy

In the Repository URL box, you should copy and paste the URL indicated in the ‘red box’ above. This is how Rstudio knows what repository to use and associates it with your new project files. In this box you can also set the project directory.  Now do some work in your new R project and create and save some files. The next step is to ‘commit’ your work – essentially making a copy of all of your script files (i.e., .R files) associated with the R project. To do this go to Tools > Version Control > Commit.  This brings up the following window:

Untitled

Here you can see that I have saved two files, test1 and test2. Now I simply check the files that  I want to commit and press the ‘commit’ button, highlighted with the green box. If I want to also move these files onto the GitHub servers, I will click on the red box, marked ‘push’.  Look at your repository online to double check that your files actually made it there. That is pretty much all there is to it. You can also use the ‘git’ box in the top right-hand corner of Rstudio to make commits or use the various keyboard shortcuts. One feature that I think would be useful is for a commit to be made every time you save a file. I haven’t figured out how to do this, so please post a comment if you know how – or if you think that this would actually be a bad idea in practice.

Screenshot 2013-11-12 11.50.53

What if you decide that RStudio isn’t for you because you can’t live without Notepad++ or Sublime Text? No worries – GitHub is super easy to use on Mac or Windows (and, of course Linux, but you probably already knew that).  Simply download GitHub for Windows or GitHub for Mac. Follow the installation directions.  Create a few files and use the GUI to commit and push your files (see screenshot below) – it couldn’t be easier!

One advantage that I find to using RStudio is that everything is integrated, so it really takes no time at all to commit my R code and push it on to GitHub.  This extra convenience means that I make more frequent commits.  Remember that it is a good idea to commit and push often.  Well that’s about it.  Please feel free to contribute and pull from the Molecular Ecologist’s repositories – this resource will only get better as more people use it. Also, please add any tricks or tips to the comments below!

This entry was posted in bioinformatics, howto, R, software. Bookmark the permalink.