Using GitHub with R and RStudio

A few weeks back, the Molecular Ecologist released an article about GitHub and also created an organization where you can fork or simply download code shared by the Molecular Ecology community. Some of you may still be skeptical about the benefits of using GitHub, or you may find it confusing and not worth the bother. You may be thinking to yourself (well, at least, I was guilty of this) that all of your code is backed up on Dropbox, Google Drive, and three external hard drives – so what could possibly go wrong? The short answer is: lots! The longer answer is that there really are some tremendous advantages to using Git and GitHub that may not be immediately apparent.

Git is a version control system: it lets you save snapshots of your code throughout the entire development process. Git isn’t the only version control system out there (SVN is another), but it is one of the more popular ones. GitHub hosts Git repositories online, so you can push your code from your local workspace to its servers. Together, Git and GitHub allow you to 1) keep copies of all of your code through time, 2) compare code from various points in time (very useful for debugging), 3) collaborate with people on the same project in a non-chaos-inducing fashion, and 4) keep copies of your code both locally and online (note that you should still formally back up all of your work). Still not convinced? I suggest you google ‘why should I use version control?’
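To make the first two points concrete, here is a minimal command-line sketch of saving two snapshots of a script and then comparing them. It assumes Git is already installed and is run from a bash shell. The file name analysis.R, its contents, and the identity are placeholders made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Start tracking this folder with Git.
git init --quiet

# Placeholder identity, stored only in this demo repository.
git config user.name "Example User"
git config user.email "user@example.com"

# First snapshot of a (hypothetical) analysis script.
echo 'x <- c(1, 2, 3)' > analysis.R
git add analysis.R
git commit --quiet -m "Add first version of analysis.R"

# Change the script and record a second snapshot.
echo 'mean(x)' >> analysis.R
git commit --quiet -am "Report the mean of x"

# List every saved snapshot, newest first.
git log --oneline

# Show what changed between the previous snapshot and the current one.
git diff HEAD~1 HEAD -- analysis.R
```

The last command prints the added line prefixed with a plus sign, which is the kind of comparison that helps when tracking down a bug.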

Below, I show how to use GitHub with RStudio and also show that it is equally easy to use GitHub with any simple file of code. Thus, the take-home message for the day is ‘GitHub is easy and you should use it.’

RStudio is an excellent integrated development environment built specifically for R. It also has built-in support for version control with Git and SVN. Below I outline the simple steps to get RStudio working with GitHub.

  1. Set up a GitHub account at github.com.
  2. Download and install RStudio.
  3. Download and install the platform-specific version of Git (not GitHub); the default options work well.
  4. Configure Git with global commands. I have found this step necessary both times I ran through this process. Open up the bash version of Git and type the following:
         git config --global user.name "your GitHub account name"
         git config --global user.email "GitHubEmail@something.com"
  5. Open RStudio and set the path to the Git executable. Go to Tools > Options > Git/SVN.

Screenshot 2013-11-12 09.53.56 - Copy

It is important that you find your Git executable (git.exe on Windows, as shown above). Its location depends on your operating system, but the folder where Git was installed is a good first place to look.
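If you are not sure where Git ended up, or whether the global settings took effect, a few read-only commands can help. This is a small sketch to run in the bash version of Git (or any terminal on Mac or Linux). It changes nothing on your system. Note that Git Bash on Windows prints the path in Unix style, so there the `where git` command in the Windows Command Prompt may be more useful for finding git.exe.

```shell
# Print the full path of the Git executable found on your PATH;
# print a hint instead if Git is not found.
command -v git || echo "git not found on PATH"

# Confirm that Git runs and report its version.
git --version

# Show the identity configured with 'git config --global';
# print a note if a value has not been set yet.
git config --global --get user.name  || echo "user.name is not set"
git config --global --get user.email || echo "user.email is not set"
```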

Restart RStudio and that is all there is to it! There are some simple guidelines at the RStudio website, which may be helpful. Now that you have successfully installed everything, let’s run through a quick example. There are four terms associated with Git that you must learn: repository, commit, push, and pull. A repository is the location and name for all the files associated with a particular project. A commit is a saved snapshot of those files, a push sends your commits to GitHub, and a pull brings commits from GitHub down to your local copy. The first step is to log into your GitHub account and create a new repository. Make sure you check the box ‘Initialize this repository with a README.’ When you are done, you should be able to view the repository like below:

Screenshot 2013-11-12 09.36.42 - Copy

Notice the box highlighted in red. That box is really important – remember it as the ‘red box’. Now, open RStudio and go to Project > Create Project > Version Control > Git and you should see a screen like the one below:

Screenshot 2013-11-12 09.37.08 - Copy

In the Repository URL box, copy and paste the URL indicated in the ‘red box’ above. This is how RStudio knows which repository to use and associates it with your new project files. In this dialog you can also set the project directory. Now do some work in your new R project and create and save some files. The next step is to ‘commit’ your work – essentially making a copy of all of your script files (i.e., .R files) associated with the R project. To do this go to Tools > Version Control > Commit. This brings up the following window:

Untitled

Here you can see that I have saved two files, test1 and test2. Now I simply check the files that I want to commit and press the ‘commit’ button, highlighted with the green box. If I also want to move these files onto the GitHub servers, I click on the red box, marked ‘push’. Look at your repository online to double-check that your files actually made it there. That is pretty much all there is to it. You can also use the ‘git’ box in the top right-hand corner of RStudio to make commits, or use the various keyboard shortcuts. One feature that I think would be useful is for a commit to be made every time you save a file. I haven’t figured out how to do this, so please post a comment if you know how – or if you think that this would actually be a bad idea in practice.
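For the curious, the RStudio steps above correspond roughly to a handful of Git commands. The sketch below is an approximation rather than a record of what RStudio runs internally. So that it works offline, a local bare repository stands in for the GitHub repository; with a real project you would clone the URL from the ‘red box’ instead. The .R extensions, the file contents, and the identity are assumptions for illustration.

```shell
# A local bare repository stands in for the GitHub repository here.
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/remote.git"

# Roughly what creating the project from the Repository URL does:
# clone the repository into a new project directory.
git clone --quiet "$sandbox/remote.git" "$sandbox/myproject"
cd "$sandbox/myproject"

# Placeholder identity, stored only in this demo clone.
git config user.name "Example User"
git config user.email "user@example.com"

# Do some work: two script files named after those in the screenshot.
echo 'x <- 1:10' > test1.R
echo 'y <- x^2'  > test2.R

# Ticking the checkboxes: stage the files to include in the commit.
git add test1.R test2.R

# Pressing 'commit': record the snapshot locally with a message.
git commit --quiet -m "Add test1 and test2 scripts"

# Pressing 'push': send the current branch to the remote.
git push --quiet origin HEAD

# Double-check that the remote now holds the commit.
git ls-remote origin
```

Cloning the empty stand-in repository prints a harmless warning; a GitHub repository initialized with a README would not.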

What if you decide that RStudio isn’t for you because you can’t live without Notepad++ or Sublime Text? No worries – GitHub is super easy to use on Mac or Windows (and, of course, Linux, but you probably already knew that). Simply download GitHub for Windows or GitHub for Mac and follow the installation directions. Create a few files and use the GUI to commit and push your files (see screenshot below) – it couldn’t be easier!

Screenshot 2013-11-12 11.50.53

One advantage that I find to using RStudio is that everything is integrated, so it really takes no time at all to commit my R code and push it to GitHub. This extra convenience means that I make more frequent commits. Remember that it is a good idea to commit and push often. Well, that’s about it. Please feel free to contribute to and pull from the Molecular Ecologist’s repositories – this resource will only get better as more people use it. Also, please add any tricks or tips to the comments below!
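Pulling is the one term that has not been demonstrated yet, so here is a small offline sketch of it. Two local clones play the roles of two collaborators, and a local bare repository stands in for a shared GitHub repository. The directory names, file name, and identity are all hypothetical.

```shell
# A local bare repository stands in for the shared GitHub repository.
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/shared.git"

# Collaborator A clones, commits a script, and pushes it.
git clone --quiet "$sandbox/shared.git" "$sandbox/collab_a"
cd "$sandbox/collab_a"
git config user.name "Collaborator A"
git config user.email "a@example.com"
echo 'x <- 1:10' > shared_script.R
git add shared_script.R
git commit --quiet -m "Add shared_script.R"
git push --quiet origin HEAD

# Collaborator B clones the repository and receives A's first commit.
git clone --quiet "$sandbox/shared.git" "$sandbox/collab_b"

# A keeps working and pushes a second commit.
echo 'summary(x)' >> shared_script.R
git commit --quiet -am "Summarise x"
git push --quiet origin HEAD

# B pulls to bring the local copy up to date with the shared repository.
cd "$sandbox/collab_b"
git pull --quiet --ff-only

# B's history now contains both of A's commits.
git log --oneline
```

The `--ff-only` option makes the pull refuse to do anything other than a simple catch-up, which keeps this sketch predictable.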

About Mark Christie

Mark Christie is a post-doctoral fellow in the Department of Zoology at Oregon State University.
This entry was posted in bioinformatics, howto, R, software.
  • BoB

    There’s a Git plugin for SublimeText, too :) Having that integration is helping me remember to make frequent commits.

    I think auto-commits each time you save would be a bad idea: 1) you’d probably end up leaving minimal commit messages, which wouldn’t help you to track history, 2) I think ideally each commit should be a logical chunk of progress – I don’t know about you, but I save my file even when I’m halfway through figuring out how to solve my current problem or bug, so my history would get messy very quickly!

  • Mark Christie

    Thanks! The more I think about it, the more I agree that having an auto-commit each time I save would be a bad idea. One thing I definitely need to get better at is leaving sufficiently detailed, yet succinct, commit messages that make sense 6 months later. There kind of is an art to that step…

  • AliciaMastrettaYanes

    Check out this http://readwrite.com/2013/09/30/understanding-github-a-journey-for-beginners-part-1#awesm=~on5SNkoHs3E1wV to understand what exactly GitHub does. It is written for non-programmers and true Git beginners.

  • Jessica

    Thanks for this great post! A great place to begin for scientists new to this whole world. Some of the keyboard shortcuts are outlined at the bottom of this page http://www.rstudio.com/ide/docs/using/keyboard_shortcuts

  • ABlekh

    Hi, everyone! I’m a Ph.D. student (different field) and found this nice post. For my dissertation, I’ve set up RStudio on Amazon EC2 (free tier) and with it I use Git for revision control in my GitHub repository. However, recently I discovered that my commits are not being reflected on GitHub, because RStudio pushes local (AWS) changes to GitHub as an Ubuntu user ‘R User’ that I’ve created for that purpose. Upon investigation, I found that my git config’s user.name and user.email correspond to the main name and e-mail address I use on GitHub. However, in the commit log the e-mail address is , where ‘ip-aa-bb-cc-dd’ is my AWS instance hostname. I would appreciate any advice on this. Best regards, Alex.

    UPDATE: I found my AWS instance’s external hostname and used it to add secondary e-mail to GitHub, but cannot access it from within the instance: mail says 0 messages (Postfix is running). My original question on discrepancy between e-mail addresses in git config still remains. Thank you!