How to link and use Git with RStudio

If you work as a Data Scientist, you need to know what Git is and how to work with it. If you don’t know what Git is or how to link it with RStudio to use it in your projects, don’t worry, just keep reading.

What is Git and why you should use it

Git is an open-source version control system. Basically, it lets you store your code and keep track of every change you make, detecting possible conflicts and allowing you to go back to a previous version of your code whenever you want. Git is free software, and several hosting services are built on it, such as GitHub and GitLab.

You now have a grasp of what Git is, but why should you use it? You have surely run into some of these problems: many versions of the same code, difficulty working as a team on the same files, inability to recover code that you later modified, or trouble when changing computers. If so, the problem probably took you a long time to solve. That is time we usually cannot spare, and it could have been avoided by using Git.

On the other hand, Git has more uses than helping you work more efficiently. In fact, when we put a Machine Learning algorithm into production, we usually use continuous deployment. This is usually done with the CI/CD tools offered by Git-based services such as GitHub and GitLab.

In short, if you work as a Data Scientist, you need to use Git to make your job much easier. Luckily, if you use RStudio, it can be integrated with Git to make things even easier. So, let’s see how to integrate Git with RStudio!

How to connect Git with RStudio

There are different ways to connect Git services with RStudio, but the most reliable and practical, in my opinion, is to use an SSH key.

This system consists of creating a key pair on your computer and adding the public key to our GitHub / GitLab account. In this way, we can connect to our Git service without having to type a username and password.

So, the steps that we will have to follow to connect Git with RStudio, either on Mac or Windows, are:

  • Create an SSH key on Mac or Windows.
  • Add our SSH key to GitHub or GitLab.
  • Create a repo on GitHub or GitLab.
  • Connect the repo to our RStudio.

As you can see, it is quite simple, so let’s go step by step.

1. How to create an SSH key

How to create an SSH key in Windows

To create an SSH key in Windows, we must first check that the OpenSSH Client is installed. To do this, go to Settings -> Apps and click on “Apps & features”, as shown in the image.

How to create SSH on Windows - Configure Open SSH Client

There, we should look for OpenSSH Client. If it appears, perfect. If it doesn’t, you will simply have to add it by clicking the “Add a feature” button.
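The same check can be done from a terminal. This is a minimal sketch, assuming a POSIX-style shell (for example Git Bash on Windows or the macOS Terminal); the helper name have_tool is hypothetical and only used for illustration.

```shell
# Report whether a command-line tool is available on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# ssh and ssh-keygen are the two OpenSSH programs used in this post.
for tool in ssh ssh-keygen; do
  if have_tool "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found, install the OpenSSH client first"
  fi
done
```

If both tools are reported as found, you can move straight on to creating the key.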

Once you have the OpenSSH Client installed, open the terminal and enter the command ssh-keygen. It will ask where to save the key; pressing Enter accepts the default location.

After that, it will ask you for a passphrase. Once again, if you press Enter, the key will have no passphrase. However, I do not recommend doing this. After that, something like this will appear on the screen:

How to create SSH Key in Windows

This means that our SSH key has just been created. Now, to copy it, we simply have to go to the path where we created it and open the public key file (id_rsa.pub) with a text editor such as Sublime Text. In order to do this, we first have to enable the display of hidden files. You can learn how to do it here.

Another option is to copy the key straight to the clipboard by entering the following command in the terminal (written this way, it works in Git Bash):

clip < ~/.ssh/id_rsa.pub

With this, we will have our SSH key created and copied, so we can now add it to our Git service to link it with RStudio.

How to create an SSH key on Mac

To create an SSH key on Mac, we simply have to open the terminal and type the following command:

 ssh-keygen -t rsa 

By doing this, the computer will start creating the SSH key. It will ask us for the location where we want to save the key; to save it in the default location, simply press Enter. As in Windows, we will then be asked to create a passphrase. Once again, if you don’t want a passphrase you just have to press Enter, although this is not recommended. After doing so, the key will be generated, as shown in the following image:

How to create SSH key on Mac
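The same key can also be generated without answering the prompts, which works the same way on Mac and in Git Bash on Windows. The following is a minimal sketch, assuming OpenSSH's ssh-keygen is on the PATH; the temporary directory, the file name id_rsa_demo, the e-mail address and the passphrase are placeholders chosen for illustration, not values you should reuse.

```shell
# Work in a throwaway directory so the real ~/.ssh folder is never touched.
KEY_DIR="$(mktemp -d)"
KEY_FILE="$KEY_DIR/id_rsa_demo"

# -q             : suppress the progress output
# -t rsa -b 4096 : RSA key with a 4096-bit modulus
# -C             : comment stored in the public key (placeholder address)
# -f             : where to write the private key; ".pub" is added for the public one
# -N             : passphrase (placeholder; choose your own and keep it secret)
ssh-keygen -q -t rsa -b 4096 -C "you@example.com" -f "$KEY_FILE" -N "demo-passphrase"

# Show the fingerprint of the public key, then the public key itself.
ssh-keygen -l -f "$KEY_FILE.pub"
cat "$KEY_FILE.pub"
```

Only the contents of the .pub file are meant to be shared; the file without the extension is the private key and should stay on your computer.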

Finally, to copy the public key to the clipboard, you must execute the following command:

pbcopy < ~/.ssh/id_rsa.pub

Done! You have created your SSH key and copied it to the clipboard. Now let’s see how we can link RStudio with GitHub, for which we will need to add our SSH key to our Git service. Let’s see how it’s done.

2. Add the SSH key to GitHub or GitLab

Now that we have created our SSH key, we must add it to GitHub or GitLab so that authentication happens without us having to do anything. For both GitHub and GitLab, the process is similar, although I will explain each separately.

How to include the SSH key in GitHub

To include the SSH key in GitHub, we must go to the “SSH and GPG keys” section within the account settings. There we will find a button to add an SSH key, where we enter a title and the key itself, as shown below:

How to add SSH key on Github

In the case of GitHub, unlike GitLab, you cannot set an expiration date for the key. However, if an SSH key has not been used for one year, GitHub automatically removes it. In any case, the GitHub documentation recommends reviewing your list of SSH keys regularly.

How to include the SSH key in GitLab

In the case of GitLab, we must go to the SSH Keys section within Preferences (in the drop-down of your profile). There we will have a window like the following:

How to add SSH key on Gitlab

So, we just have to include the SSH Key and give it a title to be able to identify it. Additionally, you can also include an expiration date for the SSH key. Although the latter is optional, it is recommended to do so.
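Once the key is registered, you may want to confirm that the service accepts it before moving on. This is a minimal sketch, assuming the OpenSSH client is installed; the function name check_git_ssh is hypothetical, and the call itself needs network access, so it is only shown as a comment.

```shell
# Try to authenticate against a Git host over SSH.
# $1 is the host name, e.g. github.com or the domain of your GitLab instance.
check_git_ssh() {
  # -T: do not request a terminal, since the server only prints a greeting
  ssh -T "git@$1"
}

# Usage (requires network access and a key already added to your account):
#   check_git_ssh github.com
#   check_git_ssh gitlab.com
```

Judge the result by the message that is printed rather than by the exit code: GitHub, for instance, greets you by username and then closes the connection with a non-zero exit status, because it does not provide shell access.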

3. Create a repository on GitHub / GitLab

Now that we have added the SSH key, we have to create the repository where we will upload the files of our project. Creating a repository on GitHub is very simple: we just have to go to the “repositories” section and click on add. In GitLab it is also very simple, although in this case it is not called a repository, but a project.

In both cases we will get a screen similar to this:

How to create a Github Repo

Also, both tools give the option to create a README.md file. Personally, I always recommend creating the repositories including this file, so that we define the goal of our project from its very beginning.

In any case, now that we have our repository created, let’s continue to see how to use Git in RStudio.

4. Connect our Git repository with RStudio

Once we have created our repository on GitHub or GitLab, connecting it with RStudio is very simple. To do this, simply create a new project and choose “Version Control”, as shown in the image:

How to create an RStudio project linked to Git

When we do this, a new window will open where we will have to indicate the URL of the repository to which we want to connect. This URL must be the URL for SSH connection, which is different from the normal repository URL.

In the case of GitHub, the SSH URL of the repository is git@github.com:account_name/repository_name.git, while in the case of GitLab the SSH URL will be git@gitlab.domain.com:group/repository.git.
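If a project was already set up with the normal (HTTPS) URL, the remote can be switched to the SSH form from the terminal. This is a minimal sketch, assuming Git is installed; it works in a temporary scratch repository, and account_name/repository_name are the same placeholders used above, not a real repository.

```shell
# Scratch repository so the example does not touch a real project.
REPO_DIR="$(mktemp -d)"
git init -q "$REPO_DIR"

# Suppose the remote was first added with the HTTPS URL (placeholder names).
# "git -C <dir>" runs the command as if it had been started inside <dir>.
git -C "$REPO_DIR" remote add origin "https://github.com/account_name/repository_name.git"

# Switch it to the SSH form so that the SSH key is used for authentication.
git -C "$REPO_DIR" remote set-url origin "git@github.com:account_name/repository_name.git"

# Print the URL that fetch and push will use from now on.
git -C "$REPO_DIR" remote get-url origin
```

In a real project you would run the set-url command inside the project folder, for example from the Terminal tab in RStudio.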

How to connect your RStudio project to Git repo

By clicking on “Create Project”, RStudio will clone the contents of your GitHub or GitLab repository into the new project. In addition, you can commit, push and pull from within RStudio itself. Let’s see how it works.

How to use Git in RStudio

Basically, when we work with Git in RStudio, we work on our computer, but we can upload or download Git data at any time or even restore previous versions. The main actions that we can perform after linking Git with RStudio are:

  • Commit: save the changes in the local repository.
  • Push: upload the commits from the local repository to the remote repository. To make sure it includes the latest changes, a push is usually preceded by a commit.
  • Pull: download the latest changes from the remote repository into your local working copy.
  • Diff: check the differences between the staging index and the working directory.
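The four actions above can be tried from the terminal without touching a real account. This is a minimal sketch, assuming Git is installed: a local bare repository stands in for GitHub/GitLab, and the folder names, file name, identities and commit messages are all placeholders made up for illustration.

```shell
# Scratch area: a bare repository stands in for the GitHub/GitLab remote.
WORK="$(mktemp -d)"
git init -q --bare "$WORK/remote.git"

# "mine" plays the role of the RStudio project created from version control.
# (Git may warn that the cloned repository is empty; that is expected here.)
git clone -q "$WORK/remote.git" "$WORK/mine"
git -C "$WORK/mine" config user.name "Demo User"           # placeholder identity
git -C "$WORK/mine" config user.email "demo@example.com"
BRANCH="$(git -C "$WORK/mine" symbolic-ref --short HEAD)"   # default branch name

# Commit: stage a new file and record it in the local repository.
echo 'x <- 1' > "$WORK/mine/analysis.R"
git -C "$WORK/mine" add analysis.R
git -C "$WORK/mine" commit -q -m "Add analysis script"

# Diff: edit the file and compare the working directory with the staging index.
echo 'y <- 2' >> "$WORK/mine/analysis.R"
git -C "$WORK/mine" --no-pager diff
git -C "$WORK/mine" commit -q -a -m "Add second variable"

# Push: upload the local commits to the remote repository.
git -C "$WORK/mine" push -q -u origin "$BRANCH"

# A teammate clones the remote, commits a change and pushes it.
git clone -q "$WORK/remote.git" "$WORK/teammate"
git -C "$WORK/teammate" config user.name "Teammate"
git -C "$WORK/teammate" config user.email "teammate@example.com"
echo 'z <- 3' >> "$WORK/teammate/analysis.R"
git -C "$WORK/teammate" commit -q -a -m "Add third variable"
git -C "$WORK/teammate" push -q

# Pull: download the teammate's commit into our copy.
git -C "$WORK/mine" pull -q --ff-only
git -C "$WORK/mine" --no-pager log --oneline
```

With a real project the only difference is the remote: the clone would use the SSH URL of your repository instead of a local folder.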

All this can be seen in summary in the following image:

All these operations can be carried out both from the terminal and from the RStudio interface itself. Besides, the interface will now show a new button in the top bar. With this button you will be able to perform all the Git actions mentioned above from RStudio itself:

Git commands RStudio

Conclusions

Without a doubt, starting to use Git will be very useful and will save you a lot of headaches with version problems, teamwork, computer changes, etc. Also, being able to integrate Git into RStudio allows for a much easier workflow and makes using Git a breeze.

As always, I hope you liked the post. If so, subscribe to be aware of the rest of the posts. In any case, see you next time!