R and Python commingled: RPyStats in Windows and Linux. Season 1, Episode 6

Alfonso R. Reyes
(18 July 2019)

Online


Alright! We now have two R Markdown notebooks, written in Python and R, running PyTorch libraries. There is still a lot we can do to improve the accuracy of the model that recognizes handwritten digits. But before we continue improving the algorithm and the model, I want to pause briefly and show you something that really made me jump ship from Windows to Linux. It is related to R, Python and data science.

I have done most of my data science and machine learning work on Windows, using Linux virtual machines to test R and Python packages and verify that they work on most operating systems, including the Unix underneath macOS. In this episode, though, I will focus on Linux.

RSuite is multi-platform software: it runs on Windows, Unix, Linux and macOS. To me RSuite is a blessing, because it not only makes it easier to run Python from R but also lets me add an extra layer on top of projects and packages: a layer for organizing, running, cleaning up, sending and receiving scripts and packages. Call it, if you like, a supervising or orchestrating master project.

Virtual machines are fine, but when the job demands CPUs, GPUs and gigabytes of memory, the VM starts to lag. So I thought, “let’s try this amazing rsuite paradigm on Linux and see how it goes.” I had an HP ZBook G2 laptop that I could not sell, which I had set up as a Windows office server (I had the license anyway). So I installed Ubuntu 18.04 on it, and off it went. Running Linux is tremendously satisfying; it feels like sailing with the wind at your back. But that’s just me, and I don’t run complex Windows-specific software anyway.

I tested rsuite on Linux by cloning an RPyStats project I had developed on Windows. It worked flawlessly!

Data science should work anywhere, regardless of the operating system.

Here is what I will share with you:

  • Cloning an RPyStats project developed on Windows from GitHub into a Linux virtual or physical machine
  • Installing the project's R dependencies on Linux
  • Installing the Python dependencies
  • Running the notebooks

We will do all this on an Ubuntu 18.04 machine.
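
Before starting, it helps to confirm that the tools these steps rely on are installed. Below is a small, hypothetical pre-flight check of my own; the tool list (git, R, Rscript, rsuite, rstudio, python3) is an assumption based on the steps in this episode, so adjust it to your setup.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check before following this walkthrough.
# The tool list below is an assumption based on the steps in this episode.

# Return 0 if a command with this name is on PATH, non-zero otherwise.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Report each tool as found or missing; the return status is the number missing.
check_prereqs() {
  local missing=0 tool
  for tool in "$@"; do
    if have "$tool"; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

check_prereqs git R Rscript rsuite rstudio python3 || echo "Install the missing tools before continuing."
```

Any tool reported as missing should be installed before cloning the project.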

Clone the repository *rpystats-apollo11*:

git clone https://github.com/f0nzie/rpystats-apollo11.git

Change (cd) into the *rpystats-apollo11* folder and install the R dependencies:

cd rpystats-apollo11

rsuite proj depsinst

Install the Python dependencies:

rsuite sysreqs install
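
Since the setup is just a few terminal commands, they can also be chained into one script. This is a sketch, assuming git and the rsuite CLI are on your PATH; the DRY_RUN flag and the *setup_project* helper are hypothetical conveniences of mine, not part of rsuite. By default it only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: the terminal steps of this episode chained into one script.
# Assumes git and the rsuite CLI are installed and on PATH.
# DRY_RUN is a hypothetical convenience flag: it defaults to 1 (print the
# commands only); run with DRY_RUN=0 to actually execute them.
set -euo pipefail

REPO_URL="https://github.com/f0nzie/rpystats-apollo11.git"
PROJECT_DIR="rpystats-apollo11"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it in the current shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_project() {
  run git clone "$REPO_URL"     # 1. clone the repository
  run cd "$PROJECT_DIR"         # 2. enter the project folder
  run rsuite proj depsinst      # 3. install the R dependencies
  run rsuite sysreqs install    # 4. the step this episode uses for the Python dependencies
}

setup_project
```

Saved as, say, *setup.sh* (a hypothetical name), *DRY_RUN=0 bash setup.sh* would run the real clone and installs. The *cd* only changes the script's own working directory, so you still need to *cd rpystats-apollo11* in your terminal afterwards.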

Go to the master folder and open the RStudio project.

In RStudio, navigate to the folder ./work/notebooks and open the R Markdown notebook *mnist_dgits_rstats.Rmd*.

Knit the notebook to HTML.

The notebook will start building (or knitting). You can follow its progress through the percentages shown in the “R Markdown” pane.

It takes about 2 to 3 minutes, because the notebook trains the model.

Then you get your HTML file in the RStudio browser.

Done!

You just built an RPyStats project using rsuite and RStudio on Linux!
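
If you prefer the terminal to RStudio's Knit button, the notebook can probably also be knitted with *rmarkdown::render()* through *Rscript*. Here is a minimal sketch that I have not tested against this repository. It assumes R and the rmarkdown package are installed, that R can see the packages rsuite installed for the project, and that you run it from the project root; the *knit_notebook* helper is my own, hypothetical addition.

```bash
#!/usr/bin/env bash
# Sketch: knit the notebook from a terminal instead of RStudio's Knit button.
# Assumptions: R and the rmarkdown package are installed, the packages that
# rsuite installed for the project are visible to R, and the script is run
# from the project root (the rpystats-apollo11 folder).
NOTEBOOK="work/notebooks/mnist_dgits_rstats.Rmd"

# Hypothetical helper: knit the given .Rmd (or the default notebook) to HTML.
knit_notebook() {
  local rmd="${1:-$NOTEBOOK}"
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 1
  fi
  if [ ! -f "$rmd" ]; then
    echo "Notebook not found: $rmd" >&2
    return 1
  fi
  # rmarkdown::render() knits the .Rmd; by default the HTML is written next to it.
  Rscript -e "rmarkdown::render('$rmd', output_format = 'html_document')"
}

# Run only when the file is executed directly, not when it is sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  knit_notebook "$@"
fi
```

Expect the same 2 to 3 minutes of model training as in RStudio before the HTML file appears next to the notebook.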

Example repository

https://github.com/f0nzie/rpystats-apollo11