R and super-powerful PCs

R and Python are very popular lately, but have you asked yourself whether you are getting the most out of them? You probably have a super‑powerful PC to run them on, yet if you are just entering the world of Analytics, you may not be using it optimally. Today I will show an example that compares the performance of different linear algebra libraries in R.

Linear algebra libraries are not R packages you install; they are the BLAS and LAPACK shared libraries that R itself is linked against. A multithreaded one lets matrix operations use all the processors on your PC; with the default one, they run on a single processor.
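
If you are not sure which library your own R installation is linked against, a quick check like the following may help. It is only a sketch: it assumes R 3.4 or later, where `sessionInfo()` reports the BLAS and LAPACK paths, and on some builds those paths can come back empty.

```r
# Report the linear algebra libraries this R session is linked against.
info <- sessionInfo()

blas_path   <- info$BLAS        # path to the BLAS shared library (R >= 3.4)
lapack_path <- info$LAPACK      # path to the LAPACK shared library (R >= 3.4)
la_ver      <- La_version()     # version string of the LAPACK in use
n_cores     <- parallel::detectCores()  # logical cores; may be NA on some platforms

cat("BLAS:   ", blas_path, "\n")
cat("LAPACK: ", lapack_path, "\n")
cat("LAPACK version:", la_ver, "\n")
cat("Cores detected:", n_cores, "\n")
```

A path ending in something like `libRblas.so` usually points to the default single‑threaded library, while a path mentioning openblas or mkl indicates an optimized one.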

In this comparison, we will look at the following libraries for R:

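  • blas: the default reference implementation, which is single‑threaded.
  • openblas: an open‑source, multithreaded BLAS.
  • openblas-lapack: OpenBLAS built to provide the LAPACK interface as well, so other software, e.g., Python, can use it too.
  • intel-mkl: Intel’s Math Kernel Library, a BLAS/LAPACK implementation optimized for Intel processors.
  • open-r: the BLAS/LAPACK bundled with Microsoft R Open, which is itself based on Intel MKL.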

The experiment:

We will run 3 tests on matrices of different sizes. Each test is performed 100 times for each of 10 sizes of square matrices, ranging from 100 to 1000 columns. The median run time for each size is then plotted.
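
The protocol above could be sketched in base R roughly as follows. This is not the original benchmark script: the helper name `median_time`, the use of random normal matrices, and the step of 100 between sizes are all assumptions made for illustration.

```r
# Hypothetical helper: for each matrix size, run `op` `reps` times on a
# random square matrix and return the median elapsed time in seconds.
median_time <- function(op, sizes = seq(100, 1000, by = 100), reps = 100) {
  sapply(sizes, function(n) {
    m <- matrix(rnorm(n * n), nrow = n)  # random n x n matrix (an assumption)
    times <- replicate(reps, system.time(op(m))[["elapsed"]])
    median(times)
  })
}

# Small demonstration run; the full experiment would use the defaults above.
demo_sizes <- c(100, 200)
demo_times <- median_time(solve, sizes = demo_sizes, reps = 3)
print(data.frame(size = demo_sizes, median_seconds = demo_times))
```

Running the same script under each library, with nothing else changed, gives one curve per library for the plots.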

The tests were run on an AWS c5.4xlarge instance, which has 16 vCPUs on latest‑generation processors.

Test 1: Inverting a matrix:

Clearly the default BLAS/LAPACK lags behind; this is expected because it uses only one of the 16 CPUs. A second graph without BLAS makes the remaining libraries easier to compare.
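
For reference, the operation being timed in this test is presumably R's `solve()`, which hands the work to LAPACK. A minimal sketch, assuming a random matrix and a size picked only for illustration:

```r
set.seed(1)                          # reproducible random matrix
n <- 200                             # illustrative size, not from the experiment
m <- matrix(rnorm(n * n), nrow = n)

m_inv <- solve(m)                    # matrix inverse, computed by LAPACK

# Sanity check: m %*% m_inv should be close to the identity matrix.
max_err <- max(abs(m %*% m_inv - diag(n)))
cat("Largest deviation from the identity:", max_err, "\n")
```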


Test 2: Squaring a matrix:

Again, BLAS is far behind on run time, so we also include the graph without it.
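
"Squaring" is taken here to mean the matrix product of a matrix with itself, which is the operation a BLAS library accelerates; that reading is an assumption, as is the matrix size. The sketch also shows the element‑wise square for contrast, since it is easy to confuse the two in R:

```r
set.seed(1)
n <- 200                             # illustrative size, not from the experiment
m <- matrix(rnorm(n * n), nrow = n)

m_sq <- m %*% m                      # matrix product: handled by BLAS
m_elementwise <- m^2                 # element-wise square: does not go through BLAS

# The first entry of the product is the dot product of row 1 and column 1.
first_entry <- sum(m[1, ] * m[, 1])
cat("m_sq[1, 1]:", m_sq[1, 1], " dot product:", first_entry, "\n")
```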

Test 3: Principal Components:

This was a heavier test than the previous ones, so the choice of library really matters for large matrices.

And as usual, the graph without our friend BLAS.
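
The post does not say which function computed the principal components. A minimal sketch, assuming `prcomp()` (which relies on a singular value decomposition from LAPACK) and a random matrix:

```r
set.seed(1)
n <- 200                             # illustrative size, not from the experiment
m <- matrix(rnorm(n * n), nrow = n)

# Principal components via SVD; centering only, no scaling (an assumption).
pca <- prcomp(m, center = TRUE, scale. = FALSE)

# Share of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cat("Variance explained by the first 3 components:",
    round(var_explained[1:3], 4), "\n")
```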

Conclusions:

The difference between any optimized BLAS/LAPACK and the default one that comes with R is enormous. If you really want the best performance, the general recommendation is to go with Intel MKL or Microsoft R Open.

An interesting thing about Intel MKL is that it can also be used from Python: NumPy and SciPy can be built against it, which in turn speeds up libraries like Pandas, scikit‑learn, etc.

In the future I hope to do other comparisons to show how to get the most out of your PC, but with other methodologies.

Cheers!

