Multi-Core Performance in R

Introduction

A few days ago, while walking around, I saw someone selling a used HP ProLiant DL360 G6. For those who don't know, it's a high-performance server from 2010.

Given my interest in Data Science and Big Data, this toy will be useful for a diploma course I want to take, taught by the FCFM of the University of Chile.

Toy Specifications:

2 x Xeon E5530 2.4 GHz, 4 cores each with hyperthreading (disabled in this test), 8 cores in total
64 GB RAM
Nvidia Quadro M2000 4GB VRAM GDDR5
Smart Array RAID card with battery
2 independent power supplies
4 SAS hard drives

The Test

I will use the plyr, data.table, and reshape2 libraries for aggregations, and doSNOW for parallelization. The dataset will be a table of about 30 million records similar to a mobile operator's database (as far as I remember, these records are called CDRs, call detail records, and are stored in the CRCE of the rating platform).

Basically, the table will have a user identifier, a traffic type identifier, and the traffic amount.

Generate the Data

We will generate 30 million CDR records in a table that will have:

User identifier
Traffic type
Traffic amount

options(stringsAsFactors = FALSE)
library(reshape2)
library(plyr)

# Parameters
set.seed(1986) # seed set to the birth date of my dear girlfriend
tipo_trafico = c("DATA","SMS","VOICE")
prob_tipo_trafico = c(0.6,0.1,0.3)
userids = 5e4 # simulate n users
registros = 3e7 # number of CDR records

# Random traffic amount values
datos_tipo_trafico =
  data.frame(
    DATA = round(runif(registros,1,1024^2)),  # data session in bytes, between 1 byte and 1 MB
    SMS = rep(1,registros),                    # each SMS generates one record
    VOICE = round(rexp(registros,1/120))       # call length in seconds
  )

# Generate CDR records
cdr = data.frame(
  id_cdr = 1:registros,
  userid = floor(runif(registros,1,userids+1)),
  tipo_trafico = sample(tipo_trafico,registros,replace = TRUE,prob = prob_tipo_trafico)
)

TI = proc.time()
cdr$trafico = datos_tipo_trafico$DATA*(cdr$tipo_trafico=="DATA") + 
              datos_tipo_trafico$SMS*(cdr$tipo_trafico=="SMS") + 
              datos_tipo_trafico$VOICE*(cdr$tipo_trafico=="VOICE")
print(proc.time()-TI)
rm(datos_tipo_trafico) # delete random data table
gc() # clean unused memory

The result is a table with this format:

	id_cdr	userid	tipo_trafico	trafico
1	1	5579	VOICE	20
2	2	28374	VOICE	82
3	3	28526	VOICE	36
4	4	39179	VOICE	56
5	5	14244	DATA	629075
6	6	36779	DATA	690397
7	7	42774	DATA	175632
8	8	4276	VOICE	115
9	9	4445	VOICE	44
10	10	29458	DATA	946171
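The trafico column is built by multiplying each candidate value by a logical indicator, so only the value matching the record's traffic type survives. A minimal sketch of that trick on four made-up rows (the names tipo and candidatos and all values are illustrative, not part of the original data):

```r
# Toy version of the indicator-multiplication trick (values are made up)
tipo <- c("DATA", "SMS", "VOICE", "DATA")
candidatos <- data.frame(
  DATA  = c(1000, 2000, 3000, 4000),
  SMS   = c(1, 1, 1, 1),
  VOICE = c(60, 120, 180, 240)
)

# A comparison yields TRUE/FALSE, which R treats as 1/0 in arithmetic,
# so exactly one of the three terms is non-zero in each row.
trafico <- candidatos$DATA  * (tipo == "DATA") +
           candidatos$SMS   * (tipo == "SMS") +
           candidatos$VOICE * (tipo == "VOICE")
print(trafico)
```

Each row keeps the value from the column named by its traffic type and drops the other two.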

Benchmark

We will create a table of aggregated traffic per user. This table will be generated with: reshape2, data.table, and plyr (single‑thread and multi‑thread).

Some comments before showing the results:

reshape2 allows aggregation but offers far fewer options than the other libraries.
data.table is an extension of data.frame that supports keys (indices) on its columns.
plyr can run either single‑thread or in parallel.
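To make the target of the benchmark concrete, here is a sketch of the same per-user aggregation in base R on a tiny made-up table. It is not one of the benchmarked methods; xtabs is used only as a reference for what the output should look like, and cdr_small is a hypothetical stand-in for cdr.

```r
# Tiny stand-in for the cdr table (values are made up)
cdr_small <- data.frame(
  userid       = c(1, 1, 2, 2, 2, 3),
  tipo_trafico = c("DATA", "SMS", "VOICE", "VOICE", "DATA", "SMS"),
  trafico      = c(500, 1, 60, 30, 200, 1)
)

# Sum trafico for every userid / tipo_trafico combination;
# combinations that never occur are filled with 0.
agg <- xtabs(trafico ~ userid + tipo_trafico, data = cdr_small)

# Convert the contingency table into one row per user, one column per type
agg_df <- as.data.frame.matrix(agg)
print(agg_df)
```

The result has one row per user and the columns DATA, SMS, and VOICE, which is the shape each benchmarked method is expected to produce.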

The code used:

# using reshape2
TI = proc.time()
tmp = reshape2::dcast(cdr, userid ~ tipo_trafico, fun.aggregate = sum, value.var = "trafico")
print(proc.time()-TI)

# using plyr 1 thread
TI = proc.time()
# .progress takes the name of a progress bar, e.g. "text"
tmp = ddply(cdr, "userid", function(x) data.frame(DATA = sum(ifelse(x$tipo_trafico == "DATA", x$trafico, 0)),
                                                  SMS = sum(ifelse(x$tipo_trafico == "SMS", x$trafico, 0)),
                                                  VOICE = sum(ifelse(x$tipo_trafico == "VOICE", x$trafico, 0))
                                                  ), .progress = "text")
print(proc.time()-TI)

# using plyr with 8 worker processes (one per core)
library("doSNOW")
# NUMBER_OF_PROCESSORS is a Windows environment variable;
# parallel::detectCores() is the portable alternative
nCPU = as.numeric(Sys.getenv("NUMBER_OF_PROCESSORS")[1])
cl = makeSOCKcluster(nCPU, outfile="cl.txt") # worker output is written to cl.txt
registerDoSNOW(cl)
TI = proc.time()
tmp = ddply(cdr, "userid", function(x) data.frame(DATA = sum(ifelse(x$tipo_trafico == "DATA", x$trafico, 0)),
                                                  SMS = sum(ifelse(x$tipo_trafico == "SMS", x$trafico, 0)),
                                                  VOICE = sum(ifelse(x$tipo_trafico == "VOICE", x$trafico, 0))
                                                  ), .parallel = TRUE)
print(proc.time()-TI)
stopCluster(cl)

# using plyr with 2 worker processes (doSNOW is already loaded above)
cl = makeSOCKcluster(2, outfile="cl.txt")
registerDoSNOW(cl)
TI = proc.time()
tmp = ddply(cdr, "userid", function(x) data.frame(DATA = sum(ifelse(x$tipo_trafico == "DATA", x$trafico, 0)),
                                                  SMS = sum(ifelse(x$tipo_trafico == "SMS", x$trafico, 0)),
                                                  VOICE = sum(ifelse(x$tipo_trafico == "VOICE", x$trafico, 0))
                                                  ), .parallel = TRUE)
print(proc.time()-TI)
stopCluster(cl)

# using data.table
library(data.table)
cdr_dt = data.table(cdr, key = "userid")
TI = proc.time()
tmp = cdr_dt[, list(DATA = sum(ifelse(tipo_trafico == "DATA", trafico, 0)),
                    SMS = sum(ifelse(tipo_trafico == "SMS", trafico, 0)),
                    VOICE = sum(ifelse(tipo_trafico == "VOICE", trafico, 0))
                    ), by = "userid"]
print(proc.time()-TI)

Results

library	function	threads	seconds
plyr	ddply	1	9.18
plyr	ddply	2	78.37
plyr	ddply	8	146.04
reshape2	dcast	1	17.31
data.table		1	12.94
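As far as the code shows, each figure comes from a single run timed with proc.time(). A sketch of how the measurement could be repeated to see how much it varies between runs, using system.time from base R (time_runs is a hypothetical helper, and the workload here is a trivial placeholder, not the benchmark itself):

```r
# Hypothetical helper: call f() several times, return elapsed seconds per run
time_runs <- function(f, reps = 3) {
  vapply(seq_len(reps),
         function(i) system.time(f())[["elapsed"]],
         numeric(1))
}

# Placeholder workload standing in for one of the aggregations
x <- runif(1e5)
elapsed <- time_runs(function() sum(sqrt(x)), reps = 3)

# The median is less sensitive to a single slow run than the mean
print(median(elapsed))
```

Wrapping each aggregation in a function and passing it to such a helper would give a spread of timings instead of a single number.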

Conclusions

reshape2, despite being a library specialized in this type of transformation, turned out to be slower than plyr.

plyr lost performance as more worker processes were added. For an operation this simple, the overhead of parallelization (sending the data to each worker and collecting the results) outweighs the time saved by using the other CPUs.
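The overhead can be observed in isolation with the parallel package that ships with R. The sketch below is not the doSNOW setup from the test; it only compares a serial and a two-worker run of a deliberately trivial function (doble is a hypothetical name). The timings it prints will depend on the machine and are not results from the original benchmark.

```r
library(parallel)  # ships with base R

x <- 1:2000
doble <- function(v) v * 2   # deliberately trivial work per element

# Serial run
t_serial <- system.time(serial <- lapply(x, doble))[["elapsed"]]

# Parallel run on two local worker processes
cl <- makeCluster(2)
t_par <- system.time(paralelo <- parLapply(cl, x, doble))[["elapsed"]]
stopCluster(cl)

# Both approaches return the same values; only the cost differs
stopifnot(identical(serial, paralelo))
cat("serial:", t_serial, "s  parallel:", t_par, "s\n")
```

When the work per element is this small, communication with the workers tends to dominate the parallel run.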

data.table, despite using a key on userid, also turned out to be slower than plyr in this test. A more sophisticated mechanism does not always pay off on a simple task.
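One possible follow-up, not measured here, is to drop the three ifelse calls per group and let data.table group by both userid and tipo_trafico before reshaping. The sketch below shows the idea on a tiny made-up table (cdr_mini is a hypothetical stand-in); whether it changes the timing on the 30 million row table is an open question.

```r
library(data.table)

# Tiny stand-in for cdr_dt (values are made up)
cdr_mini <- data.table(
  userid       = c(1, 1, 2, 2, 2, 3),
  tipo_trafico = c("DATA", "SMS", "VOICE", "VOICE", "DATA", "SMS"),
  trafico      = c(500, 1, 60, 30, 200, 1)
)

# Step 1: one sum per (userid, tipo_trafico) pair, no ifelse needed
long <- cdr_mini[, .(trafico = sum(trafico)), by = .(userid, tipo_trafico)]

# Step 2: spread the traffic types into columns, filling missing pairs with 0
wide <- dcast(long, userid ~ tipo_trafico, value.var = "trafico", fill = 0)
print(wide)
```

The output has the same shape as the tmp tables produced in the benchmark: one row per user with DATA, SMS, and VOICE columns.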