Using multiple processor cores in R-Projet

Murat Yildizoglu

My laptop is equipped with a Core i7 processor with 4 cores that can execute in parallel 8 processes.

My R-project computations use only one single core, if I use the default instructions. I have ended up by thinking that it is a pity that other cores are just sitting idle (sort of), instead of contributing to the speed of my computations, even if I do not run yet really heavy ones in my research. As a consequence, I have started to look for an easy way to use all cores in R-project.

And, indeed, there is an easy solution to this problem. It uses the doMC library, and the instructions foreach and %dopar%. For example, for computing linear models with different dependent variables and a given set of exogenous ones, one can do the following computations:

library(doMC) # There are other parallel computing libraries
registerDoMC() # You must register one of them for foreach
getDoParWorkers() # Indicates you how many cores have been detected 
                  # by registerDoMC()

Suppose that you have a dataset called mydata, containing the dependent variables y1, y2, y3, and the independent variables, x1,x2,x3.

We can execute in parallel the estimation of linear models of each y on the set of independent variables, by executing the following code:

myVariableList <- c(\"y1\", \"y2\", \"y3\")  
results <- foreach(i = 1:length(myVariableList), .errorhandling=\"stop\", .inorder=TRUE)
%dopar% {
    model <- lm(as.formula(paste(myVariableList[i],\"~x1+x2+x3\")),data=mydata)
    return(model)
        }

%dopar% executes these estimations on different cores, in parallel and a list of the estimated models is saved in the variable results.

We can now look at the characteristics of the estimated models, by printing them successively on the output of R:

for (i in 1:length(results)) { print(summary(results[[i]])) }

VoilĂ !

Of course, this possibility is especially useful for more complex computations, like stepwise regressions with many independent variables, that can take some time, or regression trees with big datasets, etc.