Of Julia and R
May 3, 2020
I just tried the same code in R's data.tables and Julia's DataFrames, and the results are a bit surprising.
I just did a quick analysis of volatility and returns, starting with my usual program — R.
It involves a large data set, CRSP (979M compressed), kept in a data.table. Since I have been thinking about using Julia more, I thought this would be a good experiment.
I initially used daily data, and because the sample has 33,617,369 rows and six columns, it is quite representative of the work I do.
I've made comparisons with Julia before: "Is Julia ready for prime time?" and "Which numerical computing language is best: Julia, MATLAB, Python or R?".
In any case, I used R version 3.6.0 and Julia 1.4.1. In case anybody complains, yes, this is not the latest version of R, but it's such a pain to upgrade it on all my systems that I never managed to do it. Besides, it shouldn't make any difference here.
Anyway, here are the two main calls, first in R and then in Julia:
data[, list(length(RET), mean(RET), sd(RET)), keyby = list(year, PERMNO)]

by(data, [:year, :PERMNO]) do data
    DataFrame(m = mean(data.RET), s = std(data.RET), c = length(data.RET))
end
The R code looks more readable, which is unusual, as Julia's code is typically much better looking.
One core only, and Julia took 2.8 seconds, R 3.1 seconds. That did surprise me. data.table is supposed to be quite fast. We have a benchmark site that regularly compares such data operations, finding that R's data.table is several times faster than Julia's DataFrames in most cases. I can't explain why, but I did use only one core.
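For anyone who wants to try the single-core timing themselves, here is a minimal sketch in R. It is not the script behind the numbers above: the data are synthetic stand-ins for CRSP (far fewer rows), the column name n_obs is made up for illustration, and restricting data.table with setDTthreads(1) is only an assumption about how "one core" could be enforced.

```r
library(data.table)

# Assumption: limit data.table to a single thread to mimic a one-core run.
setDTthreads(1)

# Synthetic stand-in for the CRSP daily file (the real sample has 33,617,369 rows).
set.seed(42)
n <- 1e6
data <- data.table(
  year   = sample(2000:2019, n, replace = TRUE),
  PERMNO = sample(10001:10500, n, replace = TRUE),
  RET    = rnorm(n, mean = 0, sd = 0.02)
)

# Same grouped aggregation as in the post, with named output columns.
timing <- system.time(
  res <- data[, list(n_obs = length(RET), m = mean(RET), s = sd(RET)),
              keyby = list(year, PERMNO)]
)

# Elapsed wall-clock seconds for the aggregation only.
print(timing[["elapsed"]])
print(head(res))
```

Because keyby sorts the result by year and PERMNO, the output comes back ordered, which is part of what is being timed.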
But, if I take the total time, including loading the data in, timed by:
time Rscript run.r
time julia run.jl
R took 11.7 seconds and Julia 29.7 seconds. The reason is, of course, it takes forever to load Julia packages.
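The shell's time only reports the total, so to see where the seconds go one could time each phase inside the script. The following is a rough sketch on the R side, not the actual run.r: the helper timed() and the phase labels are hypothetical, and a small synthetic table stands in for reading the CRSP file.

```r
# Hypothetical helper: evaluate an expression, record its elapsed
# wall-clock seconds under a label, and return the value invisibly.
phases <- numeric(0)
timed <- function(label, expr) {
  elapsed <- system.time(value <- expr)[["elapsed"]]
  phases[label] <<- elapsed
  invisible(value)
}

# Phase 1: package loading (the part that dominates Julia's total in the post).
timed("load packages", library(data.table))

# Phase 2: getting the data in. Assumption: synthetic data instead of the real file.
set.seed(1)
data <- timed("build data", data.table(
  year   = rep(2018:2019, each = 500),
  PERMNO = rep(1:5, times = 200),
  RET    = rnorm(1000, mean = 0, sd = 0.02)
))

# Phase 3: the grouped aggregation itself.
res <- timed("aggregate", data[, list(m = mean(RET), s = sd(RET)),
                               keyby = list(year, PERMNO)])

# Seconds spent in each phase.
print(round(phases, 3))
```

Run with Rscript, this prints one number per phase, which makes it easy to check whether startup or computation accounts for the wall-clock total.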
When I google such timing results, the usual answer is that it doesn't matter, because one starts Julia only once. Once in the REPL, everything is fast.
Fair enough, except I have quite a bit of code that runs only from the command line.
Besides, waiting a third of a minute before the program has loaded is quite a pain.
It's a good thing I didn't try to plot in Julia. Not only does using Plot take a long time, plot() reliably crashed on me, so badly that I had to do killall julia and then also kill the plot window.
But that is not the reason I chose R for my blog Low vol strategies. No, it is merely because I used RMarkdown, which is fantastic for that sort of work. (And I use RMarkdown, but not RStudio.)
I am porting my main risk library from R to Julia, so I may end up using her for regular work, like updating extremerisk.org daily, especially if the startup times improve.
© All rights reserved, Jon Danielsson,