Central bank digital currencies
13 January 2021

Central-bank digital currencies will make the financial system more efficient. But will they make it safer? Maybe, but they could easily end up increasing systemic risk. It depends on the implementation.

Erasmus and Turing
11 January 2021

As a part of its quest for independence, Britain decided to leave the Erasmus program. We now have Turing, the global version. So, will it lead us to the sunny uplands?

Brexit and Marxism
3 January 2021

The Brexiteers and Karl Marx have more in common than often thought. Both were guided by ideology, and both refused to tell us how to get to the sunny uplands they promised.

What to do about the Covid financial system bailouts?
22 December 2020

Financial crises and bailouts of financial institutions are inevitable and can’t be prevented without paying too high a price. Diversity is the best way to minimise the frequency and severity of crises and ensure sustainable high economic growth.

The crypto-technical response to the Covid-19 bailouts
13 December 2020

Are cryptocurrencies and blockchains the solution to the problem laid bare by the Covid-19 bailout of the financial system? No.

The libertarian response to the Covid-19 financial turmoil
9 December 2020

A libertarian sees the Covid-19 bailout of the financial system as a predictable failure of regulations. Much better to have a laissez-faire economy and never, ever bail private firms out. But does the laissez-faire utopia survive contact with reality?

The socialist response to the Covid-19 financial turmoil
4 December 2020

The state just saved the financial system from itself. What is the point of privately owned banks if they need to be bailed out every decade?

On the response of the financial authorities to Covid-19
2 December 2020

The financial authorities have just bailed out the financial system for the second time in a decade. While the authorities are proud of having prevented a financial crisis, are we really better off? No: one cannot judge an intervention a success when the problem arose in the first place only because the intervention had been promised.

The Covid-19 bailouts and the future of the capitalist banking system
26 November 2020

The central banks bailed out the financial system in March 2020, the second time they have done so in 12 years. What is the point of privately owned banks if they require a bailout every decade?

Which programming language is best for economic research: Julia, Matlab, Python or R?
20 August 2020

The most widely used programming languages for economic research are Julia, Matlab, Python and R. This column uses three criteria to compare the languages: the power of available libraries, the speed and possibilities when handling large datasets, and the speed and ease-of-use for a computationally intensive task. While R is still a good choice, Julia is the language the authors now tend to pick for new projects and generally recommend.

ARM on AWS for R
15 June 2020

Amazon AWS has been recommending ARM Graviton2 as a cost-effective alternative to Intel/AMD instances. So I tried it out.

Low vol strategies
8 May 2020

Does it make sense to invest in low vol funds?

Of Julia and R
8 May 2020

I just tried the same code in R’s data.table and Julia’s DataFrames, and the results are a bit surprising.

How to manipulate risk forecasts 101
30 April 2020

It is easy to manipulate risk forecasts. If your regulator or compliance officer sets a risk target you don’t like, just tell them what they want to hear and continue taking the risk you like.

The five principles of correct riskometer use
27 April 2020

It is easy to criticise risk forecasting, but it’s rather pointless unless one can come up with proposals. Here are my five principles for the correct use of riskometers.

The problem with backtesting
25 April 2020

Backtesting is the magical technique that tells us how well a forecast model works. Test the model on history, and we have an objective way to evaluate how good the model is. But does it really work in practice?


Web appendix for numerical language comparison 2020

Alvaro Aguirre
Jon Danielsson

20 August 2020
Updated 26 August 2020

We compared four numerical languages, Julia, R, Python and Matlab, in a Vox blog post.

The Execution Speed section of the blog post includes three experiments to evaluate the speed of the four languages. The versions used are Julia 1.5.0, R 4.0.2, Python 3.7.6 and Matlab R2020a. All code ran on a late-model MacBook Pro (Intel i9, 64 GB RAM).

  1. GARCH log-likelihood
  2. Loading large dataset
  3. Calculating annual mean and sd by year and PERMNO
  4. Julia improvements
  5. Python matrix multiplication
  6. Conclusion

Experiment 1: GARCH log-likelihood

The first experiment compared the running times for the calculation of the GARCH log-likelihood. This is an iterative and dynamic process that captures a large class of numerical problems encountered in practice. Because the value at any given time t depends on the value at t-1, the loop is not vectorisable. The GARCH(1,1) specification is

\[\sigma_t^2= \omega + \alpha y^2_{t-1}+ \beta \sigma_{t-1}^2\]

with its log-likelihood given by

\[\ell=-\frac{T-1}{2} \log(2\pi)-\frac{1}{2}\sum_{t=2}^T \left[ \log \left(\omega+\alpha y_{t-1}^2+\beta\sigma_{t-1}^2\right)+\frac{y_t^2}{\omega+\alpha y_{t-1}^2+\beta\sigma_{t-1}^2} \right]\]

For the purposes of our comparison, we are only concerned with the iterative calculation in the second term, so the constant term is ignored.

We coded up the log-likelihood calculation in the four languages and in C, keeping the code as similar as possible between languages. We included the just-in-time compilers Numba for Python, and the C++ integration Rcpp for R. We then simulated a sample of size 10,000, loaded that in, and timed the calculation of the log-likelihood 100 times (wall time), picking the lowest in each case.
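
As a rough illustration of this setup (a sketch only, not the code used in the experiment; the parameter values omega = 0.01, alpha = 0.1 and beta = 0.85 are assumptions made for the example), a GARCH(1,1) sample can be simulated in Julia from the recursion above:

using Random
Random.seed!(1)

# Simulate T observations from a GARCH(1,1) with parameters omega, alpha, beta.
# The conditional variance starts at its unconditional value omega/(1-alpha-beta).
function simulate_garch(omega, alpha, beta, T)
    y = zeros(T)
    h = omega / (1 - alpha - beta)
    for t in 2:T
        h = omega + alpha*y[t-1]^2 + beta*h
        y[t] = sqrt(h) * randn()
    end
    return y
end

y = simulate_garch(0.01, 0.1, 0.85, 10_000)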

The code can be downloaded here.

C

The fastest calculation is likely to be in C (or Fortran), compiled with

gcc -march=native -ffast-math -Ofast run.c

#include <math.h>

/* Iterative part of the GARCH(1,1) log-likelihood: o, a and b are the
   parameters omega, alpha and beta, h is the initial conditional variance
   and y2 holds the squared returns */
double likelihood(double o, double a, double b, double h, double *y2, int N){
    double lik = 0;
    for (int j = 1; j < N; j++){
        h = o + a*y2[j-1] + b*h;
        lik += log(h) + y2[j]/h;
    }
    return(lik);
}

R

The likelihood can be coded in pure R, which is likely to be slow,

likelihood = function(o, a, b, y2, h, N){
    lik = 0
    for (i in 2:N){
        h = o + a*y2[i-1] + b*h
        lik = lik + log(h) + y2[i]/h
    }
    return(lik)
}

but using Rcpp, which integrates R with C++, makes it much faster by compiling the likelihood function, which can then be called from R like any other function:

require(Rcpp)

code = "double likelihood(double o, double a, double b, double h, NumericVector y2, int N){
    double lik = 0;
    for (int j = 1; j < N; j++){
        h = o + a*y2[j-1] + b*h;
        lik += log(h) + y2[j]/h;
    }
    return(lik);
}"

cppFunction(code)

Python

Python is likely to be quite slow,

import numpy as np

def likelihood(hh, o, a, b, y2, N):
    lik = 0.0
    h = hh
    for i in range(1, N):
        h = o + a*y2[i-1] + b*h
        lik += np.log(h) + y2[i]/h
    return lik

but will be significantly sped up by using Numba:

import numpy as np
from numba import jit

@jit
def likelihood(hh, o, a, b, y2, N):
    lik = 0.0
    h = hh
    for i in range(1, N):
        h = o + a*y2[i-1] + b*h
        lik += np.log(h) + y2[i]/h
    return lik

MATLAB

The MATLAB version is similar:

function lik = likelihood(o, a, b, h, y2, N)
    lik = 0;
    for i = 2:N
        h = o + a*y2(i-1) + b*h;
        lik = lik + log(h) + y2(i)/h;
    end
end

Julia

Julia was designed for speed, so we expected it to be fast:

function likelihood(o, a, b, h, y2, N)
    lik = 0.0
    for i in 2:N
        h = o + a*y2[i-1] + b*h
        lik += log(h) + y2[i]/h
    end
    return lik
end
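
As a sketch of the timing procedure described above (again illustrative, not the harness used in the experiment; the parameter values and the initial variance var(y) are assumptions), the Julia version can be timed over 100 runs, keeping the best wall time:

using Statistics

# Run the likelihood `reps` times on the simulated sample y and keep the
# fastest wall time; omega = 0.01, alpha = 0.1, beta = 0.85 are illustrative.
function best_time(y; reps = 100)
    y2 = y .^ 2
    N = length(y2)
    best = Inf
    for r in 1:reps
        t0 = time()
        likelihood(0.01, 0.1, 0.85, var(y), y2, N)
        best = min(best, time() - t0)
    end
    return best
end

println("best wall time: ", best_time(y), " seconds")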

Results

All runtime results are presented relative to the fastest (C).

[Figure: Timing]

If we only look at the base versions of the four languages, Julia comes out on top. This is unsurprising, since it is a compiled language while the other three are interpreted. Base Python is very slow for this type of calculation, almost three hundred times slower than our fastest alternative, C.

However, when we speed up R and Python using Rcpp and Numba respectively, their performance improves significantly, beating Julia. In the case of Rcpp this involved compiling a C++ function in R, whereas with Numba we only had to import the package and apply the JIT compiler without modifying our Python code, which is convenient.

Experiment 2: Loading a large dataset

Our second experiment consisted of loading a very large CSV dataset, CRSP. The file is almost 8GB uncompressed and over 1GB compressed in gzip format. The code for this experiment and the next one can be downloaded here.

R

R can read both the compressed and the uncompressed file using the fread function from the data.table package:

require(data.table)

uncompressed <- fread("crsp_daily.csv")

compressed <- fread("crsp_daily.gz")

Python

Python’s pandas is a convenient option for importing CSV files into DataFrames. We used it to read the uncompressed file, and in combination with the gzip module to read the compressed file:

import pandas as pd
import gzip

uncompressed = pd.read_csv("crsp_daily.csv")

f = gzip.open("crsp_daily.gz")
compressed = pd.read_csv(f)

MATLAB

To date, MATLAB can only read uncompressed files:

uncompressed = readtable('crsp_daily.csv');

Julia

We used Julia’s CSV and GZip packages to read both types of file:

using CSV
using GZip

# The reads are wrapped in zero-argument functions so they can be timed,
# for instance with @time uncompressed()
uncompressed() = CSV.read("crsp_daily.csv");

compressed() = GZip.open("crsp_daily.gz", "r") do io
    CSV.read(io)
end

Results

All runtime results are presented relative to the fastest, which was R’s uncompressed reading time.

[Figure: Timing]

R’s fread function was the fastest at loading both the uncompressed and the compressed file. In all applicable cases, reading the uncompressed file was faster than reading the compressed one, although the difference was much larger for Julia. Matlab came out at the bottom, with an uncompressed loading time over five times slower than R’s, and no ability to load the compressed file at all.

Experiment 3: Calculating annual mean and standard deviation by year and firm

Our last experiment involved group calculations on the large dataset imported in experiment 2. We evaluated the speed of computing the mean and standard deviation of returns by year and firm.

R

Using R’s data.table package:

require(data.table)

# count, mean and standard deviation of RET by year (y) and firm (PERMNO)
R <- data[, list(length(RET), mean(RET), sd(RET)), keyby = list(y, PERMNO)]

Python

Python’s pandas allows groupby operations:

import pandas as pd

# mean, standard deviation and count of RET by firm and year
R = data.groupby(['PERMNO', 'year'])['RET'].agg(['mean', 'std', 'count'])

MATLAB

MATLAB can perform group-by operations on table objects using the grpstats function from the Statistics and Machine Learning Toolbox:

statarray = grpstats(data, {'PERMNO', 'year'}, {'mean', 'std'}, 'DataVars', 'RET');

Julia

Julia’s DataFrames package allows for this type of computation:

using Statistics
using DataFrames

R = by(data, [:year, :PERMNO]) do data
    DataFrame(m = mean(data.RET), s = std(data.RET), c = length(data.RET))
end

Results

All runtime results are presented relative to the fastest (Julia).

[Figure: Timing]

Julia, a compiled language, was the fastest of the four, closely followed by R and Python. Matlab was by far the slowest.

Julia improvements

After the article was published, Bogumił Kamiński, the developer of Julia’s DataFrames package, suggested a different approach for the Julia calculation, using combine() instead. Doing so required a bug fix, which was implemented in the v0.21.7 release of DataFrames on 25 August 2020.

We were able to do the calculation with:

using Statistics
using DataFrames

R = combine(groupby(data, [:year, :PERMNO]),
    :RET => mean => :m, :RET => std => :s, nrow => :c)

This cut the processing time almost in half, even though Julia was already the fastest language for this task, as shown in the plot below.

[Figure: Timing]

Python matrix multiplication

In the article we mentioned that to multiply two matrices, X and Y, in Python, you would do:

import numpy as np
np.matmul(X, Y)

From Python 3.5 onwards, this can also be done with the @ operator:

import numpy as np
X @ Y

Conclusion

When comparing the four languages without compiler packages like Numba or Rcpp, Julia proved superior in computations such as the GARCH log-likelihood and the group-by statistics, while R’s fread function was unparalleled at loading large files.

When we introduce compiler packages, the picture changes.

Python’s Numba package allows for efficient just-in-time compiling with minimal additional code: implementing it made the calculation here over 200 times faster. However, Numba can only be used in relatively simple cases and cannot be considered a general solution to the slow numerical speed of Python. We could have used Cython, but it is much more involved than Numba.

By coding the likelihood function in C++ and compiling it with R’s Rcpp, we obtained the fastest GARCH log-likelihood calculation of the four languages, coming closest to C. We could, of course, do the same for the other three languages.

© All rights reserved, Jon Danielsson, 2021