The lifetime of code

September 26, 2021

How far into the future should we be able to execute computer code written today? Next month, next year, next decade, or next century? My choice of language depends on the answer to that question. I recently had a very mixed experience with code and how long I can run it.

At one end, some Python+Pandas code that I wrote in 2018, following the best guidelines from the Pandas documentation, stopped working this year. For some reason, the designer of Pandas had decided that a particular way of subsetting data frames that was state of the art in 2018 is unacceptable in 2021. The break came with the move from Python 3.7 to 3.9 and Pandas 0.25 to 1.0; here is the announcement. My code broke. And since I had forgotten all the gory details, it took me hours to fix. Consequence: never again will I use Pandas for a new project. Mea culpa. As someone told me, it's my fault for relying on the Python release cycle and not the Pandas one. Still, I prefer a language where breaking changes in key packages follow the main language release cycle.

Then I tried the C code that I wrote for my PhD thesis. It compiled and ran perfectly, 30 years after being written.

At one point I gave up on C and moved to C++. That code will not compile today.

I just had to rerun a lot of R code to make figures for my lecture slides. You see, they used to be in 4:3 format and now I want them to be 16:9, so all the figures have to be made a little bit wider. Some of the code was 10 years old, written for an ancient version of R, and it all ran without a problem.

And that takes me to Julia. I am very impressed with her. After my unpleasant experience with Pandas, I decided to rewrite my project in Julia. The code is one third the length and runs three times as fast.

But that leaves a question. If I start relying on this code for day-to-day calculations, will it be like Pandas and stop working in a couple of years? Or will it be like R and work for at least 10 years, or even match the 30-year lifetime of my C code?

After I wrote this, I heard from a Julia developer.

All current code will run until Julia 2.0, many years in the future (with rare exceptions). Even then we will be able to use DataFrames 1.x instead of the future 2.x.

That is comforting. Unlike the Python+Pandas release cycle, Julia DataFrames seems to track Julia itself in terms of major releases and breaking changes.
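To make the release-cycle point concrete, here is a minimal sketch in Python. It assumes the convention that breaking changes are signalled by a change in the first version component; the helper names `major` and `crosses_major` are made up for illustration. The version numbers are the ones from my Pandas story above.

```python
def major(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.split(".")[0])


def crosses_major(old: str, new: str) -> bool:
    """True if moving from `old` to `new` changes the major version."""
    return major(old) != major(new)


# Version pairs taken from the text: the language barely moved,
# but the package crossed a major-version boundary.
upgrades = {
    "Python": ("3.7", "3.9"),
    "Pandas": ("0.25", "1.0"),
}

for name, (old, new) in upgrades.items():
    kind = "major" if crosses_major(old, new) else "minor"
    print(f"{name} {old} -> {new}: {kind} release step")
```

Watching only the Python version would have told me nothing; the breaking step was in the package.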

Or, as someone pointed out, just have the necessary versions in a Docker image.

But then, will those work in 10 or 25 years?
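Short of a full Docker image, a lighter guard is to record the package versions a script was written against and check them at startup, so that a mismatch is reported before anything breaks silently. The following is only a sketch, assuming Python 3.8 or later for `importlib.metadata`; the `PINNED` table and the `check_pins` helper are hypothetical names, and the pin shown is just the Pandas version from my story.

```python
from importlib import metadata

# Hypothetical pins: the release series this script was written against.
PINNED = {"pandas": "0.25"}


def check_pins(pins, get_version=metadata.version):
    """Return a list of messages for packages that do not match their pin.

    A pin like "0.25" accepts "0.25" and any "0.25.x" release.
    `get_version` is injectable so the check can be tried without
    the packages being installed.
    """
    problems = []
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted}.x)")
            continue
        if not (found == wanted or found.startswith(wanted + ".")):
            problems.append(f"{name}: found {found}, wanted {wanted}.x")
    return problems


# Report mismatches for the current environment, if any.
for message in check_pins(PINNED):
    print("version mismatch:", message)
```

This does not make old code run, of course. It only tells me early that the environment has drifted from the one I tested in.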

Ultimately, the lesson is the difference between languages whose core functionality is numerics, like R, Matlab and Julia, and those where numerical calculations are grafted on, like Python. In the former group, breaking API changes in core numerical functionality happen with major releases of the language, making it easy to monitor changes, while in the latter group it’s all much more precarious. A good reason to avoid Python for numerical work.

All my code survived the move from R 3.x to 4.x, while my Python code did not survive 3.7 to 3.9.