Absolutely Optimal

Program optimization is strange.

We naturally want our programs to run as quickly and efficiently as possible, but in some sense I have no idea what that actually means. Or, rather, I have no idea what “computation” actually entails.

The problem of optimization is at its simplest when you have a finite number of inputs to the program, and every input results in termination. In this case everything at runtime can be done in a constant amount of time. All runtime computation boils down to a bland lookup table (arguably you may be able to compute some values faster than you might look them up, but we’ll ignore such details).
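To make that concrete, here is a toy sketch (in C++, which I’ll use for examples throughout; the function itself is made up):

```cpp
#include <cstdint>

// A made-up function with a tiny, finite input domain: 0..3 in, byte out.
uint8_t slow(uint8_t x) {
    uint8_t acc = 1;
    for (uint8_t i = 0; i < x; ++i) acc *= 3;  // some terminating work
    return acc;                                // computes 3^x
}

// The runtime-optimal version: all four possible answers, precomputed.
uint8_t fast(uint8_t x) {
    static const uint8_t table[4] = {1, 3, 9, 27};  // slow(0)..slow(3)
    return table[x & 3];  // a bland lookup, constant time
}
```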

This result is incredibly anticlimactic. It just means that all of the fanciful computation is pushed back to compile time. It doesn’t actually go away! Now, instead of calculating what you need at runtime, you calculate EVERYTHING at compile time. Perhaps this is fine. If you have a program that is compiled once and then executed every nanosecond of every day on millions of computers, then this may well be the best option, even for programs with relatively large state spaces.
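C++’s constexpr machinery is a decent illustration of this (a sketch reusing the toy function from above; it needs C++17 or later):

```cpp
#include <array>
#include <cstdint>

constexpr uint32_t slow(uint32_t x) {  // the same toy function
    uint32_t acc = 1;
    for (uint32_t i = 0; i < x; ++i) acc *= 3;
    return acc;
}

// Ask the compiler to evaluate slow() for EVERY possible input.
constexpr std::array<uint32_t, 16> make_table() {
    std::array<uint32_t, 16> t{};
    for (uint32_t i = 0; i < 16; ++i) t[i] = slow(i);
    return t;
}

constexpr auto table = make_table();  // all the work happens at compile time

uint32_t fast(uint32_t x) { return table[x & 15]; }  // nothing left to do
```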

Of course, that’s not always the case, and the problem seems most interesting in its purest form: compile time + execution time for a single instance of a given program. That is how much time it actually takes to compute your results, after all!

Yet this is more confusing still, because then what constitutes a program? Surely a programmer can just decide to write the entire damn thing as a lookup table, and then the compiler doesn’t have to do much work at all! But then it’s the programmer that’s doing the “real” computation. It seems we can just keep pushing the computation back into different layers, and it’s all horrendously meta!

Just what the hell are we doing? How do we describe a computation without doing computation? It seems so intimately linked, and I’m not sure that there is a way to distinguish between the two entirely. I think this is an important question.

So, what can we do? The one thing that comes to mind is that we might choose to describe algorithms with as little state as possible. The idea is simply that by minimizing the amount of information needed to describe an algorithm, we can limit the amount of a priori computation that is done. This can allow us to measure computation time much more reasonably, since we have a common starting point for every algorithm (although I would wager that the minimal-state algorithm description is actually not unique).

For instance, consider the Fibonacci numbers. We might describe the algorithm for computing them something like this:

\[F_n = \begin{cases} 0 & n = 0 \\ 1 & n = 1 \\ F_{n-1} + F_{n-2} & \text{otherwise} \end{cases}\]
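Transcribed directly into code, that definition looks like this (a sketch; nothing about it is specific to C++):

```cpp
#include <cstdint>

// A direct transcription of the definition above. The two base cases
// are the only predetermined state the description carries.
uint64_t fib(uint64_t n) {
    if (n == 0) return 0;
    if (n == 1) return 1;
    return fib(n - 1) + fib(n - 2);
}
```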

This description has two integers’ worth of predetermined state. But some crafty programmer who wanted to shave a little bit of effort off of the compiler or the running program might describe the algorithm as such:

\[F_n = \begin{cases} 0 & n = 0 \\ 1 & n = 1 \\ 1 & n = 2 \\ F_{n-1} + F_{n-2} & \text{otherwise} \end{cases}\]

Gasp! Now no computation aside from a lookup needs to be done for \(F_2\), since it was sneakily done beforehand, and we have no idea how long this actually takes to compute! Of course, this version has more a priori state, which means it’s not in the “minimal” format.

Perhaps we could try to transform this into some kind of stateless representation, eliminating precomputation altogether. Say we replace \(F_0\) with \(a\) and \(F_1\) with \(b\). That surely fixes things for us, right?

\[F_n(a,b) = \begin{cases} a & n = 0 \\ b & n = 1 \\ F_{n-1}(a,b) + F_{n-2}(a,b) & \text{otherwise} \end{cases}\]

So \(a\) and \(b\) are given at runtime and the program is not allowed to guess at what these values might be ahead of time. It can’t say, “oh, I’ll just precompute the first 100 values for when \(a = 0\) and \(b = 1\)”. This is forbidden! Now those crafty programmers can’t do any additional optimization!
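As code, the base cases simply become ordinary runtime arguments (a sketch):

```cpp
#include <cstdint>

// The base cases are now runtime inputs. Nothing about them is known
// when the program is written or compiled, so nothing about them can
// honestly be precomputed.
uint64_t fib(uint64_t n, uint64_t a, uint64_t b) {
    if (n == 0) return a;
    if (n == 1) return b;
    return fib(n - 1, a, b) + fib(n - 2, a, b);
}
```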

But I’m not sure that this actually saves us.

\[F_n(a,b) = \begin{cases} a & n = 0 \\ b & n = 1 \\ a + b & n = 2 \\ a + 2 b & n = 3 \\ 2 a + 3 b & n = 4 \\ F_{n-1}(a,b) + F_{n-2}(a,b) & \text{otherwise} \end{cases}\]

We can still precompute and optimize the actual computation that’s going on. But then we have to count these cases for each \(n\) as part of the state in the algorithm description, so this version actually requires more predetermined information. Again, we may continue to make this program faster and faster by doing some of these steps beforehand, ad infinitum, but this still seems like cheating to me: it’s just sweeping computation under the carpet again. This is -funroll-loops taken to the extreme, and I find it unsatisfactory!
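It’s worth noting where the unrolling leads (just an observation of mine, not something the argument depends on): the coefficients of \(a\) and \(b\) in each precomputed case are themselves Fibonacci numbers, so “adding cases” is literally computing Fibonacci ahead of time under another name. Taken to its limit, the whole recursion collapses into a coefficient computation that a crafty compiler could hoist out for any fixed \(n\):

```cpp
#include <cstdint>

// The unrolling above, taken to its limit: F_n(a, b) collapses to
// ca * a + cb * b, where ca and cb are ordinary Fibonacci numbers.
// For any fixed n, a crafty compiler could precompute ca and cb,
// leaving a single multiply-add at runtime; the "real" work has
// simply moved somewhere else.
uint64_t fib(uint64_t n, uint64_t a, uint64_t b) {
    if (n == 0) return a;
    uint64_t ca = 0, cb = 1;            // coefficients for F_1 = b
    for (uint64_t i = 1; i < n; ++i) {  // step the coefficient pair
        uint64_t next = ca + cb;
        ca = cb;
        cb = next;
    }
    return ca * a + cb * b;
}
```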

It feels like we’re missing a layer of abstraction here. Something is missing, and I’m not sure exactly what it is.