Hacker News

No

I'm saying that a poor choice of data structure will have an amplified impact on an O(N^3) algorithm.

I gave it as an example of how two different implementations of the same algorithm can have massively different runtimes on the same processor.



If the algorithm has a running time of O(N^3), that means it takes at most c * N^3 operations to complete, for some constant c, regardless of data structures. That's the definition of asymptotic running time.

If we disregard c, and set N = 100, that is 1,000,000 operations.

We have two situations: 1) we use a good layout where each operation is fast because each memory access hits the cache, or 2) we use a bad layout where each operation is slow because we miss the cache and have to go out to RAM. These access times are independent of the specific algorithm and depend only on the CPU architecture.
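As a minimal sketch of the two situations: summing an N x N array in row-major versus column-major order does identical O(N^2) work, but the first walks memory sequentially while the second strides N doubles per access, which is where the cache behavior diverges for large N. (The function names here are illustrative, not from any particular library.)

```c
#include <stddef.h>

/* Cache-friendly: inner loop walks contiguous memory. */
double sum_row_major(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j];
    return s;
}

/* Cache-hostile for large n: inner loop jumps n doubles per access. */
double sum_col_major(const double *a, size_t n) {
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += a[i * n + j];
    return s;
}
```

Both return the same sum; only the memory access pattern, and hence the per-operation cost, differs.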

Let's say that the fast operations each take 1 second, and the slow operations each take f seconds.

In the fast case, with good memory layout, the algorithm completes in 1,000,000 seconds. In the slow case, with bad memory layout, the algorithm completes in f * 1,000,000 seconds.

The time of each operation we perform does not depend on the size of N. In practice, though, you'll typically have a lot of cache hits even with a bad memory layout, so for smaller values of N the difference may be less than a factor of f (you'll have proportionally more cache hits in a cubic algorithm). It is, however, upper bounded by f. I.e. there is an upper bound on how much slower you can make an algorithm by changing the memory layout.
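The bound can be made concrete with a bit of arithmetic. Assuming a cache-hit fraction h, a per-hit cost t_fast, and a per-miss cost t_slow (all hypothetical numbers, not real hardware latencies), the average cost per operation is h * t_fast + (1 - h) * t_slow, so the slowdown relative to the all-hits case ranges from 1 up to t_slow / t_fast = f, regardless of N:

```c
/* Slowdown factor versus the all-cache-hits case, given hit fraction h
 * in [0, 1], cost t_fast per hit, and cost t_slow per miss. */
double slowdown(double h, double t_fast, double t_slow) {
    double avg = h * t_fast + (1.0 - h) * t_slow;
    return avg / t_fast;  /* 1 when h == 1; t_slow / t_fast when h == 0 */
}
```

With t_fast = 1 and t_slow = 100, the factor is 100 with no hits, 50.5 with half hits, and 1 with all hits: bounded by f = 100 no matter what the algorithm does.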


Obviously this assumes N and f are independent (although in practice they might not be: your N may dictate how often a cache miss occurs).

Right, so now let's assume that our O(N^3) algorithm is simply a triple nested loop, and that we have an expensive operation f, but that through our choice of data structures we can choose in which layer that operation occurs. All other costs are constant.

If f is in the outer loop, it happens N times; in the middle loop, N^2 times; in the inner loop, N^3 times.

So our choice of data structure impacts our running time by any factor from linear up to the cube of N.
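A minimal sketch of the triple-loop argument, with a hypothetical `expensive()` operation that just counts its own invocations so the N versus N^2 versus N^3 difference is directly visible:

```c
static long calls = 0;

/* Stand-in for a costly operation (e.g. one forced by a data
 * structure choice); here it only counts how often it runs. */
static void expensive(void) { calls++; }

/* placement: 0 = outer loop, 1 = middle loop, 2 = inner loop. */
long invocations(int n, int placement) {
    calls = 0;
    for (int i = 0; i < n; i++) {
        if (placement == 0) expensive();
        for (int j = 0; j < n; j++) {
            if (placement == 1) expensive();
            for (int k = 0; k < n; k++) {
                if (placement == 2) expensive();
            }
        }
    }
    return calls;
}
```

With n = 10, the three placements yield 10, 100, and 1000 invocations of the same operation in the same overall algorithm.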

I was simply answering the question of whether it's really possible for two implementations of the same algorithm to have such disparate run times, and it is trivially possible.

Another way of saying it is that the mistake need not look expensive on cursory inspection; a simple way to end up with very different runtimes is through the accumulation of a large number of small costs.

Maybe the important point is to use a profiler.

I recently gained a 60x improvement in real-world run time over what is considered in the literature to be the best known algorithm for a particular problem. The speedup didn't come from improving the algorithm; it came from improving data locality, removing memory allocations, and general modern code hygiene (to be fair, the canonical implementation is ~20 years old).



