CS people only. Seriously. This is about dynamic programming.: pmb

pmb

CS people only. Seriously. This is about dynamic programming.

Jan 14, 2009 17:34

So conform has been going through the Streamtech problem set and doing them one by one. We got in a conversation about dynamic programming and memoization, and he pointed out http://www.streamtech.nl/problemset/108.html which has that in spades. In particular, with input sizes of up to 100*100, the naive algorithm is going to run in time O(n^8)ish, and finding the answer will take forever.

So I coded up a solution to the problem ( http://pastie.org/360900 ) using my memoize decorator from long ago. I created a 100*100 test case and the program proceeded to run forever and use all my memory. Clearly something was up. So I created a test-case creator ( http://pastie.org/360940 ) and tested it on problems of size 5,7,9,11,...,50 because 50 was the largest test case that finished in a reasonable amount of time.

Putting this into gnuplot and fitting the curve to f(x) = a*x**b, gnuplot found that my program was running in time O(n^5.5). This is too darn slow.

So I made a new solution ( http://pastie.org/360914 ) which doesn't use my cheater memoize decorator, but actually builds a 4 dimensional array and fills it in. This should definitely run in time O(n^4), but after testing the other, I wanted to test this one the same way. So I ran it on all the same size inputs, and then took the data and fitted it to the same curve in gnuplot, and the exponent parameter was, in the fitted line, almost exactly 4. Hooray!

Then I ran it on a 100*100 example, and it finished in 5 minutes and 30 seconds. Not great, but not bad for python. So that's that. A cute problem that can be resolved into a fast(ish) solution using dynamic programming/memoization, and a potent reminder to me that hashtables are not always O(1). I think the neatest thing about the graph is that you can actually see the hastables starting to fail - the runtime of the programs match each other exactly up through n=20, and then begin to diverge as collisions become more and more prevalent. I wonder if this is a good test of a hashtable, or if this is a useless corner case. Because I really like my memoize decorator and hate having to continually wonder about whether or not it's working right.

Update: conform found a faster and sexier way using an algorithms from Jon Bentley's 'Programming Pearls'. At first I couldn't believe that it worked, because it was totally subtle, but now I buy it. See how to do this right at: http://conform.livejournal.com/36375.htm