The only intuition I have is to write your program in the most recursive-ish side-effect-free style you can, and then to check whether you ever call your function twice with the same arguments. If you ever do, then dynamic programming is the solution, and memoization is the easy way to get there.
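To make that concrete, here's the shape of the trick as a minimal sketch -- the decorator name and the fib example are mine, not from either of our solutions:

def memoize(f):
    cache = {}
    def wrapped(*args):
        # Only compute f(*args) the first time we see these arguments.
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return wrapped

@memoize
def fib(n):
    # Naive recursion calls fib with the same argument over and over;
    # the cache turns the exponential call tree into O(n) work.
    return n if n < 2 else fib(n - 1) + fib(n - 2)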
So my solution is significantly faster than yours (O(n^3) vs. O(n^4)), but it was throwing me off that your outer recursion is no improvement over enumeration. In fact, it's slower than a straightforward enumeration, since you have to build up a bunch of stack frames, etc.
I think I just realized that the important recursion that makes your approach "dynamic" is the one in your total() function (essentially the same as the one I produced in grid_sum()). This makes me feel much better about the world and my understanding of it.
max_rect_sum calls max_column_sum n*n times, and max_column_sum is O(n^2) with its two nested loops each of extent n.
Depending on how you build up your solution, both memoizations may be important, or neither. The way I built mine, both are important. The way coldtortuga built his, neither is.
max_rect_column_sum is O(n), having only one loop, and my approach is definitely faster -- try running my code with D=60 or D=100 and compare to yours.
i'm also not sure about your assertion that the outer memoization is significant. if i replace your outer loop with an enumerative one, i get results like the following (in order: my code, your original, your enumerative):
sea-smoke:~/problems conform$ cat matrix_60 | time python 108_max_rect.py
2781
1.42 real 1.36 user 0.04 sys
sea-smoke:~/problems conform$ cat matrix_60 | time python peter.py
2781
59.62 real 58.88 user 0.61 sys
sea-smoke:~/problems conform$ cat matrix_60 | time python peter_enumerative.py
2781
27.58 real 27.13 user 0.32 sys
I can't argue with that timing data, but I will continue to quibble with your assertion that your solution is both correct and O(n^3) because you need to nigh-on-fully-populate a 4-D array inside of grid_sum.
If it doesn't do so, then there exist cases where your program and mine will return different values, and I'm pretty sure mine will be right and yours will be wrong.
I can't tell how much you're actually looking at my code, and how much you're applying Occam's Razor to your CS knowledge vs. mine. :)
My approach (which consists of taking someone else's algorithm for a linear-time 1D search and extending it to a 2D problem) doesn't fully populate the search space. It considers every vertical slice of the input array (roughly n^2/2 slices), and for each one, performs a linear search on it to find its maximal subrect.
The simplest way for you to convince me that I'm wrong here is to find a flaw in or counterexample for the linear 1D algorithm:
def max_sequence(array):
    # Linear-time maximal-sum contiguous subsequence (Kadane's algorithm).
    max_sum = -float('infinity')
    current_sum = 0
    start = 0  # where the current candidate run begins (not used in the return)
    for end in range(len(array)):
        current_sum += array[end]
        if current_sum > max_sum:
            max_sum = current_sum
        if current_sum < 0:
            # A negative running sum can never help a later sum; drop it
            # and restart the candidate run just past this point.
            current_sum = 0
            start = end + 1
    return max_sum
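Spelled out, the 2-D extension I described above looks roughly like this -- my own names, reusing max_sequence, and a sketch rather than my actual source:

def max_rect_sum(grid):
    # For each pair of column bounds x1 <= x2, collapse that vertical
    # slice into per-row sums, then run the linear 1-D search on them:
    # roughly n^2/2 slices, O(n) work per slice step => O(n^3) overall.
    n_rows, n_cols = len(grid), len(grid[0])
    best = -float('infinity')
    for x1 in range(n_cols):
        row_sums = [0] * n_rows
        for x2 in range(x1, n_cols):
            for y in range(n_rows):
                row_sums[y] += grid[y][x2]  # widen the slice by column x2
            best = max(best, max_sequence(row_sums))
    return best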
This is weird. Your algorithm runs in time O(n^3.5) (measured), and I haven't been able to find a single case where yours and mine disagree (and I wrote a shell script and tried hundreds).
Tuning it as suggested by Dan at http://pmb.livejournal.com/79825.html?thread=757201#t757201 and retesting right now...
Weird. It now runs in time O(n^3.3) (measured). Oh, those wacky hash collisions.
After finally understanding your 1-D algorithm, I now buy your 2-D algorithm. It really is pretty clever! Certainly cleverer than mine. Because the answer to my unbelieving query is: NO. No, you do NOT have to fully populate a 4-D array, and you can determine the answer in the 1-D case in O(n) time, not O(n^2) time.
Dammit. Now I feel like a doofus. You were right all along! Your approach is faster and better and remains correct.
My suspicion is that the extra fractional exponent can be eliminated by tuning the cache generation -- it should be strictly linear, since you can collapse each 2-wide column in linear time from the 1-wides, and each 3-wide in linear time from a 2-wide and a 1-wide, etc. Can you give me the script you're using for gnuplot? I don't want to tackle it without some bootstrapping.
I'm also still curious about your assertion that the outer memoization is useful, given my results with a naive outer loop. And, by the way, I appreciate you taking the time and engaging with me on this; I'm learning a lot here.
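For what it's worth, the collapse I'm picturing above is something like this -- hypothetical names and a dict cache keyed by column bounds, not your actual data structure:

def build_column_sums(grid):
    # cache[(x1, x2)] holds the per-row sums of columns x1..x2 inclusive.
    # Each wider entry is one O(n) vector add of two narrower entries,
    # so cache generation is strictly linear per entry -- no fractional
    # exponent should come from this step.
    n_rows, n_cols = len(grid), len(grid[0])
    cache = {}
    for x in range(n_cols):  # 1-wide columns, read straight off the grid
        cache[(x, x)] = [grid[y][x] for y in range(n_rows)]
    for w in range(1, n_cols):  # each width built from width w-1 plus a 1-wide
        for x1 in range(n_cols - w):
            narrower = cache[(x1, x1 + w - 1)]
            strip = cache[(x1 + w, x1 + w)]
            cache[(x1, x1 + w)] = [a + b for a, b in zip(narrower, strip)]
    return cache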
By the way, after I found that 1-D algorithm, I spent most of the day trying to wrap my head around whether there was a 2-D extension (presumably in O(n^2)). Just thinking about it made my head hurt, but I still wonder if it's out there somewhere.
Fine with me if you get one - take it up with admissions, the registrar, and the money folk.
I wasn't using a script at all. I would have a command line like:
for i in `seq 10 5 200`; do python gen_problem.py | time python smus.py 2>&1 | grep Real | sed -e 's/Real//' | xargs echo $i; done >> data
The third number in the seq command (the upper limit) was 50 for my program and 200 for your (faster) one.
Then in gnuplot I would type:
set logscale  # This step is optional
f(x) = a*x**b
fit f(x) 'data' via a,b
plot 'data', f(x)
More options to the plot command will make it prettier, but the "fit" line is kind of all you REALLY need. But plotting it is comforting somehow, both in logscale and in normal scale.
For reference, the loop change was from:
for x1 in range(len(grid[0])):
    for x2 in range(x1, len(grid[0])):
        local_o_sum, local_o_bounds = max_rect_column_sum(grid, x1, x2)
to:
for w in range(len(grid[0])):
    for x in range(len(grid[0]) - w):  # so x + w can reach the last column
        local_o_sum, local_o_bounds = max_rect_column_sum(grid, x, x + w)
Ah, makes sense. I meant "vs a naive approach". No question that your exact code benefits from it.