The only intuition I have is to write your program in the most recursive-ish side-effect-free style you can, and then to check whether you ever call your function twice with the same arguments. If you ever do, then dynamic programming is the solution, and memoization is the easy way to get there.
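To make that concrete, here's the shape of the trick as a minimal sketch -- the decorator name and the fib example are mine, not from either of our solutions:

def memoize(f):
    cache = {}
    def wrapped(*args):
        # Only compute f(*args) the first time we see these arguments.
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return wrapped

@memoize
def fib(n):
    # Naive recursion calls fib with the same argument over and over;
    # the cache turns the exponential call tree into O(n) work.
    return n if n < 2 else fib(n - 1) + fib(n - 2)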
So my solution is significantly faster than yours (O(n^3) vs. O(n^4)), but it was throwing me off that your outer recursion is no improvement over enumeration. In fact, it's slower than a straightforward enumeration, since you have to build up a bunch of stack frames, etc.
I think I just realized that the important recursion that makes your approach "dynamic" is the one in your total() function (essentially the same as the one I produced in grid_sum()). This makes me feel much better about the world and my understanding of it.
max_rect_sum calls max_column_sum n*n times, and max_column_sum is O(n^2) with its two nested loops each of extent n.
Depending on how you build up your solution, both memoizations may be important, or neither. The way I built mine, both are important. The way coldtortuga built his, neither is.
max_rect_column_sum is O(n), having only one loop, and my approach is definitely faster -- try running my code with D=60 or D=100 and compare to yours.
i'm also not sure about your assertion that the outer memoization is significant. if i replace your outer loop with an enumerative one, i get results like the following (in order: my code, your original, your enumerative):
sea-smoke:~/problems conform$ cat matrix_60 | time python 108_max_rect.py
2781
1.42 real 1.36 user 0.04 sys
sea-smoke:~/problems conform$ cat matrix_60 | time python peter.py
2781
59.62 real 58.88 user 0.61 sys
sea-smoke:~/problems conform$ cat matrix_60 | time python peter_enumerative.py
2781
27.58 real 27.13 user 0.32 sys
I can't argue with that timing data, but I will continue to quibble with your assertion that your solution is both correct and O(n^3) because you need to nigh-on-fully-populate a 4-D array inside of grid_sum.
If it doesn't do so, then there exist cases where your program and mine will return different values, and I'm pretty sure mine will be right and yours will be wrong.
I can't tell how much you're actually looking at my code, and how much you're applying Occam's Razor to your CS knowledge vs. mine. :)
My approach (which consists of taking someone else's algorithm for a linear-time 1D search and extending it to a 2D problem) doesn't fully populate the search space. It considers every vertical slice of the input array (roughly n^2/2 slices), and for each one, performs a linear search on it to find its maximal subrect.
The simplest way for you to convince me that I'm wrong here is to find a flaw in or counterexample for the linear 1D algorithm:
def max_sequence(array):
    # Linear-time maximal-sum contiguous subsequence (Kadane's algorithm).
    max_sum = -float('infinity')
    current_sum = 0
    start = 0  # where the current candidate run begins (not used in the return)
    for end in range(len(array)):
        current_sum += array[end]
        if current_sum > max_sum:
            max_sum = current_sum
        if current_sum < 0:
            # A negative running sum can never help a later sum; drop it
            # and restart the candidate run just past this point.
            current_sum = 0
            start = end + 1
    return max_sum
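Spelled out, the 2-D extension I described above looks roughly like this -- my own names, reusing max_sequence, and a sketch rather than my actual source:

def max_rect_sum(grid):
    # For each pair of column bounds x1 <= x2, collapse that vertical
    # slice into per-row sums, then run the linear 1-D search on them:
    # roughly n^2/2 slices, O(n) work per slice step => O(n^3) overall.
    n_rows, n_cols = len(grid), len(grid[0])
    best = -float('infinity')
    for x1 in range(n_cols):
        row_sums = [0] * n_rows
        for x2 in range(x1, n_cols):
            for y in range(n_rows):
                row_sums[y] += grid[y][x2]  # widen the slice by column x2
            best = max(best, max_sequence(row_sums))
    return best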
This is weird. Your algorithm runs in time O(n^3.5) (measured), and I haven't been able to find a single case where yours and mine disagree (and I wrote a shell script and tried hundreds).
Tuning it as suggested by Dan at http://pmb.livejournal.com/79825.html?thread=757201#t757201 and retesting right now...
Weird. It now runs in time O(n^3.3) (measured). Oh, those wacky hash collisions.
After finally understanding your 1-D algorithm, I now buy your 2-D algorithm. It really is pretty clever! Certainly cleverer than mine. Because the answer to my unbelieving query is: NO. No, you do NOT have to fully populate a 4-D array, and you can determine the answer in the 1-D case in O(n) time, not O(n^2) time.
Dammit. Now I feel like a doofus. You were right all along! Your approach is faster and better and remains correct.
My suspicion is that the extra fractional exponent can be eliminated by tuning the cache generation -- it should be strictly linear, since you can collapse each 2-wide column in linear time from the 1-wides, and each 3-wide in linear time from a 2-wide and a 1-wide, etc. Can you give me the script you're using for gnuplot? I don't want to tackle it without some bootstrapping.
I'm also still curious about your assertion that the outer memoization is useful, given my results with a naive outer loop. And, by the way, I appreciate you taking the time and engaging with me on this; I'm learning a lot here.
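For what it's worth, the collapse I'm picturing above is something like this -- hypothetical names and a dict cache keyed by column bounds, not your actual data structure:

def build_column_sums(grid):
    # cache[(x1, x2)] holds the per-row sums of columns x1..x2 inclusive.
    # Each wider entry is one O(n) vector add of two narrower entries,
    # so cache generation is strictly linear per entry -- no fractional
    # exponent should come from this step.
    n_rows, n_cols = len(grid), len(grid[0])
    cache = {}
    for x in range(n_cols):  # 1-wide columns, read straight off the grid
        cache[(x, x)] = [grid[y][x] for y in range(n_rows)]
    for w in range(1, n_cols):  # each width built from width w-1 plus a 1-wide
        for x1 in range(n_cols - w):
            narrower = cache[(x1, x1 + w - 1)]
            strip = cache[(x1 + w, x1 + w)]
            cache[(x1, x1 + w)] = [a + b for a, b in zip(narrower, strip)]
    return cache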
By the way, after I found that 1-D algorithm, I spent most of the day trying to wrap my head around whether there was a 2-D extension (presumably in O(n^2)). Just thinking about it made my head hurt, but I still wonder if it's out there somewhere.
Fine with me if you get one - take it up with admissions, the registrar, and the money folk.
I wasn't using a script at all. I would have a command line like:
for i in `seq 10 5 200`; do python gen_problem.py | time python smus.py 2>&1 | grep Real | sed -e 's/Real//' | xargs echo $i; done >> data
The third number in the seq command (the upper limit) was 50 for my program and 200 for your (faster) one.
Then in gnuplot I would type:
set logscale  # This step is optional
f(x) = a*x**b
fit f(x) 'data' via a,b
plot 'data', f(x)
More options to the plot command will make it prettier, but the "fit" line is kind of all you REALLY need. But plotting it is comforting somehow, both in logscale and in normal scale.
For reference, the loop change was from:
for x1 in range(len(grid[0])):
    for x2 in range(x1, len(grid[0])):
        local_o_sum, local_o_bounds = max_rect_column_sum(grid, x1, x2)
to:
for w in range(len(grid[0])):
    for x in range(len(grid[0]) - w):  # so x + w can reach the last column
        local_o_sum, local_o_bounds = max_rect_column_sum(grid, x, x + w)
Ah, makes sense. I meant "vs a naive approach". No question that your exact code benefits from it.