Bufferbloat

Jan 07, 2011 13:40

I am not sure how I found it, but I ran across a very interesting series of articles about the problem of "bufferbloat". Here's one of the key posts: http://gettys.wordpress.com/2010/12/06/whose-house-is-of-glasse-must-not-throw-stones-at-another/

This makes a lot of sense to me. I have on multiple occasions had to explain to people why too much buffering on network ports is a problem - the buffers get so full that latency rises very high, and that causes TCP's congestion control algorithms to fall apart. TCP watches for packet loss, and when it sees a loss it knows it has reached the limit of what it can send on a link and adjusts appropriately.
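To make that concrete, here's a rough sketch of the loss-driven behavior I'm describing (additive increase, multiplicative decrease). The class and constants are just mine for illustration - real TCP implementations are considerably more involved:

```python
class AimdSender:
    """Toy loss-driven congestion control: grow slowly, halve on loss."""

    def __init__(self):
        self.cwnd = 1.0  # congestion window, in segments

    def on_ack(self):
        # No loss seen: additive increase, roughly one segment per round trip.
        self.cwnd += 1.0 / self.cwnd

    def on_loss(self):
        # Loss seen: assume the link is saturated and back off multiplicatively.
        self.cwnd = max(1.0, self.cwnd / 2.0)
```

The point is that loss is the only signal here - if the buffers in the path are huge, that signal arrives long after the queues (and the latency) have already blown up.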

Without seeing packet loss for a long time, it cannot adjust reliably and ends up with huge oscillations. You can visualize it like this: Picture driving a car and trying to keep it at a constant speed. Normally you get instantaneous feedback, so when the car starts to slow down because the slope of the road changes, you pick it up pretty quickly and press on the gas pedal a little more. Now picture trying to drive a car remotely across a link that has a several-second delay (ignore steering for the moment). As you are going along, you see the speed start to drop, so you press the pedal a little more. The speed is still dropping, so you press more, and more, until after a few seconds you see the speed shoot up well past your desired speed and you let off the gas entirely (maybe even hit the brakes). But it's still going up... Wait, now the car slows down very quickly, much quicker than you expect, so you press the gas pedal to the floor. But it's still slowing down... Oh damn, now the car's speeding up way too fast! Etc...

In practice, what TCP ends up doing is slamming on the brakes, then after some time ramping back up, getting a little data out, realizing it's going too fast, and slamming on the brakes again - spending most of its time going much slower than it could and, in the end, getting very little real data through.

The problem is amplified by feeding back on itself (sending more data increases the amount of buffered data, which increases latency further), and by potential synchronization of multiple TCP streams (multiple sessions all hitting the gas or the brake at the same time), which makes it even worse.

The usual fix proposed is to update network equipment so that it drops traffic before this situation occurs: a combination of setting buffer sizes appropriately and using a queuing algorithm that keeps the queues from getting too big (such as RED or other active queue management). However, thinking about the overall problem, another solution occurred to me.
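For reference, the core of the RED idea is to drop (or mark) packets with a probability that grows as the average queue length rises, so senders see loss well before the buffer is completely full. Here's a very rough sketch - the thresholds and weight are made-up illustrative values, not tuned ones, and real RED has more machinery than this:

```python
import random

class RedQueue:
    """Toy Random Early Detection queue: drop early and randomly as the
    average queue length climbs, instead of only when the buffer is full."""

    def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002):
        self.min_th, self.max_th, self.max_p = min_th, max_th, max_p
        self.weight = weight  # EWMA weight for the average queue length
        self.avg = 0.0
        self.queue = []

    def enqueue(self, packet):
        # Track a smoothed (average) queue length rather than the instantaneous one.
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        if self.avg >= self.max_th:
            return False  # queue persistently too long: drop
        if self.avg >= self.min_th:
            # Drop probability ramps up linearly between the two thresholds.
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            if random.random() < p:
                return False  # early random drop, signaling senders to back off
        self.queue.append(packet)
        return True
```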

In the early days of IP and TCP, memory was expensive and buffers were small, so congestion showed itself primarily as dropped packets - too many packets would reach a slow link, the queue would overflow, and packets would be dropped. Now, however, memory is cheap and congestion shows itself first as an increase in latency. Why not enhance TCP to trigger its congestion control mechanisms on latency increases (in addition to packet loss)? That way every TCP connection across a link would work together to keep latency low for everyone on that link. When queues build up at a congested link, every connection on that link would start slowing down. It does not require changes to intermediate network devices, which can be difficult (if even possible) to upgrade.

Such a scheme would have to watch changes in latency carefully and adjust appropriately to genuine step changes - for instance, a route change that causes packets to flow over a different path. TCP already measures the latency of a connection in terms of round-trip time (RTT). I don't know whether those numbers are consistent enough to give a good indication of congestion, but I think it's an idea worth pursuing.
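Roughly, what I'm imagining looks something like this (and, as the note below points out, it's more or less what TCP Vegas does): track the lowest RTT seen as an estimate of the uncongested path, and back off when the current RTT shows queues building up. The class, parameter names, and thresholds are just mine for illustration:

```python
class DelayBasedSender:
    """Toy delay-based congestion control: use rising RTT, not loss, as the signal."""

    def __init__(self, alpha=2.0, beta=4.0):
        self.cwnd = 2.0       # congestion window, in segments
        self.base_rtt = None  # lowest RTT observed ~ propagation delay with empty queues
        self.alpha, self.beta = alpha, beta

    def on_rtt_sample(self, rtt):
        if self.base_rtt is None or rtt < self.base_rtt:
            self.base_rtt = rtt
        expected = self.cwnd / self.base_rtt  # rate we'd get with empty queues
        actual = self.cwnd / rtt              # rate we're actually getting
        queued = (expected - actual) * self.base_rtt  # rough estimate of our segments sitting in queues
        if queued < self.alpha:
            self.cwnd += 1.0  # queues look empty: probe for more bandwidth
        elif queued > self.beta:
            self.cwnd = max(2.0, self.cwnd - 1.0)  # queues building: ease off before latency climbs
```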

Edited to add:

OK, so my idea's only about 15 years late: TCP Vegas already does this. Reading research papers now to try to understand why it doesn't just fix the Internet :)