Asynchronous programming with Python 2.5 II

Dec 04, 2006 22:04

Okay, so to recap, this is our dilemma:
  1. We have to do all kinds of slow I/O-y things.
  2. We want to do them asynchronously to maximize throughput.
For the sake of example, let's say our program needs to mass-fetch some URLs and print them out in order. It's pretty lame, but all the pieces are part of the Python standard library, so we can get going without much cruft.

The single-threaded, single-URL version is simplicity itself:

import urllib
import sys

print urllib.urlopen(sys.argv[1]).read()

Real impressive, that. Now let's generalize for an arbitrary number of arguments:
import urllib
import sys

for url in sys.argv[1:]:
    print urllib.urlopen(url).read()

Cool. There's just one problem: between webgets, there's a bit of a delay. Let's pretend that we're Apple and care about the user experience with our tool: Delays suck. Now, all of that time we spend waiting is spent on a socket, not on CPU or memory, so we can make things faster by doing the webgets each in their own thread, all at the same time, and collecting the results as they come in.
import threading
import sys
import urllib

def getUrl(url):
    # Fetch the URL synchronously and return the body.
    return urllib.urlopen(url).read()

def getUrlAsync(url, onComplete):
    # Fetch the URL on a worker thread and hand either the result or the
    # exception to onComplete when it's done.
    def threadProc():
        try:
            result = getUrl(url)
            onComplete(result, None)
        except Exception, e:
            onComplete(None, e)

    thread = threading.Thread(target=threadProc)
    thread.start()

def printUrl(result, error):
    if error:
        raise error
    else:
        print result

for url in sys.argv[1:]:
    getUrlAsync(url, onComplete=printUrl)

Things are getting complicated now. First, you'll notice that we have to manually pass exceptions to the onComplete handlers. If we don't, the exception will run up the thread's stack and halt the thread, preventing our onComplete handler from ever being called.
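For contrast, here's a minimal sketch of what that looks like with the try/except stripped out (getUrlAsyncNaive is a made-up name for illustration, not part of the real tool): any exception from urlopen unwinds threadProc, the threading module dumps a traceback to stderr, the worker thread dies, and the handler is never called for that URL.

import threading
import urllib

def getUrlAsyncNaive(url, onComplete):
    def threadProc():
        # No try/except here: if urlopen raises, the exception runs up
        # this thread's stack, the thread halts, and onComplete never
        # hears anything about this URL.
        result = urllib.urlopen(url).read()
        onComplete(result, None)

    thread = threading.Thread(target=threadProc)
    thread.start()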

The problem, of course, is that we're re-raising the exception, and thus ruining the nice stack trace we would have otherwise gotten. There are ways around that, but they're contortive and could probably occupy a writeup of their own.
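Just to give a flavor of what I mean, here's a sketch of one such contortion (an assumption on my part, not necessarily what I'd ship): have the worker thread hand the handler the whole sys.exc_info() triple instead of a bare exception object, then re-raise it with the three-argument form of raise so the original traceback survives the hop between threads.

import sys
import threading
import urllib

def getUrl(url):
    return urllib.urlopen(url).read()

def getUrlAsync(url, onComplete):
    def threadProc():
        try:
            result = getUrl(url)
            onComplete(result, None)
        except Exception:
            # Capture (type, value, traceback) while we're still inside
            # the except block, so the traceback isn't lost.
            onComplete(None, sys.exc_info())

    thread = threading.Thread(target=threadProc)
    thread.start()

def printUrl(result, excInfo):
    if excInfo:
        # Three-argument raise: rethrow the original exception with its
        # original traceback attached.
        raise excInfo[0], excInfo[1], excInfo[2]
    else:
        print result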

Anyway, this is neat because now we're grabbing all of these URLs right at the start, and processing them as soon as they're ready.
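(If you want to try it, a run looks something like "python geturls.py http://www.google.com/ http://www.python.org/", where the script name and URLs are just placeholders; each page prints as soon as its fetch finishes.)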

Tune in next time when I get around to creating a task pool maybe!