PumaCreed

Dec 30, 2007 03:33


So my little project while I’m bored at home is PumaCreed - a poor man’s Google MapReduce. (Astute hunter-types will notice the name is an anagram.)

I’ve been toying with the idea for the past few days since I got inspired by the MapReduce chapter in Beautiful Code. Mostly it’s a cute little project that’s trendy and scratches an itch of mine - namely, how do I utilize the full power of my little cluster of machines?

My end goal is to bootstrap this on top of the raytracer Lin and I wrote in CS184 as a proof of concept, and run speed tests. Other simple ideas include MP3 encoding (split the WAV file, give each machine a piece, and then mp3wrap the pieces together at the end) or, if I get really daring, video re-encoding (which would be damn cool for saving off my HD streams).
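For the MP3 idea, the "split the WAV file" step could look roughly like this. It's only a sketch using Python's standard `wave` module, assuming an uncompressed PCM WAV; `split_wav` and the chunk naming scheme are made up for illustration, and the per-machine encoding plus the final mp3wrap run aren't shown.

```python
import wave


def split_wav(path, pieces, prefix):
    """Split the WAV at `path` into at most `pieces` roughly equal chunks.

    Hypothetical helper: returns the chunk filenames in order, so each
    one could be handed to a different machine for encoding.
    """
    with wave.open(path, "rb") as src:
        params = src.getparams()
        total = src.getnframes()
        per_piece = -(-total // pieces)  # ceiling division so no frames are dropped
        names = []
        for i in range(pieces):
            frames = src.readframes(per_piece)
            if not frames:  # fewer frames than pieces: stop early
                break
            name = f"{prefix}{i:03d}.wav"
            with wave.open(name, "wb") as dst:
                dst.setparams(params)  # frame count in the header is fixed up on close
                dst.writeframes(frames)
            names.append(name)
    return names
```

Splitting on frame boundaries keeps every chunk a valid WAV with the same sample rate and width as the original, and the chunks concatenate back to the exact original audio data.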

The idea behind all of this: you split up a problem, do something to each piece (Map), optionally emit intermediate values (to be further sorted or combined in Reduce), and then combine the results back into the shape of the original problem in some way (or not, depending on how you write Reduce). So long as there’s a common NAS (granted, it’s no GFS, but then, I’m not dealing with petabytes), the machines can share their file output. SSHFS counts too - it’s just slower.
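Here's the split/Map/Reduce flow boiled down to a single process, as a minimal sketch. `run_mapreduce`, `word_map`, and `word_reduce` are illustrative names, not PumaCreed code; on the cluster, each `map_fn` call would run on some machine and the grouped values would travel over the network or the NAS instead of living in one dict.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy, single-process MapReduce over `inputs`.

    map_fn(item) yields (key, value) pairs; reduce_fn(key, values)
    returns one result per key.
    """
    # Map phase: every piece emits intermediate (key, value) pairs.
    groups = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)
    # Sort the keys, then Reduce: combine each key's values into one result.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


def word_map(line):
    for word in line.split():
        yield word.lower(), 1


def word_reduce(word, counts):
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick brown fox", "The lazy dog", "the end"]
    print(run_mapreduce(lines, word_map, word_reduce))
    # → {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Word count is the classic example because the intermediate values are tiny and Reduce is just a sum, so the whole pipeline fits in a few lines.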

It’s also cute in that it can spread other time-intensive tasks across machines. It doesn’t even have to transfer files if the work and its result can be represented compactly enough to send over the network. Since I’m writing it all in Python (or C# where necessary), a good example is spreading minimax subtrees across machines to make a faster, smarter CS188 Pacman (each machine would merely send back the value of its root-node move over TCP).
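A rough sketch of the minimax split, assuming a toy game tree (nested lists with numeric leaves) in place of real Pacman states. `best_root_move` is a hypothetical name, and the `ThreadPoolExecutor` just stands in for remote machines: in CPython threads won't actually speed up CPU-bound search, but the decomposition is the same, with one subtree in and one number out per worker.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy game tree: a leaf is a score for the maximizing player;
# an internal node is a list of child subtrees.


def minimax(node, maximizing):
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_root_move(root, max_workers=4):
    """Evaluate each root move's subtree in its own worker.

    Each worker needs only its subtree and returns a single number,
    which is all that would cross the wire in a distributed version.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The root is a max node, so each child subtree starts at a min node.
        values = list(pool.map(lambda child: minimax(child, False), root))
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]


if __name__ == "__main__":
    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(best_root_move(tree))  # → (0, 3)
```

Splitting only at the root keeps the protocol trivial, at the cost of losing alpha-beta pruning across subtrees (each worker can still prune within its own).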

The backend interface looks something like this right now:

    [02:37] michener@enjolras:~$ telnet 192.168.0.16 6278
    Trying 192.168.0.16...
    Connected to 192.168.0.16.
    Escape character is '^]'.
    Welcome to the PumaCreed Server on fantine. Type 'help' for details.
    > help
    help jobs ls newjob stat quit shutdown
    > stat
    Computer Name  System  Ranking  Threads  Description
    fantine        Linux   9000     1        2.6.21-2-686 #1 SMP Wed Jul 11 03:53:02
    enjolras       Darwin  9000     1        9.1.0 Darwin Kernel Version 9.1.0: Wed O
    > shutdown
    Connection closed by foreign host.
    [02:38] michener@enjolras:~$
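One way to keep a text interface like this testable is to separate command dispatch from the networking. The class below is a hypothetical sketch, not the actual server: it implements only `help`, `stat`, and `quit`, and `handle_line` returns the reply lines plus whether to keep the connection open. In Twisted, a `LineReceiver` subclass's `lineReceived` could call it and `sendLine` each reply, but that wiring isn't shown.

```python
class CommandShell:
    """Transport-independent dispatcher for a telnet-style command server."""

    def __init__(self, hostname, nodes):
        self.hostname = hostname
        # Each node is a (name, system, ranking, threads, description) tuple.
        self.nodes = nodes
        self.commands = {
            "help": self.do_help,
            "stat": self.do_stat,
            "quit": self.do_quit,
        }

    def greeting(self):
        return f"Welcome to the PumaCreed Server on {self.hostname}. Type 'help' for details."

    def handle_line(self, line):
        """Return (reply_lines, keep_open) for one line of client input."""
        parts = line.split()
        if not parts:
            return [], True
        handler = self.commands.get(parts[0])
        if handler is None:
            return [f"Unknown command: {parts[0]}"], True
        return handler(parts[1:])

    def do_help(self, args):
        return [" ".join(sorted(self.commands))], True

    def do_stat(self, args):
        lines = ["Computer Name\tSystem\tRanking\tThreads\tDescription"]
        for name, system, ranking, threads, desc in self.nodes:
            lines.append(f"{name}\t{system}\t{ranking}\t{threads}\t{desc}")
        return lines, True

    def do_quit(self, args):
        return ["Bye."], False
```

Because nothing here touches a socket, the whole command set can be exercised in plain unit tests, and the same shell works behind Twisted, a raw socket, or stdin.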

Not much, but it’s a start. The machines know about each other, and there’s networking, threading, and config-file handling going on (praise be unto Twisted) - not to mention the start of a MapReduceProgram class that all code run on the cluster should inherit from (or at least implement the interface of).
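Something like the following is what I have in mind for MapReduceProgram, though only the class name is real so far: the `split`/`map`/`reduce` method names, the `run_locally` driver, and the `RenderRows` example are all illustrative guesses. `RenderRows` is a placeholder for the raytracer that splits an image into bands of rows and "shades" each pixel as `x + y`.

```python
from abc import ABC, abstractmethod


class MapReduceProgram(ABC):
    """Hypothetical interface for programs run on the cluster."""

    @abstractmethod
    def split(self, job, n):
        """Break the job into at most n independent pieces."""

    @abstractmethod
    def map(self, piece):
        """Process one piece (on some worker) and return a partial result."""

    @abstractmethod
    def reduce(self, partials):
        """Combine the partial results, in split order, into the final answer."""

    def run_locally(self, job, n=4):
        # Single-machine driver; a real scheduler would ship pieces to nodes.
        return self.reduce([self.map(piece) for piece in self.split(job, n)])


class RenderRows(MapReduceProgram):
    """Toy stand-in for the raytracer: each piece is a band of image rows."""

    def split(self, job, n):
        width, height = job
        band = max(1, -(-height // n))  # ceiling division, never a zero step
        return [(width, start, min(start + band, height))
                for start in range(0, height, band)]

    def map(self, piece):
        width, start, stop = piece
        # Placeholder "shading": each pixel's value is just x + y.
        return [[x + y for x in range(width)] for y in range(start, stop)]

    def reduce(self, partials):
        image = []
        for rows in partials:
            image.extend(rows)
        return image
```

Row bands are a natural fit for a raytracer because every pixel is computed independently, so Reduce only has to stitch the bands back together in order.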

We’ll see how it goes.
Crossposted from photonzero

hack, programming, facebook, nerdery, python, livejournal
