Finally, the word is officially out! Yahoo is publicly announcing its wide interest in and involvement with Hadoop, an open source project started by Doug Cutting (the Lucene guy!) under the Apache Software Foundation. Hadoop aims to offer a platform for parallelizing user applications on commodity hardware; large-scale distributed computing is the name of the game. It is an implementation of the Map/Reduce programming framework and also provides a distributed file system (HDFS) to go with it.
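If you haven't met the Map/Reduce model before, here is a tiny sketch of the idea in plain Python. To be clear, this is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in one process in memory, whereas Hadoop spreads the same phases across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of strings (hypothetical driver)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop runs on commodity hardware", "Hadoop implements Map Reduce"]
    print(word_count(docs))
```

The appeal of the model is that each map call and each reduce call is independent of the others, which is what lets a framework farm them out to lots of commodity boxes.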
Yahoo is going full throttle on the PR with its Grid initiative:
- It appeared yesterday on Jeremy Zawodny's blog, which refers to the Yahoo YDN tech blog entry announcing the same. That entry also lists the milestones we've covered on the project.
- Doug and Eric (Yahoo's Director of Grid Computing) are presenting a talk on Hadoop right now at OSCON, Portland.
- David Filo (Chief Yahoo and Co-Founder) and Doug spoke with Tim O'Reilly after his panel appearance at OSCON, which might lead to him talking about Yahoo and Hadoop on his blog. Talks are underway about a book being published on Hadoop. Well, we already have our animal figured out for the Nutshell series ;-)
Hadoop appearances:
- Hadoop's biggest deployment anywhere is at Yahoo: a 2000-node cluster to power research on Ad Systems and Web Search.
- We're already seeing universities begin to teach Hadoop (University of Washington) and look at building their own clusters (Carnegie Mellon University).
- Pig, a Yahoo project that's being made available under a BSD-style open source license, provides an SQL-like interface for querying extremely large data sets and uses Hadoop to parallelize its computations.
- HBase, a Bigtable-like storage mechanism built over Hadoop DFS. It provides a means for organizing and efficiently accessing the large data sets stored in HDFS.
Map-Reduce appearances:
- Mars is a Map-Reduce framework on GPUs. It uses the NVIDIA CUDA architecture for GPGPU programming.
- Here is a Map-Reduce implementation on the Cell Broadband Engine.
An official list of projects that are powered by Hadoop can be found
here. So go ahead, play around with it and get involved! Everyone's gonna, eventually!! :-D
All Hadoop updates are put up here
http://blog.hadoop.com/