Google Think

Mar 15, 2007 07:09

So I've been thinking about Google: what it does, how it does it, and the limitations of its approach. One scenario I find particularly intriguing is this:

Suppose a user gives Google some information (e.g. search terms, a file). If Google were allowed to take some time (a minute, an hour, even a day), what sort of services could it offer that it can't offer now?

This "long-time" condition is a constraint on our thinking. Google could just do a really slow and stupid search, but that doesn't count: you have to come up with an idea that genuinely takes time.

Thinking about how they currently do things gives some indication of how tricky this is. One of the ways they make things lightning-fast is precalculation. After the Googlebot has trawled the web, sucking up documents and following links, an inverted index is built, recording which documents contain which interesting words. It takes about two months from go to whoa for Google to index the web from scratch. It then does precalculations to rank documents by how useful they might be (this is done via PageRank). Of course it doesn't take two months to search the web for you: all the heavy calculation happens in the background, and what you're searching is the processed data. Answering a query is still a monumental task, but they get speed through distributed computing and by caching certain queries. And the data they hold isn't necessarily two months old; they check frequently updated sites regularly and update their data incrementally.
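A toy version of that precompute-then-look-up split might look like the Python below. It's my own sketch, not Google's code: the three-page corpus `DOCS`, the link graph `LINKS` and the commonly quoted 0.85 damping factor are assumptions, and the PageRank here is the plain power-iteration form, which assumes every page has at least one outgoing link. The slow part (building the index and the ranks) runs once; a query only intersects word sets and sorts by stored scores.

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for crawled pages.
DOCS = {
    "a": "google indexes the web",
    "b": "the web is big",
    "c": "pagerank ranks pages on the web",
}
# Hypothetical link graph: page -> pages it links to.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes every page has at least one outlink."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank


# Offline ("two months") step: done once, before any query arrives.
INDEX = build_inverted_index(DOCS)
RANK = pagerank(LINKS)


def search(*terms):
    """Online step: intersect posting sets, then sort hits by precomputed rank."""
    postings = [INDEX.get(t.lower(), set()) for t in terms]
    if not postings:
        return []
    hits = set.intersection(*postings)
    return sorted(hits, key=lambda d: RANK[d], reverse=True)


print(search("the", "web"))
```

Caching would sit on top of `search` (e.g. remembering answers to popular queries), but the point is the same: nothing expensive happens at query time.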

So to get a "long" task, we need something that can't be precomputed. I can't say, "Hey Google, I want to go on a holiday to the Bahamas, what's the best deal?" because they can do this processing ahead of time for every city in the world and keep a regularly updated "This is what Google recommends for the Bahamas" data set. Even if you throw in extra constraints, they just narrow down the search range and don't require any special processing.
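To show why constraints don't change the picture, here's a hypothetical sketch: `DEALS` stands in for a table Google could refresh in the background, and a query with extra constraints (a price cap, a minimum stay) is just a filter over rows that already exist. The data and the "cheapest per night" notion of "best" are made up for illustration.

```python
# Hypothetical precomputed table, refreshed in the background:
# destination -> list of (price_usd, nights, hotel) offers.
DEALS = {
    "Bahamas": [(1200, 7, "Beach Resort"), (800, 4, "Harbour Inn"), (650, 3, "Reef Lodge")],
    "Fiji": [(1500, 7, "Lagoon Villas")],
}


def best_deal(destination, max_price=None, min_nights=None):
    """Answer a query by filtering precomputed offers; no new crawling needed."""
    offers = DEALS.get(destination, [])
    if max_price is not None:
        offers = [o for o in offers if o[0] <= max_price]
    if min_nights is not None:
        offers = [o for o in offers if o[1] >= min_nights]
    # "Best" is simply the lowest price per night, an arbitrary choice for this sketch.
    return min(offers, key=lambda o: o[0] / o[1], default=None)


print(best_deal("Bahamas", max_price=900))
```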

I can't ask for trends, because Google can correlate those from its news and web trawls based on time. (And they already do, lightning-fast.)
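Trend-spotting is similar: once pages are timestamped when they're crawled, a trend is just a count of mentions per time bucket. A minimal sketch, assuming a hypothetical crawl log `CRAWL` of (date, text) pairs and monthly buckets:

```python
from collections import Counter
from datetime import date

# Hypothetical crawl log: (date the page was fetched, page text).
CRAWL = [
    (date(2007, 1, 9), "apple announces the iphone"),
    (date(2007, 1, 20), "iphone hands-on impressions"),
    (date(2007, 2, 14), "valentines day gift ideas"),
    (date(2007, 3, 2), "iphone release date rumours"),
]


def trend(term, crawl):
    """Count pages mentioning `term` per month, oldest first; empty months are omitted."""
    counts = Counter()
    for fetched, text in crawl:
        if term.lower() in text.lower().split():
            counts[fetched.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


print(trend("iphone", CRAWL))
```

Since these counts can be kept up to date as each page is crawled, a trend query is a lookup rather than a long computation.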

I can't give Google a document and say, "Hey Google, what documents are like this one?" because that just requires characterizing your file and then running a search for its key terms or something like that. For more subtle, accurate results, maybe this could take a while, but all the extra time would go into characterizing your file, because Google already knows what every other document in the world is like (aka web-trawling).
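Here's roughly what "characterizing your file" could mean, using TF-IDF vectors and cosine similarity. That's a standard textbook technique I've picked for illustration; I'm not claiming it's what Google does. `CORPUS` stands in for the already-trawled web, so its vectors are computed ahead of time and only the user's document is processed at query time. The `+ 1.0` in the IDF is a smoothing assumption so that words appearing in every document still get some weight.

```python
import math
from collections import Counter

# Hypothetical stand-in for documents Google has already trawled.
CORPUS = {
    "doc1": "compilers turn source code into machine code",
    "doc2": "render farms turn scenes into frames",
    "doc3": "holiday deals for the bahamas",
}


def vectorize(words, idf):
    """TF-IDF vector as a dict; words never seen in the corpus are ignored."""
    counts = Counter(w for w in words if w in idf)
    return {w: c * idf[w] for w, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Offline: document frequencies, IDF weights and one vector per known document.
WORDS = {d: text.lower().split() for d, text in CORPUS.items()}
DF = Counter(w for words in WORDS.values() for w in set(words))
IDF = {w: math.log(len(CORPUS) / df) + 1.0 for w, df in DF.items()}
VECTORS = {d: vectorize(words, IDF) for d, words in WORDS.items()}


def similar_documents(text):
    """Online: characterize the user's text, then rank stored documents by similarity."""
    query = vectorize(text.lower().split(), IDF)
    scores = {d: cosine(query, vec) for d, vec in VECTORS.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(similar_documents("my compiler turns source code into binaries"))
```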

Directions to places are already computed quickly and fairly accurately. Real-time services like Google Ride Finder would be useless if the answer only arrived after a minute's computation.
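Directions are quick because shortest-path search over a road graph is a well-understood computation. I don't know what Google Maps actually runs; the textbook approach is Dijkstra's algorithm, sketched here over a made-up road graph `ROADS` with travel times in minutes:

```python
import heapq

# Hypothetical road graph: node -> list of (neighbour, minutes).
ROADS = {
    "home": [("bridge", 5), ("highway", 2)],
    "highway": [("bridge", 1), ("airport", 9)],
    "bridge": [("airport", 4)],
    "airport": [],
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: returns (total_minutes, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, minutes in graph.get(node, []):
            new_cost = cost + minutes
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return float("inf"), []


print(shortest_route(ROADS, "home", "airport"))
```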

The only idea I've come up with so far is that you give Google a bunch of source code and it compiles it for you using parallel compilation. This is essentially a render-farm approach (equivalently, you could give it some 3D animation files and ask Google to render them for you).
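The render-farm idea boils down to splitting independent jobs across many workers and collecting the results, assuming the files (or frames) can be processed independently of each other. In this sketch `render_frame` is a hypothetical placeholder that does dummy arithmetic where a real job would invoke a compiler or renderer. I've used threads to keep it simple; because of Python's global interpreter lock, genuinely CPU-heavy work would want `ProcessPoolExecutor` or, in Google's case, separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame_number):
    """Hypothetical stand-in for compiling one file or rendering one frame."""
    # Dummy "work"; a real farm would call out to a compiler or renderer here.
    return frame_number, sum(i * frame_number for i in range(10_000)) % 997


def render_farm(frames, workers=8):
    """Fan independent jobs out to a pool of workers and gather results by frame."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(render_frame, frames))


print(render_farm(range(4)))
```

Real parallel compilation is messier, since some files depend on others being built first, so a scheduler would have to respect those dependencies rather than just mapping over a list.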

Any ideas?

discussion, miscellaneous