Dead Sea Googling

Jul 21, 2004 18:10

I was demonstrating Dead Sea Googling in comments on another blog recently, and realized that I haven't seen anybody else talk about it.



Google News indexes a wide variety of news sources. Some of these sources don't want to provide their articles for free. You find a story, you click a link on Google's hit page, you find yourself at the site of the East Overshoe Picayune-Intelligencer, and you're stopped cold.

The content you requested is reserved for eastovpicayintel.com members only.

Okay, so you have to register, or subscribe to the paper, or pay them for archive access, right?

But here's the thing. They were willing to share the text of their article with Google.

Keep asking Google different questions about the article, and its hit pages will quote different portions of the article's text.
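A toy model makes the mechanism concrete. The sketch below assumes, purely for illustration, that a hit page's snippet is a fixed window of words around the first match of the query. Google's real snippet selection is certainly more elaborate, and the function `snippet` and its window sizes are hypothetical.

```python
import string
from typing import Optional


def _norm(word: str) -> str:
    """Lowercase a word and strip surrounding punctuation for matching."""
    return word.strip(string.punctuation).lower()


def snippet(text: str, query: str, before: int = 10, after: int = 10) -> Optional[str]:
    """Toy snippet: the words around the first match of `query` in `text`.

    Hypothetical model only; real search engines choose snippets differently.
    """
    words = text.split()
    normed = [_norm(w) for w in words]
    target = [_norm(w) for w in query.split()]
    n = len(target)
    for i in range(len(words) - n + 1):
        if normed[i:i + n] == target:
            start = max(0, i - before)
            end = min(len(words), i + n + after)
            return "... " + " ".join(words[start:end]) + " ..."
    return None  # the query phrase does not occur in the text


hidden = ("A stubborn wildfire flared out of control near Santa Clarita on "
          "Monday afternoon, but officials said the flames were moving into "
          "remote brushland and no longer posed an immediate threat")
print(snippet(hidden, "monday afternoon", before=2, after=2))
# ... Clarita on Monday afternoon, but officials ...
print(snippet(hidden, "no longer", before=2, after=2))
# ... brushland and no longer posed an ...
```

Two different queries, two different windows onto the same hidden text: that is the whole trick.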

The Dead Sea Scrolls were a tremendous find. But for decades, they were in the custody of scholars who refused to publish their content. They kept promising to release the scrolls one day. They compiled a concordance to the scrolls.

Some other scholars got tired of this. They took the concordance, which was rather thorough, and used a computer to reconstruct the entire text of one of the documents, independently of the guys who had custody of the physical scrolls. Then they published it so everyone could have equal access to the text. This embarrassed the procrastinating custodians into allowing unrestricted publication.
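In its simplest form, rebuilding a text from a concordance is just inverting an index. The toy sketch below assumes each concordance entry records the exact word positions at which a word occurs. That is a simplifying assumption: a real concordance entry is typically a word plus a short excerpt of context, so the actual reconstruction took far more work. The function name is hypothetical.

```python
from typing import Dict, List


def rebuild_from_concordance(concordance: Dict[str, List[int]]) -> str:
    """Rebuild a text from a toy concordance mapping each word to its positions.

    Positions that no entry accounts for are shown as "[?]".
    """
    by_position: Dict[int, str] = {}
    for word, positions in concordance.items():
        for pos in positions:
            by_position[pos] = word
    if not by_position:
        return ""
    length = max(by_position) + 1
    return " ".join(by_position.get(i, "[?]") for i in range(length))


concordance = {"in": [0], "the": [1, 4], "beginning": [2], "of": [3], "scroll": [5]}
print(rebuild_from_concordance(concordance))  # in the beginning of the scroll
```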

You see the analogy.

Once you start poking at Google News to try revealing pieces of a hidden article, it quickly becomes a game. How much can you shake loose? Can you guess crib words or phrases that will unveil a few more words, or a previously unguessed sentence?

Fire Heading Away From Homes
Los Angeles Times (subscription), CA - Jul 20, 2004
By Zeke Minaya and Eric Malnic, Times Staff Writers. A stubborn wildfire flared out of control near Santa Clarita on Monday afternoon ...

Let's try "fire heading away from homes". That brings up the story in isolation. I also see that its link is colored purple, because I've been there before. But I don't get any new phrases.

We'll try looking beyond the end of the snippet, with "fire heading away from homes" and "monday afternoon":

A stubborn wildfire flared out of control near Santa Clarita on Monday afternoon, but officials said the flames were moving into remote brushland and no longer ...

The quoted portion now ends in "no longer." Try "fire heading away from homes" "no longer":

... wildfire flared out of control near Santa Clarita on Monday afternoon, but officials said the flames were moving into remote brushland and no longer posed an ...

I could try "longer posed an," but I'm going to guess that the next phrase is "immediate threat." Let's try "immediate" along with the title:

... of control near Santa Clarita on Monday afternoon, but officials said the flames were moving into remote brushland and no longer posed an immediate threat to ...

I was right, but my extensive knowledge of newspaper cliches didn't help me very much. I guess I'll keep extending until I run out of steam. Try the title and "immediate threat to":

... near Santa Clarita on Monday afternoon, but officials said the flames were moving into remote brushland and no longer posed an immediate threat to homes or ...

Now "fire heading away from homes" "threat to homes or":

... Clarita on Monday afternoon, but officials said the flames were moving into remote brushland and no longer posed an immediate threat to homes or businesses. ...

Now "fire heading away from homes" "homes or businesses":

... Clarita on Monday afternoon, but officials said the flames were moving into remote brushland and no longer posed an immediate threat to homes or businesses. ...

Uh-oh, I'm stuck at the end of a sentence. Can't get any further this way. Let's stop asking for the full title and search on "fire" "threat to homes or businesses".
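The tail-extension routine in the queries above is mechanical enough to sketch. Each new snippet overlaps the text already recovered, so it can be stitched on at the longest suffix/prefix overlap, and the last few known words become the next query. This is a minimal sketch under stated assumptions: each fragment is one contiguous run (no interior "..." gaps) that starts inside the known text, and the helper names and the three-word rule are hypothetical.

```python
from typing import List


def fragment_words(fragment: str) -> List[str]:
    """Split a snippet into words, dropping leading/trailing '...' markers."""
    words = fragment.split()
    while words and words[0] == "...":
        words.pop(0)
    while words and words[-1] == "...":
        words.pop()
    return words


def stitch(known: List[str], fragment: str) -> List[str]:
    """Extend `known` using the longest suffix of `known` that prefixes the fragment.

    Assumes the fragment is contiguous and starts inside `known`.
    If no overlap is found, `known` is returned unchanged.
    """
    new = fragment_words(fragment)
    for k in range(min(len(known), len(new)), 0, -1):
        if known[-k:] == new[:k]:
            return known + new[k:]
    return known


def next_query(known: List[str], n: int = 3) -> str:
    """Quote the last n known words to ask for whatever follows them."""
    return '"' + " ".join(known[-n:]) + '"'


s1 = ("A stubborn wildfire flared out of control near Santa Clarita on Monday "
      "afternoon, but officials said the flames were moving into remote "
      "brushland and no longer ...")
s2 = ("... wildfire flared out of control near Santa Clarita on Monday "
      "afternoon, but officials said the flames were moving into remote "
      "brushland and no longer posed an ...")
known = stitch(fragment_words(s1), s2)
print(" ".join(known[-4:]))  # no longer posed an
print(next_query(known))     # "longer posed an"
```

When a query returns nothing past the end of a sentence, as happened above, `stitch` simply leaves the known text unchanged, which is the signal to switch tactics.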

... were moving into remote brushland and no longer posed an immediate threat to homes or businesses. ... moving aggressively to get a line around this fire," she said ...

I have a new phrase. Try "fire" "moving aggressively". Now I get three hits, because not only are California firefighters moving aggressively, but Baghdad is a free-fire zone where Prime Minister Allawi is moving aggressively, and the Cassini spacecraft fired engines made by Aerojet, which is moving aggressively in the space business.

Actually, there's nothing better than having a good rocket engine around when you want to move aggressively.

It's easy to spot the link I want, though, because I've visited it before; it's purple and the others are blue.

... We're moving aggressively to get a line around this fire," she said. "It's hot and steep and brushy up there. Those crews are working hard.". ...

Okay, here's a lot more to chew on. But I think you've gotten the idea.

When you run out of trails to follow, take a guess. Seeing the above phrases, I would try "spokesman" (or spokeswoman or spokesperson), "department," "firefighter," "fire fighter," "firefighters," "brush," "hills," and maybe "helicopters." Names of nearby towns. "California."
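The guessing step is easy to mechanize. This hypothetical helper pairs an anchor term with each candidate word, in the same two-quoted-phrase form used in the walkthrough, and skips queries already tried or repeated. The guess list is taken from the paragraph above; nothing here talks to a real search engine.

```python
from typing import Iterable, List, Set


def guess_queries(anchor: str, guesses: Iterable[str], tried: Set[str]) -> List[str]:
    """Pair an anchor phrase with each guess, skipping tried or duplicate queries."""
    queries: List[str] = []
    for guess in guesses:
        q = f'"{anchor}" "{guess}"'
        if q not in tried and q not in queries:
            queries.append(q)
    return queries


guesses = ["spokeswoman", "firefighters", "fire fighter", "brush",
           "helicopters", "firefighters"]
tried = {'"fire" "brush"'}
for q in guess_queries("fire", guesses, tried):
    print(q)
```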

This will get you pretty far, especially if you have a stock of cliches handy. You will eventually hit limits, though.

The very definition of "news" is "information that something is different": the unpredictable. Claude Shannon's measure of information echoes the formula on Ludwig Boltzmann's tombstone, "S equals k log W": the information content of a message is higher if the news it contains is more improbable. Or, as Neil Rest likes to say, "Information is Surprise."
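Shannon's measure for a single event is its self-information, log2(1/p) bits for an event of probability p. The sketch below just evaluates that formula; the probabilities are arbitrary illustrations, not measurements of anyone's prose.

```python
import math


def surprise_bits(p: float) -> float:
    """Self-information of an event with probability p, in bits: log2(1/p)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / p)


print(surprise_bits(1.0))       # 0.0  (a certainty tells you nothing)
print(surprise_bits(0.5))       # 1.0
print(surprise_bits(1 / 1024))  # 10.0
```

A cliche you can predict with probability near 1 carries almost no information, which is exactly why guessing it works; the sentences your guesses can't reach are the surprising ones.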

So Googling by predicting stuff you expect will be in the story is not always going to reveal all the sentences therein. If your Dead Sea Google target takes a right-angle turn in mid-story, you may lose it. Wilder guesses might recover the trail.

I admit to feeling a little guilt at trying to evade the fee the news provider wants me to pay. But quickly I am too lost in the puzzle-solving to care.

An entertaining road-rally-type game might be played by competing to find the smallest set of Google News queries that reproduces the entire text of a hidden article in the fragments quoted among the hits. (This presumes that the plaintext is available to check against. Someone would have to break down and register.)
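Scoring such a game comes down to set cover: each query reveals some set of word positions in the plaintext, and the goal is the fewest queries whose sets together cover every position. Finding the true minimum is NP-hard in general, so this sketch uses the standard greedy heuristic, which is not guaranteed to find the smallest set. The query names and positions are made-up inputs; in a real game they would come from matching each hit's fragments against the registered copy of the article.

```python
from typing import Dict, List, Set


def greedy_cover(n_words: int, revealed: Dict[str, Set[int]]) -> List[str]:
    """Greedy set cover: pick queries until every word position is revealed.

    `revealed` maps each query to the 0-based word positions its hits quote.
    Raises ValueError if the queries together miss some position.
    """
    uncovered = set(range(n_words))
    chosen: List[str] = []
    while uncovered:
        # Choose the query that reveals the most still-hidden positions.
        best = max(revealed, key=lambda q: len(revealed[q] & uncovered), default=None)
        if best is None or not (revealed[best] & uncovered):
            raise ValueError("these queries cannot reveal the whole article")
        chosen.append(best)
        uncovered -= revealed[best]
    return chosen


# Made-up example: a 10-word article and three candidate queries.
revealed = {
    "q1": set(range(0, 6)),
    "q2": set(range(4, 10)),
    "q3": set(range(2, 8)),
}
print(greedy_cover(10, revealed))  # ['q1', 'q2']
```

Here greedy happens to find an optimal pair; on other inputs it can use more queries than necessary, which leaves room for cleverer players.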

Or you could simply go for speed. We could call this "Dead Sea Water-Skiing."

Publications will probably evolve defenses against a Dead Sea Googling attack. My brother's magazine disappeared from Google News a couple of weeks after the service began. Now I can only see what he has to say when he gets quoted in a Googlespheric publication, unless I shell out $189 a year.

dead sea googling, searching, google
