Yeah, Google gets closer and closer to DWIM every day.
In a world with magitechnology, DWIM would be abused just as hard by spammers as Google is today. On the other hand, in a world with magitechnology, every time that spam arrived in a mage's mailbox, someone's server would catch on fire.
That led to some fascinating Google searches. I hope that we can find out more about the matter. What else have you checked to see how Google's behavior has changed? Have they made any announcements that seem relevant?
Favorite techniques after searching: changing the code direction and the more effort-intensive but accessible mod_rewrite and PHP/JS trickery recommended by A List Apart. If I were a better programmer, I'd try to implement the latter in Python/Pylons, since I have a vague idea of how it could be done.
I haven't seen anything from Google, and I'm not certain how to contact them about this sort of issue (it doesn't seem like the sort of thing to report to their security@ bounce), but now that the story has hit /., hopefully some answers will be forthcoming.
Incidentally, as a counterpoint to the links you provided above, I followed a comment in one of those pages to http://jasonpriem.com/2009/05/stop-obfuscating-email/ . I disagree with its main argument, but its point about the poor security of email obfuscation is well taken.
It's in Google's interest to be able to index simple document.write-generated HTML, since it's so common. They're definitely not doing full-blown JavaScript execution. I munge email addresses on my own website by swapping every pair of letters, and the Google snippet doesn't show the decoded result. This also suggests that it's not simply a time-bounded execution, since my decoder takes almost no time to run.
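For anyone who hasn't seen the pair-swapping trick, here is a minimal sketch of the transformation in Python. The actual decoder on the site would be JavaScript feeding document.write; the function name `swap_pairs` and the placeholder address are assumptions made up for illustration, not the commenter's real code.

```python
def swap_pairs(text: str) -> str:
    """Swap each adjacent pair of characters; a trailing odd character stays put."""
    chars = list(text)
    for i in range(0, len(chars) - 1, 2):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


# Placeholder address, not a real mailbox.
address = "user@example.com"

# The munged form is what would sit in the page source.
munged = swap_pairs(address)

# Swapping pairs a second time undoes the first pass, so the same
# routine serves as both encoder and decoder.
restored = swap_pairs(munged)
```

Because the swap is its own inverse, the page only needs one tiny routine, which is consistent with the decoder taking almost no time to run.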
Project Honey Pot's suggestion that a harvester would need a full-blown JavaScript engine seems a little ridiculous. I bet you can get a lot of the way with a bounded-time, non-Turing-complete subset of JavaScript.
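As a rough illustration of how far a harvester could get without a real engine, here is a toy sketch in Python that only understands document.write calls whose argument is a chain of string literals joined with +. It is an assumption about what a "subset" might look like, not a description of what Google or any actual harvester does, and the name `fold_document_writes` is made up for this example.

```python
import re

# A document.write(...) call whose argument contains no nested parentheses.
WRITE_CALL = re.compile(r"document\.write\(([^()]*)\)")
# A single- or double-quoted string literal (escape sequences are not handled).
STRING_LITERAL = re.compile(r"""(["'])(.*?)\1""")


def fold_document_writes(script: str) -> str:
    """Concatenate the output of literal-only document.write calls."""
    output = []
    for call in WRITE_CALL.finditer(script):
        argument = call.group(1)
        literals = [m.group(2) for m in STRING_LITERAL.finditer(argument)]
        # Whatever is left after removing the literals must be only '+' and
        # whitespace; anything else (variables, calls) is beyond this subset.
        remainder = STRING_LITERAL.sub("", argument)
        if remainder.replace("+", "").strip() == "":
            output.append("".join(literals))
    return "".join(output)


simple = 'document.write("user" + "@" + "example.com");'
with_call = 'document.write(decode("suree@axpmelc.mo"));'
with_variable = 'var n = "user"; document.write(n + "@example.com");'
```

The simple concatenation case folds to a plain address with no loops and no execution at all, while anything that calls a function or reads a variable falls outside the subset and is skipped.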
Huh, that's actually even stranger: there's no substantial difference between our two routines except that yours calls a subfunction. I mean, ultimately they both boil down to taking a hardcoded string as input, tweaking the input in various ways, and document.write'ing it. And yet, you're right, on your site Google hasn't seemed to pick it up. (Though your address is compromised in a hundred other ways ...)
I suppose this could also be taken as evidence that Google doesn't (currently) interpret pages based on items included by reference, since you call a separate .js file for your functions. That theory would benefit from a more rigorous test.
I help maintain a page that also references an external .js file to do the munging, and it also appears immune to Google's address harvesting.
Comments 32
This reminds me of that meta-Google search engine that you wrote about in the T-lands universe, though. ;)
And also reminds me of Chibi Jesus. XD Good times!