Computer thought for the day

May 08, 2006 17:57

How hard would it be to create basic language-recognition software? And then tie it to spam-filtering software? For example, I could tell it I speak English, Spanish, and Catalan. Then anything that it recognized as being a different language, it could send straight to my spam folder. I mean, anything that's in a non-Roman alphabet could go ...
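The non-Roman-alphabet part of this idea can be sketched in a few lines. This is only a rough illustration, not how any particular mail filter works: the function names `latin_fraction` and `mostly_non_latin` and the 0.5 threshold are made up for the example, and the check relies on Python's standard `unicodedata` module to read each letter's Unicode name.

```python
import unicodedata

def latin_fraction(text):
    """Return the share of alphabetic characters that belong to the Latin script.

    Accented letters such as 'ã' still count as Latin, because their
    Unicode names begin with "LATIN".
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 1.0  # assumption: text with no letters is treated as harmless
    latin = sum(1 for ch in letters
                if unicodedata.name(ch, "").startswith("LATIN"))
    return latin / len(letters)

def mostly_non_latin(text, threshold=0.5):
    """Flag text where fewer than `threshold` of the letters are Latin (hypothetical cutoff)."""
    return latin_fraction(text) < threshold
```

A message written in Cyrillic or Chinese would be flagged, while English, Spanish, and Catalan text, accents included, would pass. Telling Roman-alphabet languages apart from each other needs a different approach.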

computer

Comments 8

skington May 9 2006, 00:22:38 UTC
It's been done.

What you do is you apply statistical analysis to (typically) the words you encounter in spam, and the words you encounter in ham (i.e. mail that isn't spam). When new mail comes in, your spam filter looks at the message contents, and decides whether it's spam or not spam depending on the most interesting words (interesting means definitely spam or definitely ham, depending on what you've received previously). If you're getting Russian spam and all the emails you've ever received in Russian (and that you've told your spam filter about) have been spam, the common Russian words for "the", "a", "be", "have" etc. will trigger the 100% spam alert, and the email will get canned.
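For the curious, here is a minimal sketch of that idea. It is a toy, not the code of SpamSieve or any other real filter: the class name `TinyBayesFilter`, the add-one smoothing, and the choice of the 15 most interesting words are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # \w is Unicode-aware in Python 3, so Russian words are tokenized too.
    return re.findall(r"\w+", text.lower())

class TinyBayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message that the user has marked as 'spam' or 'ham'."""
        self.messages[label] += 1
        # Count each distinct word once per message.
        self.counts[label].update(set(tokenize(text)))

    def word_spamminess(self, word):
        """Estimate how strongly a word points to spam (0 = ham, 1 = spam)."""
        # Add-one smoothing keeps the value strictly between 0 and 1.
        spam_rate = (self.counts["spam"][word] + 1) / (self.messages["spam"] + 2)
        ham_rate = (self.counts["ham"][word] + 1) / (self.messages["ham"] + 2)
        return spam_rate / (spam_rate + ham_rate)

    def spam_probability(self, text, top_n=15):
        """Combine the most 'interesting' words into one spam score."""
        probs = [self.word_spamminess(w) for w in set(tokenize(text))]
        # Interesting = furthest from the neutral value 0.5.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        log_odds = sum(math.log(p) - math.log(1 - p) for p in probs[:top_n])
        return 1 / (1 + math.exp(-log_odds))
```

If every Russian message used for training was marked as spam, common Russian words end up with a high spamminess value, and a new message made of them scores close to 1.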

I don't know what email client you're using, but search for "Bayes", "spam" and the name of your email client, and you should find something useful. On a Mac, I'd recommend SpamSieve.

sparkofcreation May 9 2006, 04:45:09 UTC
Well, I have Gmail, so it's rather academic.

Ironically, when I checked my email just now, I had two messages: your comment, and a Portuguese spam begging me to donate 13 real-cents (R$ 0.13) to help a Brazilian child born with elephantiasis. In fact it was the second identical one today.

I understand a fair bit of Portuguese, obviously; but I don't really speak it, so no one would ever send me a real email in it.

jonjon_nl May 9 2006, 06:52:46 UTC
But I could send an e-mail entirely written in English and sign it... :(
My name is... João.

sparkofcreation May 9 2006, 13:19:36 UTC
Well, if there was only one ã in the text, (I hope) the software would mark it as English, not Portuguese.
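A rough sketch of why a single ã should not tip the balance: if the software guesses the language by counting very common words rather than by looking at individual characters, a Portuguese name in the signature contributes nothing. The word lists and the function name `guess_language` below are illustrative assumptions, not taken from any real filter.

```python
import re

# Tiny, hand-picked lists of very common words (assumed for this example only).
STOPWORDS = {
    "english": {"the", "and", "is", "in", "to", "of", "it", "that", "you", "with"},
    "portuguese": {"o", "e", "de", "que", "não", "em", "um", "uma", "para", "com"},
}

def guess_language(text):
    """Return the language whose common words appear most often, or None if none match."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(1 for w in words if w in common)
              for lang, common in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With this approach, "I hope that you are well. My name is João." still comes out as English, because "João" matches no entry in either list.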


chiller2 May 25 2006, 02:17:31 UTC
This is, as you pointed out, academic since you use Gmail, but there are a number of spam filters based on an open-source program called SpamAssassin. One of the great things about it is that you can specify the languages and locales that you would like to accept mail from.

It's not 100% foolproof, as it relies partly on certain message headers that are 'supposed' to be in the e-mails, which is possibly why it hasn't yet been included in Gmail, but who knows? Google are always adding something to their arsenal :)


jkrissw May 28 2006, 04:27:28 UTC
Hi - I just saw your comment in linguaphiles and noticed you're in Albuquerque. What part? I'm moving to Vail St (near Carlisle) in a week.

sparkofcreation May 28 2006, 05:33:04 UTC
Northeast Heights, near the intersection of Juan Tabo and Eubank. :-)

jkrissw May 28 2006, 12:27:00 UTC
Mind if I "friend" you?