How hard would it be to create basic language-recognition software? And then tie it to spam-filtering software? For example, I could tell it I speak English, Spanish, and Catalan. Then anything that it recognized as being a different language, it could send straight to my spam folder. I mean, anything that's in a non-Roman alphabet could go
Comments 8
What you do is you apply statistical analysis to (typically) the words you encounter in spam, and the words you encounter in ham (i.e. mail that isn't spam). When new mail comes in, your spam filter looks at the message contents, and decides whether it's spam or not spam depending on the most interesting words (interesting means definitely spam or definitely ham, depending on what you've received previously). If you're getting Russian spam and all the emails you've ever received in Russian (and that you've told your spam filter about) have been spam, the common Russian words for "the", "a", "be", "have" etc. will trigger the 100% spam alert, and the email will get canned.
I don't know what email client you're using, but search for "Bayes", "spam" and the name of your email client, and you should find something useful. On a Mac, I'd recommend SpamSieve.
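For anyone curious what that looks like in practice, here is a minimal sketch of the idea in Python. It is not how SpamSieve or any particular mail client does it; the class name `TinyBayesFilter`, the smoothing, and the `top_n` cutoff are all assumptions made up for illustration. It counts how many spam and ham messages each word has appeared in, scores a new message using only its most "interesting" words (those furthest from a neutral 0.5), and combines them naive-Bayes style.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens. \w is Unicode-aware in Python 3,
    # so Cyrillic or accented words are kept as words too.
    return re.findall(r"\w+", text.lower())


class TinyBayesFilter:
    """Hypothetical toy filter, for illustration only."""

    def __init__(self):
        self.spam_counts = Counter()  # word -> number of spam messages containing it
        self.ham_counts = Counter()   # word -> number of ham messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        # Count each word once per message.
        words = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(words)
            self.spam_msgs += 1
        else:
            self.ham_counts.update(words)
            self.ham_msgs += 1

    def word_spamminess(self, word):
        # Add-one smoothing (an assumption) keeps the result strictly
        # between 0 and 1, and makes unseen words score a neutral 0.5.
        p_spam = (self.spam_counts[word] + 1) / (self.spam_msgs + 2)
        p_ham = (self.ham_counts[word] + 1) / (self.ham_msgs + 2)
        return p_spam / (p_spam + p_ham)

    def spam_probability(self, text, top_n=15):
        probs = [self.word_spamminess(w) for w in set(tokenize(text))]
        # "Interesting" words are the ones furthest from 0.5.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        probs = probs[:top_n]
        if not probs:
            return 0.5
        # Combine in log space, treating words as independent.
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1.0 - p) for p in probs)
        return 1.0 / (1.0 + math.exp(log_ham - log_spam))
```

Train it by calling `train(text, is_spam=True)` on messages you've marked as spam and `train(text, is_spam=False)` on your real mail, then compare `spam_probability(new_text)` against a threshold of your choosing (0.9, say). If every Russian message it has seen was spam, common Russian function words end up with high scores and drag any new Russian message over the threshold, which is the effect described above.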
Ironically, when I checked my email just now, I had two messages: your comment, and a Portuguese spam begging me to donate 13 centavos (R$ 0.13) to help a Brazilian child born with elephantiasis. In fact, it was the second identical one today.
I understand a fair bit of Portuguese, obviously; but I don't really speak it, so no one would ever send me a real email in it.
My name is... João.
It's not 100% foolproof, as it relies partly on certain message headers that are 'supposed' to be in the e-mails, which is possibly why it hasn't yet been included in Gmail. But who knows? Google are always adding something to their arsenal :)