For a long time now, I've been using spamassassin to filter my spam. This worked "generally" well: I was getting some 200-300 messages a day filtered, and anywhere from 50-100 unfiltered that I'd just delete myself.
Many people had talked about dspam, claiming that "statistical" analysis is far better. I was always a little skeptical, because spamassassin worked "well enough" for me, and I had become so accustomed to deleting spam that I almost "forgot about it".
But dspam is a pain in the butt. When I did have it working, the web interface only let you "retrain" messages if they were on the first page; apparently there's a bug where you can't retrain anything past the first page, and it doesn't seem to have been fixed. I then tried to install it locally and it kept giving me stupid errors, saying the message was "not spam and not ham" and erroring out, so I gave up...
Enter spamprobe. It works along similar lines to dspam, in that it's a statistical (Bayesian) filter, but it's very simple: all it does is take a message and score it. It dumps the score on stdout and you do what you will with it. It doesn't munge messages and it doesn't have web interfaces; it's simple and nice, and I liked that.
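To make "do what you will with it" a bit more concrete, here is a minimal Python sketch of consuming that stdout line. It assumes the output looks like `SPAM 0.9993 <digest>` or `GOOD 0.0012 <digest>` (a verdict word, a score, and a message digest); that format, the names `Verdict`, `parse_verdict` and `header_for`, and the sample values are all assumptions for illustration, so check them against what your spamprobe version actually prints.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    label: str    # "SPAM" or "GOOD"
    score: float  # score reported by the filter
    digest: str   # message digest; empty string if the line has none


def parse_verdict(line: str) -> Verdict:
    """Parse one line of filter output.

    The 'LABEL SCORE DIGEST' layout is an assumption, not taken from
    the spamprobe documentation.
    """
    parts = line.split()
    if len(parts) < 2 or parts[0] not in ("SPAM", "GOOD"):
        raise ValueError(f"unexpected filter output: {line!r}")
    digest = parts[2] if len(parts) > 2 else ""
    return Verdict(parts[0], float(parts[1]), digest)


def header_for(line: str) -> str:
    """Build a header line that a delivery script could add to the message."""
    v = parse_verdict(line)
    return f"X-SpamProbe: {v.label} {v.score:.4f} {v.digest}".rstrip()


if __name__ == "__main__":
    # Sample line with made-up values.
    print(header_for("SPAM 0.9993 0123456789abcdef"))
```

Because the verdict word comes first, anything downstream only needs to match on the start of the header value to decide where a message goes.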
So I constructed an appropriate procmail file that looks like this:
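# Score the message with spamprobe and capture its output in SCORE
:0
SCORE=| /home/lathi/local/bin/spamprobe -8 receive

# Add the result to the message as an X-SpamProbe header
:0 wf
| formail -I "X-SpamProbe: $SCORE"

# Messages scored as SPAM go to the "dirty" mailbox
:0 a:
*^X-SpamProbe: SPAM
dirty

# Messages scored as GOOD go to the "cleaned" mailbox
:0 a:
*^X-SpamProbe: GOOD
cleaned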
As you can see, we use formail to munge the message, since spamprobe does not do that. I then took my last 2000 messages and classified them all as HAM or SPAM. The important thing to note here is that I'm classifying *both* good *and* bad messages. I did this with a couple of mutt macros:
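macro index,pager S "<enter-command>unset wait_key\n<pipe-message>~/local/bin/spamprobe spam\n<enter-command>set wait_key\n<delete-message>"
macro index,pager D "<enter-command>unset wait_key\n<pipe-message>~/local/bin/spamprobe good\n<enter-command>set wait_key\n<delete-message>"
macro index,pager H "<enter-command>unset wait_key\n<pipe-message>~/local/bin/spamprobe good\n<enter-command>set wait_key\n"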
So 'S' marks as spam and deletes, 'D' marks as ham and deletes, and 'H' marks as ham and keeps.
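If the mail is already sorted into separate spam and good mailboxes, the same training could be done in bulk rather than one keypress at a time. The following is only a sketch under that assumption: it pipes each raw message to `spamprobe spam` or `spamprobe good` on stdin, the same way the macros do. The helper names (`train`, `mbox_messages`, `run_spamprobe`) are hypothetical, and the binary path is simply the one used in the macros.

```python
import mailbox
import os
import subprocess
from typing import Callable, Iterable, Iterator, Sequence

# Path taken from the mutt macros; adjust to your install.
SPAMPROBE = os.path.expanduser("~/local/bin/spamprobe")

Runner = Callable[[Sequence[str], bytes], None]


def run_spamprobe(argv: Sequence[str], message: bytes) -> None:
    """Pipe one raw message to the command on stdin."""
    subprocess.run(list(argv), input=message, check=True)


def train(messages: Iterable[bytes], label: str,
          runner: Runner = run_spamprobe) -> int:
    """Feed each message to 'spamprobe <label>' and return how many were sent.

    label must be 'spam' or 'good', matching the commands in the macros.
    """
    if label not in ("spam", "good"):
        raise ValueError("label must be 'spam' or 'good'")
    count = 0
    for raw in messages:
        runner([SPAMPROBE, label], raw)
        count += 1
    return count


def mbox_messages(path: str) -> Iterator[bytes]:
    """Yield the raw bytes of every message in an existing mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            yield msg.as_bytes()
    finally:
        box.close()
```

With two pre-sorted mbox files (the names here are made up), `train(mbox_messages("sorted-spam.mbox"), "spam")` followed by `train(mbox_messages("sorted-good.mbox"), "good")` would cover both classes. The `runner` parameter exists only so the loop can be exercised without touching the real database.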
Having done this, it was already *far* more accurate than dspam, and now, 2 weeks later, the only spam I am getting at all is some non-English messages. It seems it doesn't handle those too well, but there seems to be an option for UTF-8 (and similar) messages that I may need to fiddle with.
Having said that, it has unfortunately produced a few false positives (10ish). I guess that's not too bad "overall", but I'm not ready to start /dev/nulling the spam yet.
My only other concern is the size of the spamprobe directory:
lathi@creep:~$ du -sh .spamprobe/
63M .spamprobe/
It has a 'clean up' function, so I figure it'll hit the "top" of its usage at some point.
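To keep an eye on that growth without running du by hand, something like the sketch below could sum the directory's file sizes and flag when it passes a limit. The 100 MB threshold and the function names are arbitrary assumptions, and the total is the apparent file size, which can differ slightly from what `du` reports.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total


def needs_cleanup(path: str, limit_mb: float = 100.0) -> bool:
    """True when the directory has grown past the (arbitrary) limit."""
    return dir_size_bytes(path) > limit_mb * 1024 * 1024


if __name__ == "__main__":
    target = os.path.expanduser("~/.spamprobe")
    if os.path.isdir(target):
        size_mib = dir_size_bytes(target) / 2**20
        print(f"{size_mib:.1f} MiB, cleanup suggested: {needs_cleanup(target)}")
```

Run from cron, a check like this would show whether the database really does level off once the clean-up function is in regular use.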
In short, I'm very impressed by spamprobe, and having next to no spam certainly makes life a little nicer in e-mail reading land.