It’s pointless. As soon as any conceivable filter became effective, the spammers would change the spam to work around it. Eventually spam will become statistically indistinguishable from genuine mail, by a sort of Darwinian process.
As pointed out in the current New Scientist, nothing but legislation will do it (as it did with pirate radio stations on ships off Europe in the 70s), and as most spam originates in the US, good ol’ Bushie has to do it. As if.
If spam WERE truly indistinguishable from normal mail, then it wouldn’t be a problem. The very fact that we want to filter out spam means that there is some distinguishing feature. At the very least, it has to try and SELL you something (I’m neglecting email-harvesting spam, since it will wither and die if for-profit spam dies). How exactly can you pitch a pre-approved credit card without using the words “credit, preapproved, money, card, free” etc.?
But how many valid emails do you want to risk losing if they contain just those terms? How long do you want to spend wading through a list of suspected emails if your filter picks up on about as many valid emails as invalid ones?
Filtering isn’t the whole solution. But laws aren’t, either: Email knows no nationality. For something that has a chance of actually stemming the tide, look at hashcash.
Just an update. I narrowed the list to the following by running it against all my email (over 3000 messages) & tossing out false positives. That leaves 83 items:
It doesn’t take long to type them in, although the process leaves me wishing that Outlook supported regex.
The results: 39 emails out of the 2000-mail test set were filtered for content (“pen1s”, “gold card”, “phent”, etc.), and 13 were rejected for “random letter inclusion”. There were a number of false positives in the random letter filter, caused by technical emails with Unix commands in them (“chmod”), but I can add an exception to the rule for mail from specific addresses, like those list servers, to prevent that. I get false positives from my content filter, too.
On the surface, the random letter filter works. It’s especially safe since I’m just sorting to another folder rather than automatically discarding matches.
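For anyone who does have regex available, the random letter filter described above could be sketched roughly like this in Python. This is only an illustration: the pairs in RARE_PAIRS, the address in TRUSTED_SENDERS, and the folder names are made-up placeholders, not the actual 83-item list or any real Outlook rule.

```python
import re

# Hypothetical letter pairs that rarely occur in English words.
# Placeholders only; the real list is not reproduced here.
RARE_PAIRS = ("qj", "zx", "vq", "jq")

# Hypothetical senders exempt from the rule (e.g. technical list servers).
TRUSTED_SENDERS = {"unix-list@example.org"}

# Match any of the rare pairs anywhere in the text, ignoring case.
RARE_PAIR_RE = re.compile("|".join(RARE_PAIRS), re.IGNORECASE)


def sort_message(sender, body):
    """Return the folder a message should be filed under."""
    # Exception to the rule: mail from trusted addresses is never flagged.
    if sender.lower() in TRUSTED_SENDERS:
        return "Inbox"
    # Sort to another folder instead of discarding, so mistakes are recoverable.
    if RARE_PAIR_RE.search(body):
        return "Suspected Spam"
    return "Inbox"
```

Calling sort_message("someone@example.com", "Great deals xqjzv inside") files the message under "Suspected Spam", while the same body from the trusted list address stays in the inbox.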
The other filter patterns were rejected for these false matches & abbreviations (some of which are specific to applications I use):
bq subquery
cf sendmail.cf
cg centigram
cv curriculum vitae
cx cx600 (emc)
fx special fx
gq gentlemen's quarterly
jc jesus christ
jd jack daniels
jj hajj
jp japan
jv junior varsity
kq blockquote
mg milligram
mx mx missile
pv pvcs
qa qatar
qb quarterback
ql sql
qt quick time
qw qwik (Sun Micro “Qwik” Seminars)
tx texas
vb visual basic
vc pvcs
vd venereal disease
vg volume group (veritas)
vl servlet
vm volume manager (veritas)
vp vpn
vt vermont
vw volkswagen
vx veritas
wv west virginia
xx strikeout,xxx
zb buzzbomb
zm hazmat
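
The weeding-out process behind the list above (run candidate pairs against known-good mail and drop any that match) could be automated along these lines. This is a rough sketch that assumes the legitimate mail is available as plain strings; prune_pairs and its arguments are hypothetical names for illustration, not part of any mail client.

```python
from collections import defaultdict


def prune_pairs(candidates, legit_messages):
    """Split candidate letter pairs into (kept, rejected).

    rejected maps each rejected pair to the set of words in the
    legitimate mail that contained it.
    """
    false_matches = defaultdict(set)
    for message in legit_messages:
        # Naive whitespace tokenization; punctuation stays attached to words.
        for word in message.lower().split():
            for pair in candidates:
                if pair in word:
                    false_matches[pair].add(word)
    # Keep only the pairs that never showed up in legitimate mail.
    kept = [pair for pair in candidates if pair not in false_matches]
    return kept, dict(false_matches)
```

For example, prune_pairs(["ql", "zb", "qj"], ["Fix the SQL query", "buzzbomb alert"]) keeps only "qj" and reports "sql" and "buzzbomb" as the words that knocked out the other two.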