[Asrg] Re: Asrg Digest, DNSBL BCP v.2.0

gep2 at terabites.com gep2 at terabites.com
Sun Mar 4 13:46:51 EST 2007


>> I think you can do FAR BETTER from a content standpoint
>> (content analysis, such as Spam Assassin, following "a
>> priori" blocking of mail from unknown/untrusted senders
>> containing HTML or attachments) than you can using any
>> kind of IP-based blacklisting or other "reputation"
>> scheme.

> SpamAssassin (and other content filters) don't actually work the way you think they do, on many levels.

> The major measurable component of spam is whether or not the sender
> has permission to contact the recipient.

I disagree.  I have no objection at all to being contacted 
by someone I've never met before.  I hand out business 
cards at trade shows and elsewhere.  I have my E-mail 
address on my (well-indexed) personal Web site.  Many 
companies put their E-mail address on Yellow Pages ads and 
other public places.  The fact that I've not previously 
authorized contact isn't the problem.  The problem is the 
delivery of unwanted, highly repetitive, and annoying 
scams and garbage.

> Content filters in particular have no view into consent; no way to
> measure consent.

In the absence of a fine-grained whitelist (on a 
per-sender basis), I agree that they don't have any way to 
measure consent.  But adding that component changes that 
situation quite dramatically.  Not only does the system 
then know WHO has established "consent", but also WHAT has 
been "consented" to.
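
A minimal sketch of such a fine-grained whitelist, in Python.  The 
class names, fields, and return strings are hypothetical, chosen 
only for illustration; they are not taken from any existing filter.

```python
from dataclasses import dataclass, field


@dataclass
class SenderPolicy:
    """What one known sender has been 'consented' to send (hypothetical)."""
    allow_html: bool = False
    allow_attachments: bool = False


@dataclass
class Whitelist:
    """Per-sender whitelist: records WHO is known and WHAT they may send."""
    policies: dict = field(default_factory=dict)

    def allow(self, sender, **permissions):
        # Addresses are compared case-insensitively for simplicity.
        self.policies[sender.lower()] = SenderPolicy(**permissions)

    def check(self, sender, has_html, has_attachments):
        policy = self.policies.get(sender.lower())
        if policy is None:
            return "unknown-sender"
        if has_html and not policy.allow_html:
            return "unexpected-html"
        if has_attachments and not policy.allow_attachments:
            return "unexpected-attachment"
        return "ok"


wl = Whitelist()
wl.allow("Alice@example.com", allow_html=True)
print(wl.check("alice@example.com", has_html=True, has_attachments=False))
print(wl.check("alice@example.com", has_html=False, has_attachments=True))
print(wl.check("stranger@example.net", has_html=False, has_attachments=False))
```

A known sender who suddenly sends something outside what was 
agreed to is flagged just as a stranger would be, only with a 
more specific reason.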

The trick then is deciding which of the NEW, first-time 
contacts is likely to be unwanted.  Certainly, there are 
various clues... including the presence of content 
commonly used to evade filtering (decryption scripting, 
obscured URLs, URL redirection, etc.).
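
To make those clues concrete, here is a rough sketch of how a 
filter might look for them in an HTML body.  The function name, 
the clue labels, and the specific patterns are assumptions made 
for illustration; a real filter would need far more than this.

```python
import re
from urllib.parse import urlsplit

# Pulls the target out of href="..." or href='...' attributes.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def evasion_clues(html_body):
    """Return a list of labels for evasion-style content found in the body."""
    clues = []
    # Any scripting at all is suspicious in mail from an unknown sender.
    if re.search(r"<script\b", html_body, re.IGNORECASE):
        clues.append("scripting")
    for url in HREF_RE.findall(html_body):
        try:
            parts = urlsplit(url)
            host = parts.hostname or ""
        except ValueError:
            clues.append("malformed-url")
            continue
        # Host given as a bare numeric address instead of a name.
        if re.fullmatch(r"[0-9.]+", host):
            clues.append("numeric-host")
        # Percent-encoding or user@host tricks in the host part.
        if "%" in parts.netloc or "@" in parts.netloc:
            clues.append("obscured-url")
        # A second URL carried in the query string suggests a redirector.
        if re.search(r"https?(://|%3a%2f%2f)", parts.query, re.IGNORECASE):
            clues.append("redirect")
    return clues


print(evasion_clues('<a href="http://r.example/go?u=http://x.example/">hi</a>'))
```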

> There is no HTML code or X-Header that reliably provides proof of
> opt-in.

If there were, and if it could be relatively easily 
spoofed, it would be less than terribly useful.

I think the solution involves (among other components) a 
tacit agreement, at both ends of the communication, 
between what the sender is sending and what the recipient 
expects them to send.  The bar should be higher for 
previously unknown, and therefore unestablished or 
untrusted, senders.

> They do some good things based on modeling of what looks like spam; but
> it's also true that things that look like spam are not always spam.

That's true.  And that's where the recognition of a 
familiar (to the recipient) sender enters into things.

> False positive issues you rant about occur just as often with content filters. Some would claim, even more so!

I am likely to be far less upset about getting 
questionable mail if there is at least SOME arguable 
reason why the filter ought to have delivered it.  Users 
ought to be able to tweak their filters so that they can 
change the rules whenever they desire, especially for 
particular cases that occur with some frequency for them.

> With a blacklisting, I get a bounce back and can find somebody to argue
> with. With the common method of implementing a content filter, my mail
> is quietly eaten and I get no information back regarding the failure
> to deliver the mail to end recipient. This is worse than IP
> blacklisting; less transparent; less obvious; less opportunity for
> feedback and investigative recourse.

The big problem with blacklisting bouncebacks is that in 
the general case, you cannot be sure WHO to send the 
bounceback TO.  Once spam has gone through one or more 
levels of forwarding, the only way to go further back is 
via the Received: headers, but those are commonly 
counterfeit.  Sending bouncebacks multiplies the wasted 
bandwidth due to spam.

Worse, "intentional bounceback" can be used by spammers as 
one way to get their spam delivered to a third party... 
they send mail in a way that they are confident will be 
bounced back, but arrange things so that the bounceback 
will go to the actually intended recipient... and this 
time, the (bounceback) message originates from a 
not-blacklisted MTA.

Ultimately, I believe that the best way to deal with such 
spam is to at least OFFER recipients a chance to review 
blocked messages (hopefully with rules that they can use 
to avoid having to review repetitively familiar spam), or 
the choice of accepting the system's determination and 
just junking it.

But again, I'm far more willing to accept mail from 
someone if (1) I recognize the name of the sender, and (2) 
the mail "looks like" the sort of mail I would expect to 
receive from that sender.

> The fact that you think they're better is likely based on an incomplete view on your part. 

I doubt it, but I'm certainly willing to learn.

> You actually probably have no idea how much of your mail has ever been
> redirected to a bulk or trash folder by a content filter.

Actually, I tend to monitor that rather closely, in part 
because I use that knowledge to refine my ruleset.

> And of course, not to mention that SpamAssassin, which you hold up as
> the better model,

I consider it a 'respectable' example of the genre.  I 
generally make it a point to include "like" or "-type" in 
references to that product.

> has lovingly crafted hooks into it to allow direct support of IP-based
> blacklist and other IP-based reputation mechanisms.

Hopefully they use that as an INPUT into the rating 
process;  I don't have a problem with that, as long as 
mail coming from such "blacklisted" IP addresses is not 
BLINDLY trashed regardless of any other considerations.
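
As a sketch of what "an INPUT into the rating process" could look 
like: the blacklist hit adds to a score rather than deciding the 
outcome by itself.  The function name, weights, and threshold below 
are invented for illustration and are not SpamAssassin's actual 
rules or defaults.

```python
def classify(content_score, on_blacklist, sender_known,
             blacklist_weight=2.0, known_sender_bonus=3.0, threshold=5.0):
    """Combine several signals into one score (all weights hypothetical)."""
    score = content_score
    if on_blacklist:
        # A blacklist hit nudges the score; it does not decide alone.
        score += blacklist_weight
    if sender_known:
        # Recognized senders get the benefit of the doubt.
        score -= known_sender_bonus
    # Nothing is discarded: doubtful mail goes to a reviewable folder.
    return "spam-folder" if score >= threshold else "inbox"


# Clean-looking mail from a blacklisted address still reaches the inbox.
print(classify(content_score=1.0, on_blacklist=True, sender_known=False))
# Spammy-looking mail from the same address does not.
print(classify(content_score=4.0, on_blacklist=True, sender_known=False))
```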

> Note to rest of world: I'm not anti-SpamAssassin. I've run it myself
> before and likely will again. I'm just pointing out that like just
> about every other kind of spam filtering or blocking mechanism, a
> content filter is imperfect.

Certainly they have limitations, including some which are 
so severe as to be essentially crippling.  HTML, embedded 
images, attachments, and the like make it nearly 
impossible for content-based spam filters to do a good and 
effective job.  Even if (IF!), for example, a content 
filter had OCR abilities to try to analyze 
text-as-image... the spammer could change the image that 
an embedded link references (say) an hour after sending 
the E-mail, so that the image the recipient actually sees 
is not the (previously linked) image that passed the 
analysis.

> It's a bit mind-blowing to see content filtering held up as this
> panacea to address the ills of IP-based blocking, since they're both
> approximate models of what somebody thinks is spam,

The only opinion that MATTERS is that of the recipient... 
which is why they should be able to control the ruleset 
and the sender-by-sender whitelist, as well as what to do 
with spam (e.g. putting it into a spam folder that they 
can examine as they wish to confirm the accuracy of the 
filtering).

> ...and have flaws inherent to both technology and policy limitations.

Again, what the USERS want is the ability to have the mail 
that makes it into their inboxes bear some approximation 
to the mail they expect and want to see.  And only those 
users are able to make that judgement call, in the final 
analysis.  What we need is an effective, practical tool 
(or toolset) to allow them to express that set of 
criteria.

It's clearly not enough to look just at the e-mail 
headers; but within that limitation (for example) I was 
getting relatively useful filtering using the web-based 
ruleset offered by my domain provider, until I ran into 
their limit of 200 rules...!

> Regards,
> Al Iverson

Gordon Peterson
http://personal.terabites.com
1977-2007  Thirty year anniversary of local area 
networking


