[Asrg] where the message originated (was: DKIM role?) (SM)
Ian Eiloart
iane at sussex.ac.uk
Fri Jan 23 02:40:58 PST 2009
--On 22 January 2009 23:47:53 -0500 Rich Kulawiec <rsk at gsp.org> wrote:
> On Wed, Jan 21, 2009 at 10:48:08AM +0000, Ian Eiloart wrote:
>> I guess that depends on the nature of the RBL. Some of them really are
>> reputation systems. IP addresses get listed because someone has seen
>> spam coming from them. Spamhaus' SBL is an example. If you don't agree
>> that that's a reputation service, please explain.
>
> I don't agree that's a reputation service. It's a binary flag (on a
> per-IP basis) saying "this address has been observed (at one or more
> observation points) as sending spam". And that's all it says -- and it
> only says it about a subset of IP addresses (those seen emitting spam)
> and only about a subset of those (those seen by Spamhaus-affiliated
> sensors) and only about a subset of those (add some notion of "recently"
> -- an address that spewed in the past may not be listed now).
>
> That's highly useful information if my goal is to block spam, but it's
> only marginally useful if my goal is to do anything else.
>
> (BTW: there's only one RBL, and Spamhaus doesn't run it. They run
> DNSBLs.)
I stand corrected.
> Maybe I'm just quibbling over the definition of "reputation service",
> or maybe my definition isn't broad enough. But I don't think of DNSBLs
> or RHSBLs that way, yet. I'll mull it over.
OK, I see where you're coming from now. I guess we might agree that DNSBLs
collectively provide a minimalist reputation service.
I do think that DNSBLs speak to the reputation of some email emitters. In
that respect, I'd argue that even a single data point constitutes a comment
on reputation, and if it's available to me then it's providing me with a
service - even if the service is no good. However, even a single data point
could be a useful service if it stopped lots of spam for me.
Collectively, DNSBLs provide more than binary information about a single IP
address. For example, you could consult a dozen of them, and compare a
weighted sum of their responses against a threshold determined in your
local anti-spam policy.
I disagree that they say nothing about addresses that they don't list.
Silence is a comment, in this context. Provided that list policies apply to
all IP addresses equally, then they are commenting on all IP addresses.
However, if a DNSBL had a policy (explicit or implicit) that it would never
list a certain address range, then the service ceases to be comprehensive.
It doesn't cease to be a service, though.
>> However, currently it's hard to know what to whitelist. There's only one
>> widespread, easy to use mechanism for managing information about which
>> IP addresses an organisation is likely to send messages from - that's
>> SPF. OK, so if you wanted to be sure to get mail from me, you could
>> whitelist my /24 address block, but are you sure that I'd keep you
>> updated if we outsourced our email?
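One way to avoid a stale, hand-maintained /24 whitelist is to derive it from the sender's published SPF record, so that it follows the sender when they outsource. The sketch below is a deliberately partial reading of SPF: it collects only ip4:/ip6: mechanisms with a pass qualifier and ignores include:, a, mx, redirect= and mechanism order, so it approximates rather than evaluates SPF. Fetching the TXT record is left out; the record and addresses used are hypothetical documentation values.

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

def spf_ip_networks(record: str) -> List[Network]:
    """Collect networks from ip4:/ip6: mechanisms that carry a pass ("+") qualifier."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    networks: List[Network] = []
    for term in terms[1:]:
        qualifier = "+"  # SPF's default qualifier when none is written
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        if qualifier == "+" and mechanism.lower() in ("ip4", "ip6") and value:
            # strict=False tolerates host bits, e.g. ip4:192.0.2.5/24
            networks.append(ipaddress.ip_network(value, strict=False))
    return networks

def whitelisted(ip: str, record: str) -> bool:
    """True if ip falls inside any pass-qualified ip4:/ip6: network in the record."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net
               for net in spf_ip_networks(record))
```

With the record "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 include:_spf.example.org -all", 192.0.2.17 and 2001:db8::1 would be whitelisted and 198.51.100.1 would not. The include: is skipped, and that is often where an outsourced provider would appear, so a real deployment would need to follow it.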
>
> But this is not one of my pressing concerns: oh, it's not totally
> off my radar, but it's far down on the priority scale. I'll try
> to explain below.
>
>> Yes, the problem of course is when a spammer forges a domain that I'd
>> like to trust. If I'm filtering mail from the domain of my chief
>> funders, then false positives can be really painful. If I whitelist
>> them, then spammers can easily bypass my filters. So what I'm
>> discussing IS all about forgery.
>
> If you are efficiently blocking spam, then this may be somewhat
> of a non-issue. (It pretty much is for me.) Let me illustrate
> with an example: traffic was presented a little while ago on
> port 25 from 123.140.212.144:
>
> I could have sanity-checked the HELO.
>
> I could have run it through SPF or similar, but didn't.
>
> I could have waited for the data phase and run the content
> through SpamAssassin and/or ClamAV, but didn't.
>
> I could have looked up rDNS to see if it existed, but didn't.
> (Or, having looked it up and found it to exist, checked
> the domain against various RHSBLs. And/or for MX sanity.)
>
> I could have checked the IP address against various DNSBLs.
>
> Instead, the mail system noted that it's in Korean IP space,
> which for that mail server is a 100% source of spam and a 0%
> source of mail. So it was immediately rejected.
>
> So maybe it was a phish with forged sender address at Paypal --
> and maybe I could have figured that out via one of the methods
> that have been discussed here. But my point is that I didn't
> need to, because I knew it was spam before getting that far.
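The ordering described here (cheap, connection-level checks first, with the first verdict ending the conversation) can be sketched as a short-circuiting list of checks. The blocked prefix below is a documentation range standing in for a country allocation, and the HELO test is a hypothetical, very crude sanity check; DNSBL, rDNS, SPF or content checks would be appended further down the list and would simply never run for sessions already rejected.

```python
import ipaddress
from typing import Callable, Dict, List, Optional

# A check inspects the session and returns a rejection reply, or None to pass.
Session = Dict[str, str]
Check = Callable[[Session], Optional[str]]

# Hypothetical stand-in for "Korean IP space"; a real list would come from RIR data.
BLOCKED_PREFIXES = [ipaddress.ip_network("203.0.113.0/24")]

def check_blocked_prefix(session: Session) -> Optional[str]:
    """Connection-level check: needs nothing but the client IP."""
    addr = ipaddress.ip_address(session["ip"])
    for net in BLOCKED_PREFIXES:
        if addr.version == net.version and addr in net:
            return f"550 5.7.1 {addr} is in blocked address space {net}"
    return None

def check_helo(session: Session) -> Optional[str]:
    """Hypothetical, crude HELO sanity check: demand something dotted."""
    if "." not in session.get("helo", ""):
        return "550 5.7.1 Implausible HELO"
    return None

def run_checks(session: Session, checks: List[Check]) -> Optional[str]:
    """Run checks cheapest-first; the first verdict ends the conversation."""
    for check in checks:
        verdict = check(session)
        if verdict is not None:
            return verdict  # later, costlier checks never run
    return None  # nothing objected; carry on towards DATA

# DNSBL, rDNS/RHSBL, SPF and content scanning would be appended after these.
PIPELINE: List[Check] = [check_blocked_prefix, check_helo]
```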
>
> Repeat this for myriad variations -- use of various DNSBLs, use of
> the Spamhaus DROP list, various country allocations, thousands and
> thousands of spammer domains, dynamic/generic IP and name space,
> and so on. What I've found is that if I'm sufficiently aggressive
> about blocking spam sources (including pre-emptive blocks)
> then I don't need to worry so much about what's in the spam -- forged
> headers, bogus URLs, etc. -- because it never makes it to the point
> where I have to concern myself with any of that.
>
> So my response to someone who says "I'm getting a lot of forged
> traffic claiming to be from Paypal and I need an anti-forgery
> method to figure that out" is "No, you need to be a lot more
> aggressive about blocking spam. AFTER you do that, you should
> re-assess, see if this is still actually a problem for you, and
> then, maybe, you might consider anti-forgery technology of one
> sort or another, or even ad hoc local checks for some specific
> cases, like maybe the credit union that serves your university."
>
> Does this kinda clarify where I'm coming from?
Yes, it does. However, I'm an admin for a university with students from
most countries in the world, and academics who work in most countries in
the world and study every topic under the sun - including SPAM! So
it's very difficult for me to be that aggressive. I certainly block IP
addresses according to country allocation.
That's why I need more information about who the IP addresses belong to.
Without that information, and with the prevalence of sender address
forgery, the IP address is the only real information that I have about a
message before it's too late to reliably apply recipient-specific filtering.
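The timing matters because a rejection at RCPT TO can differ per recipient, while a rejection after DATA applies to the whole message and all of its recipients. A minimal sketch of a per-recipient decision that uses only the connecting IP follows; the country label "XX", its prefix, the addresses and the per-recipient opt-out are all hypothetical, and a real country mapping would come from registry allocation data.

```python
import ipaddress
from typing import Dict, List, Optional, Set

# Hypothetical allocation table; real data would come from the RIRs' delegation files.
COUNTRY_PREFIXES: Dict[str, List[ipaddress.IPv4Network]] = {
    "XX": [ipaddress.IPv4Network("203.0.113.0/24")],
}

# Hypothetical per-recipient policy: which countries each recipient refuses.
DEFAULT_BLOCKED: Set[str] = {"XX"}
RECIPIENT_BLOCKED: Dict[str, Set[str]] = {
    "spam-research@example.ac.uk": set(),  # a researcher who wants to see everything
}

def country_of(ip: str) -> Optional[str]:
    """Map an address to a country label using the allocation table above."""
    addr = ipaddress.ip_address(ip)
    for country, nets in COUNTRY_PREFIXES.items():
        if any(addr.version == n.version and addr in n for n in nets):
            return country
    return None

def rcpt_reply(client_ip: str, rcpt: str) -> str:
    """Reply to one RCPT TO, using nothing but the connecting IP address."""
    country = country_of(client_ip)
    blocked = RECIPIENT_BLOCKED.get(rcpt.lower(), DEFAULT_BLOCKED)
    if country is not None and country in blocked:
        return f"550 5.7.1 <{rcpt}> does not accept mail from {country} address space"
    return "250 2.1.5 OK"
```

In one SMTP transaction from 203.0.113.9, the researcher's RCPT TO would get a 250 while other recipients get a 550, which is exactly the per-recipient choice that is no longer available once the message body has been accepted.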
>>> I think perhaps I have this viewpoint because my focus is on my biggest
>>> (ongoing) problem: what to do about the 99% of incoming mail that needs
>>> to be rejected outright before it can get anywhere near a user.
>>
>> You mean you want to know how to identify it? Or what to do with it
>> after you've identified it?
>
> The former -- because my approach to the latter is "issue a reject,
> hang up, move on" in all cases. (Although I should mention in passing
> that DNS lookup failures get a 4XX 'cause maybe their DNS is hosed,
> maybe mine is hosed, maybe transport is fubar.)
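The split between a hard reject and a 4XX for lookup failures amounts to treating a DNS check as three-valued rather than boolean. A small sketch, where the enum and the reply texts are illustrative assumptions; the enhanced status codes are RFC 3463 ones (5.7.1 delivery not authorized, 4.4.3 directory server failure).

```python
from enum import Enum
from typing import Optional

class Lookup(Enum):
    LISTED = "listed"          # the DNSBL answered: this source is listed
    NOT_LISTED = "not_listed"  # NXDOMAIN: the list is silent about this source
    DNS_ERROR = "dns_error"    # timeout/SERVFAIL: we don't know either way

def smtp_reply(result: Lookup) -> Optional[str]:
    """Map a lookup outcome to an SMTP reply, or None to carry on with the session."""
    if result is Lookup.LISTED:
        return "550 5.7.1 Rejected: listed source"  # permanent: reject, hang up
    if result is Lookup.DNS_ERROR:
        return "451 4.4.3 Temporary DNS failure, try again later"  # sender retries
    return None
```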
>
> The former is tougher because, despite my aggressive approach
> to spam, I don't want to deal with a high FP rate. I've come to the
> conclusion that one approach which seems to work boils down to
> "know your email": study traffic patterns, inbound and outbound.
> Every mail server (that I've ever looked at) has different
> characteristics, and if you can figure out what they are, then you can
> twiddle the knobs in very server-specific ways that minimize FN and FP at
> the same time. (See example above, which clearly would not work at
> all on some other servers -- say for a research university.)
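Part of "know your email" can be automated from logs: tally past verdicts per source block and look for blocks that have only ever sent spam, like the Korean space in the example above. A sketch, assuming a hypothetical log of (IPv4 address, was_spam) pairs produced by whatever classification the server already does; the /16 granularity and the minimum volume are arbitrary knobs to tune per server.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple

def spam_only_blocks(log: Iterable[Tuple[str, bool]], prefixlen: int = 16,
                     min_messages: int = 100) -> List[ipaddress.IPv4Network]:
    """Return IPv4 source blocks that sent at least min_messages spam and no ham.

    log is a hypothetical iterable of (client_ip, was_spam) pairs taken from
    whatever verdicts the server already records.
    """
    spam: Counter = Counter()
    ham: Counter = Counter()
    for ip, was_spam in log:
        block = ipaddress.IPv4Network(f"{ip}/{prefixlen}", strict=False)
        (spam if was_spam else ham)[block] += 1
    # Counter returns 0 for unseen keys, so a block that never sent ham has ham[b] == 0.
    return sorted(b for b, n in spam.items() if n >= min_messages and ham[b] == 0)
```

Anything this reports is a candidate for a local block after review, not something to block automatically; on a research university's server the same query might well return nothing.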
>
> Of course, this takes time and patience -- but I'll argue that
> we should be trying to extract the most from the methods we already have
> (like the ones I tossed out above), that we understand fairly well,
> and that we know work on a large scale in production environments,
> before we try to invent and deploy new methods.
>
> ---Rsk
> _______________________________________________
> Asrg mailing list
> Asrg at irtf.org
> http://www.irtf.org/mailman/listinfo/asrg
--
Ian Eiloart
IT Services, University of Sussex
x3148