[Asrg] differential confidence
Michael Thomas
mike at mtcc.com
Mon Dec 8 17:01:14 PST 2008
Dave CROCKER wrote:
> The fact that someone hit TIS means that -- independent of whether
> it is actually something that might be called spam -- the message
> irritated the user. Every single TIS click has a 100% confidence
> factor, in terms of being a valid count of being problematic to the
> end-user. (I'll quickly acknowledge that we have a derivative issue
> from the fact that a given user is inconsistent and what is irritating
> to me this morning might not be irritating this afternoon; but we have
> plenty to consider by just looking at first-order issues.)
I think, as was already pointed out, that 100% is not something that
humans do well in any context. That is, mistakes happen. And it's
probably worth pointing out again that TIS and actual spam are
only loosely correlated. People who complain about the
human evaluators (not you) being imperfect spam judges are
missing the larger point of what _they_ see that button as.
> In contrast, perhaps you take the 'good' number from something like
> "no one complained". There can be lots of reasons no one complained,
> only some of which are due to a message's being "good". So our
> confidence in the aggregate measure of goodness needs to be much less
> than 100%.
>
> So, how do we factor in differential confidence levels in the final
> assessment?
I've often wondered whether you could do something with timing
the longevity of things in people's inboxes as a first-order
approximation of "value". For example: if the time between me
seeing a piece of mail, and me hitting the delete button is
consistently short, it's probably an indication that I'm not
very interested in it. Maybe not enough to killfile them, but
it probably would yield some clues as to how *I* prioritize some
traffic over other traffic.
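To make that a bit more concrete, here is a rough sketch of the
time-to-delete idea. It assumes a mail client could log when a message
was first seen and when it was deleted; the event tuples, the function
names, and the 5-second / 3-sample cutoffs are all made up for
illustration, not taken from any real client.

```python
from collections import defaultdict
from statistics import median


def group_latencies(events):
    """Group seen-to-delete latencies (in seconds) by sender.

    events: iterable of (sender, seen_ts, deleted_ts); deleted_ts is
    None for mail that is still sitting in the inbox (hypothetical format).
    """
    latencies = defaultdict(list)
    for sender, seen_ts, deleted_ts in events:
        if deleted_ts is None:
            continue  # never deleted, so there is no latency to record
        latencies[sender].append(deleted_ts - seen_ts)
    return latencies


def median_time_to_delete(events):
    """Return {sender: median seconds between seeing and deleting}."""
    return {s: median(v) for s, v in group_latencies(events).items()}


def low_interest_senders(events, threshold=5.0, min_samples=3):
    """Senders whose mail is *consistently* deleted quickly.

    threshold and min_samples are arbitrary assumptions; a real system
    would have to tune them per user.
    """
    return sorted(
        s for s, v in group_latencies(events).items()
        if len(v) >= min_samples and median(v) <= threshold
    )
```

The median is used rather than the mean so that one message left
unread over a weekend does not swamp a pattern of quick deletes.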
As you allude to, this is clearly a dynamic system too: my interest
in some topics is situational, and clearly changes over time. It
is also context based: even actors that I rarely read may be
contributing to a subject that I'm very interested in, etc.
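One way to account for interest drifting over time, purely as a
sketch, is to weight recent observations more heavily than old ones.
The +1/-1 encoding of "read" versus "quickly deleted" and the one-week
half-life below are assumptions for illustration only.

```python
def decayed_interest(observations, now, half_life=7 * 86400):
    """Exponentially decayed average of interest signals.

    observations: iterable of (timestamp, value), with value e.g. +1 for
    "read it" and -1 for "deleted it quickly" (hypothetical encoding).
    An observation half_life seconds old counts half as much as a new one.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for ts, value in observations:
        weight = 0.5 ** ((now - ts) / half_life)
        weighted_sum += weight * value
        total_weight += weight
    # With no observations there is no evidence either way.
    return weighted_sum / total_weight if total_weight else 0.0
```

The same score could be kept per sender, per thread, or per subject,
which is one crude way to capture the context dependence as well.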
What this really points to, IMO, is that the entire way that mail
-- and the many other emerging or established media -- is presented,
prioritized, alerted on, etc. is pretty well borked. Spam is just one
small -- but important -- part of that problem. But even if spam were
solved through a divine act tomorrow, it would not address the
ever-increasing fire hose that we're expected to drink from.
Mike