[Asrg] differential confidence

Michael Thomas mike at mtcc.com
Mon Dec 8 17:01:14 PST 2008


Dave CROCKER wrote:
>      The fact that someone hit TIS means that -- independent of whether 
> it is actually something that might be called spam -- the message 
> irritated the user.  Every single TIS click has a 100% confidence 
> factor, in terms of being a valid count of being problematic to the 
> end-user.  (I'll quickly acknowledge that we have a derivative issue 
> from the fact that a given user is inconsistent and what is irritating 
> to me this morning might not be irritating this afternoon; but we have 
> plenty to consider by just looking at first-order issues.)

   I think, as was already pointed out, 100% is not something that
   humans do well in any context. That is, mistakes happen. And it's
   probably worth pointing out again that TIS and actual spam are
   only loosely correlated. People who complain about the
   human evaluators (not you) being imperfect spam judges are
   missing the larger point of what _they_ see that button as.

>      In contrast, perhaps you take the 'good' number from something like 
> "no one complained".  There can be lots of reasons no one complained, 
> only some of which are due to a message's being "good".  So our 
> confidence in the aggregate measure of goodness needs to be much less 
> than 100%.
> 
> So, how do we factor in differential confidence levels in the final 
> assessment?

   I've often wondered whether you could do something with timing
   the longevity of things in people's inboxes as a first-order
   approximation of "value". For example: if the time between me
   seeing a piece of mail and me hitting the delete button is
   consistently short, it's probably an indication that I'm not
   very interested in it. Maybe not enough to killfile them, but
   it probably would yield some clues as to how *I* prioritize some
   traffic over other traffic.
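
   As a rough sketch of that idea (nothing here is an existing mail
   client API; the event tuples, the function names, the 5-second
   threshold and the 3-message minimum are all made-up assumptions),
   one could take the median seen-to-delete time per sender and flag
   the senders where it is consistently short:

```python
from collections import defaultdict
from statistics import median


def dwell_times(events):
    """Group seen-to-delete times (seconds) by sender.

    `events` is a hypothetical log of (sender, seen_ts, deleted_ts)
    tuples; deleted_ts is None for mail still sitting in the inbox.
    """
    dwell = defaultdict(list)
    for sender, seen_ts, deleted_ts in events:
        if deleted_ts is None:
            continue  # not deleted yet, so no dwell time to record
        dwell[sender].append(deleted_ts - seen_ts)
    return dwell


def low_interest_senders(events, threshold_s=5.0, min_messages=3):
    """Senders whose mail is *consistently* deleted quickly.

    "Consistently" is approximated as: at least `min_messages`
    deletions, with a median dwell time at or below `threshold_s`.
    Both cutoffs are arbitrary illustration values.
    """
    flagged = []
    for sender, times in dwell_times(events).items():
        if len(times) >= min_messages and median(times) <= threshold_s:
            flagged.append(sender)
    return sorted(flagged)
```

   The median is used rather than the mean so that one message that
   happened to sit around for a day doesn't hide a pattern of quick
   deletes. This only yields a per-user hint about priority, not a
   spam verdict.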

   As you allude to, this is clearly a dynamic system too: my interest
   in some topics is situational, and clearly changes over time. It
   is also context-based: even actors that I rarely read may be
   contributing to a subject that I'm very interested in, etc.

   What this really points to, IMO, is that the entire way that mail
   -- and the many other emerging or established media -- is presented,
   prioritized, alerted, etc. is pretty well borked. Spam is just one
   small -- but important -- part of that problem. But even if spam were
   solved through a divine act tomorrow, it would not address the
   ever-increasing fire hose that we're expected to drink from.

		Mike

