Richi'Blog
Stuff 'n' nonsense about email, spam, travel, and life in the UK.

Wednesday, April 18, 2007

More About the CEAS Spam Control Bake-off

Last week, I wrote about the CEAS 2007 Live Spam Challenge (CEAS is the Conference on Email and Anti-Spam). I opined that fair comparative testing of spam control technologies is extremely difficult, especially when behavioural analysis techniques such as greylisting and OS fingerprinting are part of the spam control technology mix.

I wanted to clarify that the test isn't intended to evaluate the relative strengths and weaknesses of existing spam control products (that would be extremely difficult to do fairly, as last week's post pointed out). The intention is to compare some promising new content-based filtering techniques -- techniques that might be employed as components in a cocktail of techniques used by a spam control product.

As Gordon Cormack, one of the test's co-organizers, wrote:

An open competition attracts all sorts of techniques that can be vetted. The methods that are uncompetitive can be discounted, and the "greatest hits" can be tested ... in combination with greylisting ... and other intrusive techniques.
...
One popular fallacy that I run into all the time is, "this test has limitations, so it shouldn't be done." All tests and experiments have limitations, and the scientific method involves identifying them and constructing specific experiments to see how much the limitations matter, not withholding all tests until the perfect one can be done (which, of course, it can never be).


Monday, April 09, 2007

CEAS Spam Filter Bakeoff

The fourth Conference on Email and Anti-Spam (CEAS) is planning a bakeoff this year. In the CEAS 2007 Live Spam Challenge, the organizers hope to simultaneously inject a live stream of spam and legitimate email into several spam filters over a 24-hour period.

However, fair comparative testing of spam control technologies is extremely difficult -- by some measures, it's impossible. Because some promising filter techniques rely on examining the real-time behaviour of the sending machine, it proves tricky to provide the exact same stream of email to all the filters at the same time.

For example, some filters attempt to "fingerprint" the sending machine's operating system -- the idea being that, say, a Windows 98 PC has no business submitting email direct-to-MX. In a test that replicates an inbound email stream to several servers, it's tricky to let the receiving filters send IP packets back to the true originating IP address in a way that is fair and equitable for all test participants.
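
Greylisting, another behavioural technique, shows the same problem in miniature. The filter's verdict depends on whether the sender comes back and retries after a temporary rejection, so a stream that is replayed once to each filter never gets past the first step. Here is a rough sketch in Python, not any product's actual implementation: the `Greylist` class, the `check` method and the 300-second delay are all made up for illustration, and real greylisting implementations differ in the details.

```python
from dataclasses import dataclass, field


@dataclass
class Greylist:
    """Toy greylist: defer the first attempt from an unseen triplet,
    accept a retry that arrives after a minimum delay."""

    min_delay: float = 300.0  # seconds; an assumed value, not a standard
    first_seen: dict = field(default_factory=dict)

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        # The decision is keyed on who is connecting, not on message content.
        key = (client_ip, sender, recipient)
        if key not in self.first_seen:
            self.first_seen[key] = now
            return "defer"  # a real server would answer with an SMTP 4xx reply
        if now - self.first_seen[key] < self.min_delay:
            return "defer"  # retried too quickly
        return "accept"     # sender behaved like a real mail server


if __name__ == "__main__":
    g = Greylist()
    triplet = ("192.0.2.1", "alice@example.com", "bob@example.org")
    for t in (0.0, 60.0, 600.0):
        print(t, g.check(*triplet, now=t))
```

A test harness that delivers each message exactly once only ever sees "defer" from a filter like this, whatever the message contains. To treat it fairly, the harness would have to reproduce the original sender's retry behaviour for every participant.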

In its defense, CEAS recognizes this difficulty by excluding greylisting from the list of permitted techniques. I'll be watching this one with interest.

