Zebras and Aardvarks – Light Blue Touchpaper

We all know that different people get different amounts of email “spam”. Some of these differences result from how careful people have been in hiding their address from the spammers: putting it en clair on a webpage will definitely increase your chances of receiving unsolicited email.

However, it turns out there are other effects as well. In a paper I presented last week to the Fifth Conference on Email and Anti-Spam (CEAS 2008), I showed that the first letter of the local part of the email address also plays a part.

Incoming email to Demon Internet where the email address local part (the bit left of the @) begins with “A” (think of these as aardvarks) is almost exactly 50% spam and 50% non-spam. However, where the local part begins with “Z” (zebras) then it is about 75% spam.

However, if one considers only “real” aardvarks and zebras, viz. those where a particular email address was legitimate enough to receive some non-spam email, then the picture changes. If one treats an email address as “real” if it receives one non-spam email on average every second day, then real aardvarks receive 35% spam, but real zebras receive only 20% spam.
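For readers who want to try this on their own mail logs, here is a minimal sketch of the two measurements described above: the spam percentage per first letter of the local part, and the same figure restricted to “real” addresses. It is not the code used for the paper. The function name, the `(address, is_spam)` record shape, the case-folding of addresses, and reading the “real” threshold as “at least 0.5 non-spam emails per day” are all my assumptions, and the sample records are invented toy data, not Demon Internet measurements.

```python
from collections import defaultdict


def spam_share_by_first_letter(records, days=None, min_ham_per_day=0.5):
    """Return {first letter of local part: percentage of email that is spam}.

    records: iterable of (address, is_spam) pairs (hypothetical log format).
    days:    length of the observation period; if given, only "real"
             addresses are counted, i.e. those averaging at least
             min_ham_per_day non-spam emails per day (assumed threshold).
    """
    # First pass: count spam and non-spam ("ham") per address.
    per_address = defaultdict(lambda: [0, 0])  # address -> [spam, ham]
    for address, is_spam in records:
        # Assumption: treat addresses case-insensitively.
        per_address[address.lower()][0 if is_spam else 1] += 1

    # Second pass: aggregate the per-address counts by first letter.
    per_letter = defaultdict(lambda: [0, 0])  # letter -> [spam, ham]
    for address, (spam, ham) in per_address.items():
        if days is not None and ham / days < min_ham_per_day:
            continue  # not a "real" address under the assumed threshold
        local_part = address.rpartition("@")[0]  # the bit left of the @
        if not local_part:
            continue  # malformed address with no local part
        letter = local_part[0].upper()
        per_letter[letter][0] += spam
        per_letter[letter][1] += ham

    return {
        letter: 100.0 * spam / (spam + ham)
        for letter, (spam, ham) in per_letter.items()
    }


# Invented toy data covering a two-day period.
records = [
    ("alice@example.com", True),
    ("alice@example.com", False),
    ("alice@example.com", False),
    ("aaron@example.com", True),   # guessed address: spam only
    ("aaron@example.com", True),
    ("zoe@example.com", True),
    ("zoe@example.com", False),
    ("zed@example.com", True),     # guessed address: spam only
    ("zed@example.com", True),
    ("zed@example.com", True),
]

all_addresses = spam_share_by_first_letter(records)
real_addresses = spam_share_by_first_letter(records, days=2)
print(all_addresses)
print(real_addresses)
```

Counting per address first and only then grouping by letter is what makes the “real” filter possible: addresses that spammers merely guessed at never receive non-spam email, so they drop out of the second calculation.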

The most likely reason for these results is the prevalence of “dictionary” or “Rumpelstiltskin” attacks (where spammers guess addresses). If there are not many other zebras, then spammers are less likely to guess zebra names.

Aardvarks should consider changing species — or asking their favourite email filter designer to think about how this unexpected empirical result can be leveraged into blocking more of their unwanted email.

[[[ ** Note that these percentages are way down from general spam rates because Demon rejects out of hand email from sites listed in the PBL (which are not expected to send email) and greylists email from sites in the ZEN list. This reduces overall volumes considerably — so YMMV! ]]]