Daily Archives: 2008-08-29

An A to Z of confusion

A few days ago I blogged about my paper on email spam volumes — comparing “aardvarks” (email local parts [left of the @] beginning with “A”) with “zebras” (those starting with a “Z”).

I observed that provided one considered “real” aardvarks and zebras — addresses that received good email amongst the spam — then aardvarks got 35% spam and zebras a mere 20%.
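To make the definition concrete, here is a minimal sketch of how such a per-letter figure could be computed. It is not the code used for the paper: the record layout, the function name `spam_share_by_initial`, and the sample counts are all invented for illustration, and pooling the counts across addresses is only one plausible way of arriving at a percentage.

```python
from collections import defaultdict


def spam_share_by_initial(records):
    """Return the percentage of spam per first letter of the local part.

    records: iterable of (address, spam_count, good_count) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    spam = defaultdict(int)
    total = defaultdict(int)
    for address, spam_count, good_count in records:
        # Only "real" addresses count: those that received some good email.
        if good_count == 0:
            continue
        local_part = address.split("@", 1)[0]
        if not local_part:
            continue
        initial = local_part[0].lower()
        spam[initial] += spam_count
        total[initial] += spam_count + good_count
    return {k: 100.0 * spam[k] / total[k] for k in total}


# Made-up counts, not data from the paper.
sample = [
    ("alice@example.com", 35, 65),
    ("aaron@example.com", 50, 0),  # no good email, so not "real": skipped
    ("zack@example.com", 20, 80),
]
shares = spam_share_by_initial(sample)
print(shares)
```

The address that received nothing but spam is dropped before any percentage is calculated, which is the “realness” filter described above.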

This has been widely picked up, first in the Guardian, and later in many other papers as well (even in Danish). However, many of these articles have got hold of the wrong end of the stick. So besides mentioning A and Z, it looks as if I should have published this figure from the paper as well…

Figure 3 from the academic paper

… the point being that the effect I am describing has little to do with Z being at the end of the alphabet, and A at the front, but seems to be connected to the relative rarity of zebras.

As you can see from the figure, marmosets and pelicans get around 42% spam (M and P being popular letters for people’s names) and quaggas 21% (there are very few Quentins, just as there are very few Zacks).

There are some outliers in the figure: for example, “3” relates to spammers failing to parse HTML properly and ending up with “3c” (the hex code for a < character) at the start of names. However, it isn’t immediately apparent why “unicorns” get quite so much spam; it may just be a quirk of the way that I have assessed “realness”. Doubtless some future research will be able to explain this more fully.
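As an illustration of the “3” outlier, the sketch below shows one way a careless address harvester could end up with a leading “3c”. This is an assumption about a possible mechanism, not something established in the paper: the regular expression, the `harvest` function, and the sample page are all hypothetical.

```python
import re

# A naive pattern of the kind a careless harvester might use (hypothetical).
NAIVE_EMAIL = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(text):
    """Pull address-like strings out of raw page text without decoding it."""
    return [match.lower() for match in NAIVE_EMAIL.findall(text)]


# "%3C" and "%3E" are the percent-encoded forms of "<" and ">".
page = '<a href="mailto:%3Cquentin@example.com%3E">mail me</a>'
found = harvest(page)
print(found)
```

Because the “%” is not a legal character for the pattern, the match starts at the “3”, so the harvested address appears to begin with a digit rather than with the “q” the owner actually uses.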