
Aug 29, '08

A few days ago I blogged about my paper on email spam volumes — comparing “aardvarks” (email local parts [left of the @] beginning with “A”) with “zebras” (those starting with a “Z”).

I observed that provided one considered “real” aardvarks and zebras — addresses that received good email amongst the spam — then aardvarks got 35% spam and zebras a mere 20%.
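To make that measurement concrete, here is a minimal sketch of how a per-letter spam percentage could be computed. It is not the code used for the paper: the function name `spam_share_by_initial`, the input layout (address mapped to a pair of good and spam counts) and the choice to pool all email for a letter before taking the percentage are assumptions made for illustration, and the sample numbers are invented.

```python
from collections import defaultdict


def spam_share_by_initial(counts):
    """Return {initial: spam percentage} for "real" addresses only.

    counts maps an email address to a (good, spam) pair of message counts.
    An address is treated as "real" if it received at least one good email
    (an assumed reading of the definition in the post).
    """
    totals = defaultdict(lambda: [0, 0])  # initial -> [good, spam]
    for address, (good, spam) in counts.items():
        if good == 0:
            continue  # spam-only address: not "real", so skip it
        local_part = address.split("@", 1)[0]
        if not local_part:
            continue  # nothing to the left of the @
        initial = local_part[0].lower()
        totals[initial][0] += good
        totals[initial][1] += spam
    return {
        initial: 100.0 * spam / (good + spam)
        for initial, (good, spam) in sorted(totals.items())
    }


# Invented toy data, not figures from the paper.
sample = {
    "alice@example.com": (3, 1),   # real aardvark
    "adam@example.com": (0, 50),   # spam only, so excluded
    "zack@example.com": (4, 1),    # real zebra
}
result = spam_share_by_initial(sample)
print(result)
```

Pooling the counts per letter means a single busy address can dominate its letter; averaging per-address percentages instead would be an equally reasonable choice.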

This has been widely picked up, first in the Guardian, and later in many other papers as well (even in Danish). However, many of these articles have got hold of the wrong end of the stick. So besides mentioning A and Z, it looks as if I should have published this figure from the paper as well…

Figure 3 from the academic paper

… the point being that the effect I am describing has little to do with Z being at the end of the alphabet, and A at the front, but seems to be connected to the relative rarity of zebras.

As you can see from the figure, marmosets and pelicans get around 42% spam (M and P being popular letters for people’s names) and quaggas 21% (there are very few Quentins, just as there are very few Zacks).

There are some outliers in the figure: for example “3” relates to spammers failing to parse HTML properly and ending up with “3c” (the hex code for a < character) at the start of names. However, it isn’t immediately apparent why “unicorns” get quite so much spam; it may just be a quirk of the way that I have assessed “realness”. Doubtless some future research will be able to explain this more fully.
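One plausible way a “3c” prefix could arise is sketched below. This is a guess at the mechanism, not something established in the paper: it assumes a page that percent-encodes the angle brackets around an address, and a hypothetical harvester (`harvest`, with the deliberately naive pattern `NAIVE_ADDRESS`) that never decodes the text before matching.

```python
import re
from urllib.parse import unquote

# Deliberately naive pattern: grabs any run of address-like characters
# around an @ sign, with no attempt to decode the page first.
NAIVE_ADDRESS = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+")


def harvest(html):
    """Return the lower-cased addresses the naive pattern finds in html."""
    return [match.lower() for match in NAIVE_ADDRESS.findall(html)]


# Hypothetical page in which "<" and ">" appear as %3C and %3E.
page = '<a href="mailto:%3Calice@example.com%3E">mail me</a>'

# Without decoding, the "%" is skipped but the "3C" is kept as part of the name.
print(harvest(page))           # ['3calice@example.com']

# Decoding first turns %3C back into "<", which the pattern excludes.
print(harvest(unquote(page)))  # ['alice@example.com']
```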

