I’ve recently been analysing the incoming email traffic data for Demon Internet, a large(ish) UK ISP, for the first four weeks of March 2007. The raw totals show a very interesting picture:
The top four lines are the amount of incoming email that was detected as “spam” by the Cloudmark technology that Demon now uses. The values lie in a range of 5 to 13 million items per day, with the day of the week being irrelevant, and huge swings from day to day. See how 5 million items on Saturday 18th is followed by 13 million items on Monday 20th!
The bottom four lines are the amount of incoming email that was not detected as spam (and it also excludes incoming items with a “null” sender, which will be bounces, almost certainly all “backscatter” from remote sites “bouncing” spam with forged senders). The values here are between about 2 and 4 million items a day, with a clear pattern being followed from week to week, with lower values at the weekends.
There’s an interesting rise in non-spam email on Tuesday 27th, which corresponds to a new type of “pump and dump” spam (mainly in German) which clearly wasn’t immediately spotted as spam. By the next day, things were back to normal.
The figures and patterns are interesting in themselves, but they show how summarising an average spam value (it was in fact 73%) hides a much more complex picture.
The picture is also hiding a deeper truth. There’s no “law of large numbers” operating here. That is to say, the incoming spam is not composed of lots of individual spam gangs, each doing their own thing and thereby generating a fairly steady amount of spam from day to day. Instead, it is clear that very significant volumes of spam is being sent by a very small number of gangs, so that as they switch their destinations around: today it’s
.uk, tomorrow it’s
aol.com and on Tuesday it will be
.de (hmm, perhaps that’s why they hit
.demon addresses? a missing
$ from their regular expression!).
If there’s only a few large gangs operating — and other people are detecting these huge swings of activity as well — then that’s very significant for public policy. One can have sympathy for police officers and regulators faced with the prospect of dealing with hundreds or thousands of spammers; dealing with them all would take many (rather boring and frustrating) lifetimes. But if there are, say, five, big gangs at most — well that’s suddenly looking like a tractable problem.
Spam is costing us [allegedly] billions (and is a growing problem for the developing world), so there’s all sorts of economic and diplomatic reasons for tackling it. So tell your local spam law enforcement officials to have a look at the graph of Demon Internet’s traffic. It tells them that trying to do something about the spammers currently makes a lot of sense — and that by just tracking down a handful of people, they will be capable of making a real difference!