Monthly Archives: August 2008

An A to Z of confusion

A few days ago I blogged about my paper on email spam volumes — comparing “aardvarks” (email local parts [left of the @] beginning with “A”) with “zebras” (those starting with a “Z”).

I observed that provided one considered “real” aardvarks and zebras — addresses that received good email amongst the spam — then aardvarks got 35% spam and zebras a mere 20%.

This has been widely picked up, first in the Guardian, and later in many other papers as well (even in Danish). However, many of these articles have got hold of the wrong end of the stick. So besides mentioning A and Z, it looks as if I should have published this figure from the paper as well…

Figure 3 from the academic paper

… the point being that the effect I am describing has little to do with Z being at the end of the alphabet, and A at the front, but seems to be connected to the relative rarity of zebras.

As you can see from the figure, marmosets and pelicans get around 42% spam (M and P being popular letters for people’s names) and quaggas 21% (there are very few Quentins, just as there are very few Zacks).

There are some outliers in the figure: for example “3” relates to spammers failing to parse HTML properly and ending up with “3c” (a < character) at the start of names. However, it isn’t immediately apparent why “unicorns” get quite so much spam, it may just be a quirk of the way that I have assessed “realness”. Doubtless some future research will be able to explain this more fully.

Zebras and Aardvarks

We all know that different people get different amounts of email “spam“. Some of these differences result from how careful people have been in hiding their address from the spammers — putting it en claire on a webpage will definitely improve your chances of receiving unsolicited email.

However, it turns out there’s other effects as well. In a paper I presented last week to the Fifth Conference on Email and Anti-Spam (CEAS 2008), I showed that the first letter of the local part of the email address also plays a part.

Incoming email to Demon Internet where the email address local part (the bit left of the @) begins with “A” (think of these as aardvarks) is almost exactly 50% spam and 50% non-spam. However, where the local part begins with “Z” (zebras) then it is about 75% spam.

However, if one only considers “real” aardvarks and zebras, viz: where a particular email address was legitimate enough to receive some non-spam email, then the picture changes. If one treats an email address as “real” if there’s one non-spam email on average every second day, then real aardvarks receive 35% spam, but real zebras receive only 20% spam.

The most likely reason for these results is the prevalence of “dictionary” or “Rumpelstiltskin” attacks (where spammers guess addresses). If there are not many other zebras, then guessing zebra names is less likely.

Aardvarks should consider changing species — or asking their favourite email filter designer to think about how this unexpected empirical result can be leveraged into blocking more of their unwanted email.

[[[ ** Note that these percentages are way down from general spam rates because Demon rejects out of hand email from sites listed in the PBL (which are not expected to send email) and greylists email from sites in the ZEN list. This reduces overall volumes considerably — so YMMV! ]]]

An insecurity in OpenID, not many dead

Back in May it was realised that, thanks to an ill-advised change to some random number generation code, for over 18 months Debian systems had been generating crypto keys chosen from a set of 32,768 possibilities, rather than from billions and billions. Initial interest centred around the weakness of SSH keys, but in practice lots of different applications were at risk (see long list here).

In particular, SSL certificates (as used to identify https websites) might contain one of these weak keys — and so it would be possible for an attacker to successfully impersonate a secure website. Of course the attacker would need to persuade you to mistakenly visit their site — but it just so happens that one of the more devastating attacks on DNS has recently been discovered; so that’s not as unlikely as it must have seemed back in May.

Anyway, my old friend Ben Laurie (who is with Google these days) and I have been trawling the Internet to determine how many certificates there are containing these weak keys — and there’s a lot: around 1.5% of the certs we’ve examined.

But more of that another day! because earlier this week, Ben spotted that one of the weak certs was for Sun’s “OpenID” website, and that two more OpenID sites were weak as well (by weak we mean that a database lookup could reveal the private key!)

OpenID, for those who are unfamiliar with it, is a scheme for allowing you to prove your identity to site A (viz: provide your user name and password) and then use that identity on site B. There’s a queue of people offering the first bit, but rather less offering the second : because it means you rely on someone else’s due diligence in knowing who their users are — where “who” is a hard sort of thing to get your head around in an online environment.

The problem that Ben and I have identified (advisory here), is that an attacker can poison a DNS cache so it serves up the wrong IP address for Then, even if the victim is really cautious and uses https and checks the cert, their credentials can be phished. Thereafter, anyone who trusts Sun as an identity provider could be very disappointed. There’s other attacks as well, but you’ve probably got the general idea by now.

In principle Sun should make a replacement certificate and that should be it (and so they have — read Robin Wilton’s comments here). Except that they need to put the old certificate onto a Certificate Revocation List (CRL) because otherwise it will still be trusted from now until it expires (a fair while off). Sadly, many web browsers, and most of the OpenID codebases haven’t bothered with CRLs (or they don’t enable their checking by default so it’s as if it wasn’t there for most users).

One has to conclude that Sun (and the other two providers) should not be trusted by anyone for quite a while to come. But does that matter ? Since OpenID didn’t promise all that much anyway, does a serious flaw (which does require a certain amount of work to construct an attack) make any difference? At present this looks like the modern equivalent of a small earthquake in Chile.

Additional: Sun’s PR department tell me that the dud certificate has indeed been revoked with Verisign and placed onto the CRL. Hence any system that checks the CRL cannot now be fooled.

Listening to the evidence

Last week the House of Commons Culture, Media and Sport Select Committee published a report of their inquiry into “Harmful content on the Internet and in video games“. They make a number of recommendations including a self-regulatory body to set rules for Internet companies to force them to protect users; that sites should provide a “watershed” so that grown-up material cannot be viewed before 9pm; that YouTube should screen material for forbidden content; that “suicide websites” should be blocked; that ISPs should be forced to block child sexual abuse image websites whatever the cost, and that blocking of bad content was generally desirable.

You will discern a certain amount of enthusiasm for blocking, and for a “something must be done” approach. However, in coming to their conclusions, they do not, in my view, seem to have listened too hard to the evidence, or sought out expertise elsewhere in the world…
Continue reading Listening to the evidence

Card Wars: The Phantom Menace

Just like George Lucas can’t help but return to his old projects, I have been returning to mine. After three years of stagnation, I am pleased to announce the re-launch of, freshly re-vamped, updated and turned into a wiki editable by the general public.

In fact, it’s not just great artists like Mr. Lucas and I starting up old projects, our honourable colleagues wearing the black hats have got the same idea. We have new victims reporting in, rumours abound of an auth system compromise at Citi, the Ombudsman is backlogged with months of disputed withdrawal cases, and some like Alain Job are even going to court.

One original contributor to the phantom case histories has just been hit by a second phantom withdrawal five years on and is chalking up another case in the files. While her new phantom is a bread-and-butter skim incident (a magstripe clone used in the far east), amongst this mass, true phantoms — the real mystery cases — are on the rise too. Two new victims with whom I have been corresponding very kindly offered to fund the hosting for the revamped site.

Let’s consider one of these mysteries. The McGaughey case has been reported in the media in Northern Ireland: dozens of withdrawals taking place over four weeks, totaling almost five thousand pounds, all within a ten mile radius of the McGaughey’s home. Summarised that way it looks like a classic first party fraud (couple short on cash withdraw money, then deny it later). But no-one in the family is short on cash, the McGaugheys look after their card details carefully, and have solid alibis at the time of many of the withdrawals, and the interlocking pattern of real and disputed withdrawals is such that any third party would have a hard time taking and returning the card (whether covertly or in collusion with the McGaugheys). No-one appears to have either the means or the motive.

Unusually the bank has been very cooperative, providing logs from their authorisation system (BASE24), including all of the cryptograms, input data and transaction parameters covering the affected transactions. Everything turns on the Application Transaction Counter (ATC), an on-card counter which increments with every transaction initiated. If an EMV chip can be fully cloned (secret keys and all), then it will have to submit an ATC value when transacting, and if used in parallel with the real card, it won’t be long before the same number pops up twice in the auth system, or large gaps in the sequence appear. The McGaughey’s ATC sequence appears to interlock perfectly: clearly the original card was used?

Of course logs can be misinterpreted (Badger) or even faked, auth systems may not work as expected, and customers may lie and cheat following all sorts of agendas; just around the corner the missing piece of the jigsaw may lie, which reveals the truth behind the case. And there is the totally separate matter of who should suffer the loss in the interim, whilst the truth remains unclear. Liability for disputed withdrawals is the most hotly contested issue of all. can’t do much more for the McGaugheys, but it can bear witness. Documenting the incidence of phantoms and the experiences of customers disputing them adds much needed transparency to the process, and helps researchers and experts seek out the really interesting cases.

Maybe we can lift the lid and discover the truth behind the “phantom menace” — everyone is united in that goal at least — but let’s also hope that Episode 2: Attack of the Clones has not yet started shooting!