Beware of cybercrime data memes

Last year, when I wrote a paper about mitigating malware, I needed some figures on the percentage of machines infected with malware. There is a range of figures, mostly below 10%, but one of the highest was 25%.

I looked into why this occurred and wrote it up in footnote #9 (yes, it’s a paper with a lot of footnotes!). My explanation was:

The 2008 OECD report on Malware [14] contained the sentence “Furthermore, it is estimated that 59 million users in the US have spyware or other types of malware on their computers.” News outlets picked up on this, e.g. The Sydney Morning Herald [20], who divided the 59 million figure by the US population and then concluded that around a quarter of US computers were infected (assuming that each person owned one computer). The OECD published a correction in the online copy of the report a few days later. They were actually quoting Pew Internet research on adware/spyware (which is a subtly different threat) from 2005 (which was a while earlier than 2008). The sentence should have read “After hearing descriptions of ‘spyware’ and ‘adware’, 43% of internet users, or about 59 million American adults, say they have had one of these programs on their home computer.” Of such errors in understanding the meaning of data is misinformation made.

We may be about to have a similar thing happen with Facebook account compromises.

On Jan 4, ZoneAlarm published a blog article along with this graphic (I’ve provided a local copy because I hope that all other copies will get destroyed!). Its key findings included:

  • 4 million Facebook users experience spam on a daily basis.
  • More than 20% of newsfeed links currently open viruses.
  • 600,000 logins are compromised every day. That’s 7 logins every second.

The graphic now says:

  • 4 million Facebook users experience spam on a daily basis.
  • 20% of Facebook users have been exposed to a virus.
  • Facebook sees 600,000 attempts to hijack logins a day and pre-emptively protects against them.

which, you have to agree, is really rather different.
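The two versions of the 20% claim count different things: the first is a fraction of links, the second a fraction of users. A rough sketch, under a purely hypothetical model in which each user independently sees `links_seen` links and each link is malicious with probability `p_link`, shows how a small per-link rate can turn into a large per-user "exposed" rate, since the chance of seeing at least one bad link is 1 - (1 - p)^n. The function name and the numbers below are illustrative assumptions, not measured Facebook data.

```python
def per_user_exposure(p_link: float, links_seen: int) -> float:
    """Probability that a user sees at least one malicious link.

    Hypothetical model: each of `links_seen` links is malicious
    independently with probability `p_link`.
    """
    return 1.0 - (1.0 - p_link) ** links_seen


if __name__ == "__main__":
    # Illustrative values only: 0.1% of links bad, 220 links seen per user.
    rate = per_user_exposure(0.001, 220)
    print(f"per-link rate 0.1% -> per-user exposure {rate:.1%}")
```

With these made-up inputs, a per-link rate of one in a thousand becomes a per-user exposure of roughly 20%, which is why "20% of links" and "20% of users" should never be treated as interchangeable.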

This blog article is sceptical, but not (entirely) correct. I quote it because it mentions the PR reasons behind ZoneAlarm’s statistics (they sell a product which purportedly protects you), and because it notes that other people had been confused about the 600,000 figure in the past.

So I looked into where the 600,000 figure originated, and found that its original source was Facebook!

This post by Graham Cluley at Sophos draws attention to Facebook’s graphic (original copy here) accompanying an Oct 27 2011 article about their security mechanisms which said:

  • Less than 4% of content shared on Facebook is spam.
  • Only .06% of over 1 billion logins per day are compromised.
  • Less than .5% of Facebook users experience spam on any given day.

Graham did the simple multiplication required to produce the 600,000 compromise figure, which is the same sum that ZoneAlarm’s PR people did. Similarly, the “less than .5%” translates into the 4 million figure they use.
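The arithmetic is short enough to check. In this sketch, 0.06% of 1 billion daily logins gives 600,000, and spreading that over the 86,400 seconds in a day gives just under 7 per second. The infographic does not state Facebook’s user count, so the 800 million used for the spam figure is an assumption (roughly the active-user total Facebook was announcing in late 2011); `daily_count` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_count(rate_percent: float, population: float) -> float:
    """Convert a percentage of a daily population into an absolute count."""
    return population * rate_percent / 100.0


# Figure from Facebook's Oct 2011 infographic: "over 1 billion logins per day".
logins_per_day = 1_000_000_000
compromised = daily_count(0.06, logins_per_day)
per_second = compromised / SECONDS_PER_DAY

# Assumption: ~800 million users; the infographic gives only the percentage.
assumed_users = 800_000_000
spammed_users = daily_count(0.5, assumed_users)

print(f"compromised logins/day: {compromised:,.0f}")
print(f"compromised logins/second: {per_second:.2f}")
print(f"users seeing spam/day: {spammed_users:,.0f}")
```

Note that because the infographic says “less than .5%”, the resulting 4 million is an upper bound, not a count of users who actually see spam each day.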

However, if you look at the official Facebook copy of the infographic accompanying their blog post today (copy here), you can see that they have revised it. It now just has the data points:

  • Less than 4% of content shared on Facebook is spam.
  • Less than .5% of Facebook users experience spam on any given day.

In fact, they revised their report pretty much immediately after they first posted it, when journalists started ringing! In this article on the topic, Facebook is quoted as saying that the 600,000 is a count of logins that are blocked because Facebook is not convinced it is the account owner who is logging in. So if some criminal tries a brute-force guessing attack on 850 accounts, getting around to each one every 2 minutes, they alone would generate the 600,000/day figure!
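The brute-force scenario is easy to check: returning to each account every 2 minutes means 720 attempts per account per day, and 850 accounts × 720 = 612,000, in the region of the 600,000 figure. The sketch below only reproduces that hypothetical; the function name is illustrative, and nothing here models how Facebook actually decides to block a login.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440


def blocked_attempts_per_day(accounts: int, minutes_between_tries: float) -> int:
    """Daily login attempts from one attacker cycling through `accounts`,
    returning to each one every `minutes_between_tries` minutes.

    Hypothetical scenario: assumes every attempt is blocked and counted.
    """
    tries_per_account = MINUTES_PER_DAY / minutes_between_tries
    return round(accounts * tries_per_account)


print(blocked_attempts_per_day(850, 2))
```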

Time will tell whether the original meme survives, but perhaps people searching for a source to cite will encounter this blog post (or indeed this one, which looks at the spam data) and avoid promulgating misleading data the way that ZoneAlarm has done.

PS: So far I have not been able to find the source of ZoneAlarm’s “20% of newsfeeds” figure to see how that came about, but I’m still looking.

2 thoughts on “Beware of cybercrime data memes”

  1. Phrasing about infection rates can have a dramatic impact on the statistics reported. Thinking about the “20% of newsfeeds” figure reminded me of our recent study into the abuse of trending terms on web search results (http://cs.wellesley.edu/~tmoore/ccs11.pdf).

    Of the 9.8 million distinct trending search results we collected over many months, only 7,889 were flagged as distributing malware by Google Safe Browsing API. That’s 0.08% of results, which doesn’t sound too bad. If, however, you instead measure the percentage of trending terms that include at least one infected search result at any given time (among the top 32 results which we collected), we arrive at a more alarming 4.4% of terms.

    My guess is that a similar issue is at play in calculating the “percentage of newsfeeds” figure. It may be that 20% of the newsfeeds ZoneAlarm inspected include at least one link somewhere that points to a “virus”. I find it far less plausible that they found 20% of all newsfeed links to point to a virus. The only way I could see that happening is if ZoneAlarm happened to use a very unrepresentative sample of Facebook friends.

  2. On first look I read the subject as “Maintaining malware”.

    Which must be a problem, and perhaps a topic of some slight interest.

    I suppose we would like to make it as hard as possible to maintain malware, and the vertical integrators of malware would tend to develop it so that it steadily becomes easier to maintain.

    Is there any visible tendency in those directions yet?
