All posts by Richard Clayton

A wrecking amendment?

For the past few months the Digital Economy Bill (DEB) has been quietly making its way through the House of Lords. As is the way of these things, large numbers of amendments have been proposed, their lordships have had a series of mini-debates on each set of issues, and the Government have been busily amending the Bill in an attempt to fix all the things that they didn’t think through properly.

The main thrust of the DEB’s approach to dealing with unlawful file sharing of copyright material has been a “three strikes” policy. That is, should you be detected sharing some popular beat combo’s music without permission, then on the first two occasions you’d receive an admonishing letter, and on the third you would be subject to “technical measures” (i.e. very slow Internet speeds) or disconnection, the latter doubtless annoying the rest of your family as they would be unable to visit DirectGov / keep up their social life / catch up on TV shows / do their homework / avoid being sacked from their work-from-home job!

However, the Government are concerned that this won’t be enough, and that unlawful sharing of copyright material might occur in new ways in future. So in clause 17 of the DEB they set out a scheme for amending the Copyright, Designs and Patents Act 1988 through secondary legislation, in ways that would be decided as future circumstances required.

It is unusual to grant such open-ended powers to amend primary legislation, because Parliament would be presented with an unamendable statutory instrument and invited to vote for it — no such SI has been defeated in the House of Lords since 2000, and the time before that was in 1968.

There was an outcry over the breadth of clause 17, and so the Government set out amendments to restrict it — but last week peers voted for an opposition amendment (120A) replacing it with an alternative arrangement altogether: a regime of High Court injunctions that would force ISPs to block websites.

This is such a dumb (and dangerous) idea that it has all the characteristics of a wrecking amendment, added to the Bill just to eat up parliamentary time so that the whole Bill will fall at the dissolution for the upcoming election.

Continue reading A wrecking amendment?

How hard can it be to measure phishing?

Last Friday I went to a workshop organised by the Oxford Internet Institute on “Mapping and Measuring Cybercrime”. The attendees represented many disciplines, from lawyers, through ePolicy, to serving police officers and an ex-Government minister. Much of the discussion related to the difficulty of saying precisely what is or is not “cybercrime”, and what might be meant by mapping or measuring it.

The position paper I submitted (one more addition to the extensive Moore/Clayton canon on phishing) took a step back (though of course we intend to be a step forward), in that it looked at the very rich datasets that we have for phishing and asked whether this meant that we could usefully map or measure that particular criminal speciality.

In practice, we believe, bias in the data and the bias of those who interpret it mean that considerable care is needed to understand what all the data actually means. We give an example from our own work of how failing to understand the bias meant that we initially misunderstood the data, and how various intentional distortions arise because of the self-interest of those who collect the data.

Extrapolating, this all means that getting better data on other types of cybercrime may not prove to be quite as useful as might initially be thought.

As ever, reading the whole paper (it’s only 4 sides!) is highly recommended, but to give a flavour of the problem we’re drawing attention to:

If a phishing gang host their webpages on a thousand fraudulent domains, using fifty stolen credit cards to purchase them from a dozen registrars, and then transfer money out of a hundred customer accounts leading to a monetary loss in six cases: is that 1000 crimes, or 50, or 12, or 100, or 6?

The phishing website removal companies would say that there were 1000 incidents because they need to get 1000 domains suspended. The credit card companies would say there were 50 incidents because 50 cardholders ought to have money reimbursed. Equally they would have 12 registrars to “charge back” because they had accepted fraudulent registrations (there might have been any number of actual credit card money transfer events between 12 and 1000, depending on whether the domains were purchased in bulk). The banks will doubtless see the criminality as 100 unauthorised transfers of money out of their customer accounts; but if they claw back almost all of the cash (because it remains within the mainstream banking system) then the six-monthly Financial Fraud Action UK (formerly APACS) report will merely include the monetary losses from the 6 successful thefts.
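To make the unit-of-count problem concrete, here is a minimal sketch (in Python, with entirely hypothetical data mirroring the numbers above) of how the same set of events yields different totals depending on which field each stakeholder counts:

```python
# Hypothetical event records for the scenario above: each record is one
# fraudulent domain registration. Different parties count distinct
# values of different fields, and so arrive at different totals.
events = [
    {"domain": f"fake-{i}.example", "card": i % 50, "registrar": i % 12}
    for i in range(1000)
]

print("take-down view  :", len({e["domain"] for e in events}))     # 1000
print("card-issuer view:", len({e["card"] for e in events}))       #   50
print("registrar view  :", len({e["registrar"] for e in events}))  #   12
# The banks' view (100 raided accounts) and the loss statistics
# (6 unrecovered thefts) would be counted from other datasets again.
```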

Clearly, what you count depends on who you are — but crucially, in a world where resources are deployed to meet measurement targets (and your job is at risk if you miss them), deciding what to measure will bias your decisions on what you actually do and hence how effective you are at defeating the criminals.

Mobile Internet access data retention (not!)

In the first article in this series I discussed why massive use of Network Address Translation (NAT) means that traceability for mobile Internet access requires the use of source port numbers. In the second article I explained how, in practice, the NAT logging records, which record the mapping from IP address to customer, are available for only a short time — or may not exist at all.

This might seem a little surprising because within the EU a “data retention” regime has been in place since the Spring of 2009. So surely the mobile phone companies have to keep the NAT records of Internet access, even though this will be horribly expensive?

They don’t!

The reason is that instead of the EU Directive (and hence UK and other European laws) saying what was to be achieved — “we want traceability to work” — the bureaucrats decided to say what they wanted done — “we want logs of IP address allocation to be kept”. For most ISPs the two requirements are equivalent. For the mobile companies, with their massive use of NAT, they are not equivalent at all.

The EU Directive (Article 5) requires an ISP to retain for all Internet access events (the mobile call itself will require other info to be retained):

(a)(i) the user ID(s) allocated;
(a)(iii) the name and address of the subscriber or registered user to whom an Internet Protocol (IP) address, user ID or telephone number was allocated at the time of the communication;
(c)(i) the date and time of the log-in and log-off of the Internet access service, based on a certain time zone, together with the IP address, whether dynamic or static, allocated by the Internet access service provider to a communication, and the user ID of the subscriber or registered user;
(e)(ii) the digital subscriber line (DSL) or other end point of the originator of the communication;

That is, the company must record which IP address was given to the user, but there is no requirement to record the source port number. As discussed in this series of articles, this makes traceability extremely problematic.
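To see the gap concretely, here is a hypothetical sketch (the field names are mine, not the Directive’s) of a record holding just the Article 5 items quoted above. Note the absence of any source-port field, so behind a NAT any (IP address, time) query can legitimately match hundreds of subscribers:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RetentionRecord:
    """Hypothetical record holding only what Article 5 requires."""
    user_id: str           # (a)(i)   the user ID(s) allocated
    subscriber: str        # (a)(iii) name and address of the subscriber
    ip_address: str        # (c)(i)   the IP address allocated
    login: datetime        # (c)(i)   log-in time
    logoff: datetime       # (c)(i)   log-off time
    end_point: str         # (e)(ii)  DSL or other end point
    # No source-port field: with NAT, hundreds of subscribers share
    # each ip_address at any given moment, so this record alone
    # cannot say who was responsible for a particular connection.
```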

It’s also somewhat unclear (but then much more of the Directive is technically unclear) whether recording the “internal” IP address allocated to the user is sufficient, or whether the NAT records (without the port numbers) need to be kept as well. Fortunately, in the UK, the Regulations that implement the Directive make it clear that the rules only apply once a notice has been served on an ISP, and that notice must say to what extent the rules apply. So in principle, all should be clear to the mobile telcos!

By the way … this bureaucratic insistence on saying what is to be done, rather than what is to be achieved, can also be found in the Digital Economy Bill which is currently before the House of Lords. It keeps on mentioning “IP addresses” being required, with no mention of source port numbers.

But perhaps that particular problem will turn out OK? Apple will not let anyone with an iPhone download music without permission!

Practical mobile Internet access traceability

In an earlier article I explained how the mobile phone companies are using Network Address Translation on a massive scale to allow hundreds of Internet access customers to share a single IP address. I pointed out how it was now necessary to record the source port as well as the IP address if you wanted to track somebody down.

Having talked in detail about this with one of the UK’s major mobile phone companies, I can now further describe some practical issues (Caveat: other companies may differ in the details of their implementations, but all of them are doing something very similar).

The good news, first, is that things are not as bad as they might be!

By design, NAT systems provide a constant mapping to a single IP address for any given user (at least until they next disconnect). This means that, for example, a website that is tracking visitors by IP address will not serve the wrong content; and its log analysis program will see a constant IP address when the user changes page or fetches an image, so that audience measurements will remain valid. From a security point of view, it means that provided you have at least one logging record with IP address + port number + timestamp, you will have sufficient data to seek an identification.

As a quick aside, you may be thinking that you could do an “inference attack” to identify someone without using a source port number. Suppose that you can link several bad events together over a period of time, but only have the IP address of each. Despite the telco having several hundred people using each IP address at each relevant instant, only one user might be implicated on every occasion. Viewers of The Wire will recall a similar scheme being used to identify Stringer Bell’s second SIM card number!

Although this inference approach would work fine in theory, the telco I spoke with does not keep its records in a suitable form for this to be at all efficient. So, even supposing that one could draft the appropriate legal request (a s22 notice, as prescribed by the UK’s Regulation of Investigatory Powers Act), the cost of doing the searches and collating the results (and those costs are borne by the investigators) would be prohibitive.
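For the curious, the intersection idea itself is trivial to express in code. A minimal sketch follows; the helper and record layout are hypothetical, and, as just noted, real telco records are not indexed this way, which is exactly why the searches would be so expensive:

```python
from functools import reduce

def candidates(nat_log, ip, timestamp):
    """Hypothetical helper: every account mapped to `ip` at `timestamp`.

    Assumes nat_log entries carry an account, an ip and a [start, end)
    usage interval; a real telco would have to scan raw logs instead.
    """
    return {e.account for e in nat_log
            if e.ip == ip and e.start <= timestamp < e.end}

def intersect_suspects(nat_log, events):
    """Whoever appears behind the IP address at *every* event."""
    candidate_sets = (candidates(nat_log, ip, ts) for ip, ts in events)
    return reduce(lambda a, b: a & b, candidate_sets)
```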

But now it’s time for the bad news.

Traditional ISP IP address usage records (in RADIUS or similar systems) have both a “start” and a “stop” record. The consistency of these records within the logging system gives considerable assurance that the data is valid and complete. The NAT logging only records an event when the source port starts to be used — so if records go missing (and classical syslog systems can lose records when network errors occur), there is no consistency check to reveal the loss, and hence no warning that the wrong account may have been identified.
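Here is a quick sketch of the kind of consistency check that paired start/stop records make possible (the record format is hypothetical; start-only NAT logs offer no equivalent, so a lost record simply goes unnoticed):

```python
def unmatched_sessions(records):
    """Find sessions with a missing start or stop record.

    `records` is an iterable of (session_id, kind) pairs where kind is
    "start" or "stop" -- a simplified stand-in for RADIUS accounting.
    """
    open_sessions, orphan_stops = set(), []
    for session_id, kind in records:
        if kind == "start":
            open_sessions.add(session_id)
        elif session_id in open_sessions:
            open_sessions.remove(session_id)
        else:
            orphan_stops.append(session_id)   # stop with no start: data lost
    return open_sessions, orphan_stops        # still-open ones lost their stop
```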

The volume of logging records showing which customer was using which IP address (and source port) is extremely large — dozens of records can be generated by viewing just one web page. They also provide sensitive information about customer habits, and so if they are retained at all, it will only be for a short period of time. This means that if you want traceability then you need to get a move on. ISPs typically keep logs of IP address usage for a few weeks. At the mobile companies (because of the volume) you will in practice have to consult the records within just a few days.

Furthermore, even when the company intends to hold the data for a short period, it turns out that under heavy load the NAT equipment struggles to do what it’s supposed to be doing, let alone generate gigabytes of logging. So the logging is often turned off for long periods for service protection reasons.

Clearly there’s a reputational risk to not having any records at all. For an example which does not have anything to do with policing: not being able to track down the sources of email spam would demean the mobile company in the eyes of other ISPs (which in practice will manifest as ever more aggressive filtering of all of their email). However, that risk is rather long-term; keeping the system running “now” is rather more important; and there is a lot that a mobile company can do to block and detect spam within their own networks — they don’t need to rely on being able to process external abuse reports.

In the third and final article of this little series I will consider the question of “data retention”. Surely the mobile phone company has a legal duty to keep traceability records? It turns out that the regulators screwed it up — and they don’t!

Extending the requirements for traceability

A large chunk of my 2005 PhD thesis explained how “traceability” works; how we can attempt to establish “who did that?” on the Internet.

The basics are that you record an IP address and a timestamp; use the Regional Internet Registry records (RIPE, ARIN etc) to determine which ISP has been allocated the IP address; and then ask the ISP to use their internal records to determine which customer account was allocated the IP address at the relevant instant. All very simple in concept, but hung about — as the thesis explained — by considerable caveats as to whether the simple assumptions involved are actually true in a particular case.
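The registry lookup in the middle step is mechanical: below is a minimal sketch of a WHOIS client (RFC 3912), here pointed at RIPE’s server (the other registries run equivalent services on TCP port 43). The final step, asking the ISP, is of course a legal and human process rather than a protocol.

```python
import socket

def whois(query, server="whois.ripe.net", port=43):
    """Minimal WHOIS client (RFC 3912): send one query line, read reply."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")

# e.g. print(whois("192.0.2.1"))  -- a documentation-range address
```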

One of the caveats concerned the use of Network Address Translation (NAT), whereby the IP addresses used by internal machines are mapped back and forth to external IP addresses that are visible on the global Internet. The most familiar NAT arrangement is that used by a great many home broadband users, who have one externally facing IP address, yet run multiple machines within the household.

Companies also use NAT. If they own sufficient IP addresses they may map one-to-one between internal and external addresses (usually for security reasons), or they may only have 4 or 8 external IP addresses, and will use some or all of them in parallel for dozens of internal machines.

Where NAT is in use, as my thesis explained, traceability becomes problematic because it is rare for the NAT equipment to generate logs to record the internal/external mapping, and even rarer for those logs to be preserved for any length of time. Without these logs, it is impossible to work out which internal user was responsible for the event being traced. However, in practice, all is not lost because law enforcement is usually able to use other clues to tell them which member of the household, or which employee, they wish to interview first.

Treating NAT with this degree of equanimity is no longer possible, and that’s because of the way in which the mobile telephone companies are providing Internet access.

The shortage of IPv4 addresses has meant that the mobile telcos have not been able to obtain huge blocks of address space to dish out one IP address per connected customer — the way in which ISPs have always worked. Instead, they are using relatively small address blocks and a NAT system, so that the same IP address is being simultaneously used by a large number of customers; often hundreds at a time.

This means that the only way in which they can offer a traceability service is if they are provided with an IP address and a timestamp AND ALSO with the TCP (or UDP) source port number. Without that source port value, the mobile firm can only narrow down the account being used to the extent that it must be one out of several hundred — and since those several hundred will have nothing in common, apart from their choice of phone company, law enforcement (or anyone else who cares) will be unable to go much further.
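The problem is easy to demonstrate with a toy NAT table (all data hypothetical): an (IP address, timestamp) query matches every concurrent customer, while adding the source port narrows it to exactly one.

```python
from datetime import datetime

# Toy NAT log: 300 customers sharing one external IP for five minutes.
nat_log = [
    ("customer-%03d" % i, "198.51.100.7", 1024 + i,
     datetime(2010, 1, 25, 12, 0), datetime(2010, 1, 25, 12, 5))
    for i in range(300)
]

when = datetime(2010, 1, 25, 12, 2)

by_ip = [acct for acct, ip, port, start, end in nat_log
         if ip == "198.51.100.7" and start <= when < end]
with_port = [acct for acct, ip, port, start, end in nat_log
             if ip == "198.51.100.7" and port == 1234
             and start <= when < end]

print(len(by_ip))    # 300 candidate accounts -- no use to anyone
print(with_port)     # ['customer-210'] -- a single account
```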

So, the lesson here is clear — if you are creating logs of activity for security purposes — because you might want to use the information to track someone down — then you must record not only the IP address, but also the source port number.
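In practice this is cheap to do: most server APIs hand you the peer’s source port alongside its IP address, and it merely needs writing out. A minimal sketch using Python’s standard library (the log format is illustrative, not any standard):

```python
import logging
import socketserver
from datetime import datetime, timezone

logging.basicConfig(filename="access.log", level=logging.INFO,
                    format="%(message)s")

class LoggedHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # The peer's source port arrives with its address for free.
        ip, port = self.client_address[:2]
        logging.info("%s ip=%s port=%d",
                     datetime.now(timezone.utc).isoformat(), ip, port)
        # ... handle the request itself here ...

if __name__ == "__main__":
    with socketserver.TCPServer(("0.0.0.0", 8080), LoggedHandler) as srv:
        srv.serve_forever()
```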

This will soon be a necessity not just for connections from mobile companies, but for many other Internet access providers as well — because of the expected rise of “Carrier Grade NAT” (or “Large Scale NAT”), as one way of staving off the advent of the different sort of world we will enter when we “run out” of IPv4 addresses — sometime in the next two or three years.

There is currently an “Internet Draft” (a document that might become an RFC one day) that sets out a number of other issues which arise when addresses are routinely shared … though the authors appear unaware that this isn’t something to worry about in 2011, but something that has already been happening for some time, and at considerable “scale”.

In my next article, I’ll discuss what this massive use of NAT means in practice when traceability leads you to a mobile phone user.

When is a leak not a leak?

WikiLeaks have decided to save other people some bandwidth and make some of my PowerPoint slides available on their site. Since they usually publish censored or confidential information, they’re presumably completely unaware that these slides have been available to the public from two different websites since the day of the talk.

Remarkably similar slides (I’m often lazy!) were also presented in talks I have given this year to INEX (the Irish Internet Exchange Point) [slides here], to EuroBSDCon [slides here] and to the BCS (Herts branch) [slides here].

These talks have covered various technical aspects of the blocking of child sexual abuse images for sites that appear on the IWF list. I’ve mentioned the blocking of Wikipedia just over a year ago, and the blocking of archive.org up until last February. However, I’ve also thrown in a couple of slides about some more recent research, yet to be published, which explores a different way of determining what is on the IWF list. That seems to have been what has interested WikiLeaks.
Continue reading When is a leak not a leak?

What does Detica detect?

There has been considerable interest in a recent announcement by Detica of “CView” which their press release claims is “a powerful tool to measure copyright infringement on the internet”. The press release continues by saying that it will provide “a measure of the total volume of unauthorised file sharing”.

Commentators are divided as to whether these claims are nonsense, or whether the system must be deeply intrusive. The main reason for this is that when peer-to-peer file sharing flows are encrypted, it is impossible for a passive observer to know what is being transferred.

I met with Detica last Friday, at their suggestion, to discuss what their system actually did (they’ve read some of my work on Phorm’s system, so meeting me was probably not entirely random). With their permission, I can now explain the basics of what they are actually doing. A more detailed account should appear at some later date.
Continue reading What does Detica detect?

RIP memes

There was a discussion a little while back on the UKCrypto mailing list about how the UK Regulation of Investigatory Powers Act came to be so specifically associated in the media with terrorism, when it is far more general than that (see, for example: “Anti-terrorism laws used to spy on noisy children”).

I suggested that this “meme” might well be traced back to the Home Office website’s quick overview text which used to say (presumably before they thought better of it):

The Regulation of Investigatory Powers Act (RIPA) legislates for using various methods of surveillance and information gathering for the prevention of crime including terrorism.

Well, I’ve just noticed another source of memes (which may be new, since Google are continually experimenting with their system, or which may have been there for simply ages, unnoticed by me at least).
Continue reading RIP memes

apComms backs ISP cleanup activity

The All Party Parliamentary Communications Group (apComms) recently published their report into an inquiry entitled “Can we keep our hands off the net?”

They looked at a number of issues, from “network neutrality” to how best to deal with child sexual abuse images. Read the report for all the details; in this post I’m just going to draw attention to one of the most interesting, and timely, recommendations:

51. We recommend that UK ISPs, through Ofcom, ISPA or another appropriate organisation, immediately start the process of agreeing a voluntary code for detection of, and effective dealing with, malware infected machines in the UK.

52. If this voluntary approach fails to yield results in a timely manner, then we further recommend that Ofcom unilaterally create such a code, and impose it upon the UK ISP industry on a statutory basis.

The problem is that although ISPs are pretty good these days at dealing with incoming badness (spam, DDoS attacks etc) they can be rather reluctant to deal with customers who are malware infected, and sending spam, DDoS attacks etc to other parts of the world.

From a “security economics” point of view this isn’t too surprising (as colleagues and I pointed out in a report to ENISA). Customers demand effective anti-spam, or they leave for another ISP. But talking to customers and holding their hand through a malware infection is expensive for the ISP, and customers may just leave if hassled, so the ISPs have limited incentives to take any action.

When markets fail to solve problems, you regulate… and what apComms is recommending is that a self-regulatory solution be given a chance to work. We shall have to see whether the ISPs seize this chance, or if compulsion will be required.

This UK-focussed recommendation is not taking place in isolation: there’s been activity all over the world in the past few weeks — in Australia the ISPs are consulting on a Voluntary Code of Practice for Industry Self-regulation in the Area of e-Security; in the Netherlands the main ISPs have signed an “Anti-Botnet Treaty”; and in the US the main cable provider, Comcast, has announced that its “Constant Guard” programme will in future detect if its customers’ machines become members of a botnet.

ObDeclaration: I assisted apComms as a specialist adviser, but the decision on what they wished to recommend was theirs alone.

How much did shutting down McColo help?

On 11 November 2008 McColo, a Californian server hosting company, was disconnected from the Internet. This took the controllers for 6 major botnets offline. It has been widely reported that email spam volumes were markedly reduced for some time thereafter. But did disconnecting McColo only get rid of “easy to block” spam?

In a paper presented this week at the Sixth Conference on Email and Antispam (CEAS) I examined email traffic data for the incoming email to a UK ISP to see what effect the disconnection had.
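The core of such an analysis is a simple before/after comparison. Here is a sketch of the kind of calculation involved (the CSV of daily message counts is hypothetical, and the paper itself does rather more than this):

```python
import csv
from datetime import date

CUTOFF = date(2008, 11, 11)   # the day McColo was disconnected

def daily_means(path):
    """Mean daily message count before and after the cutoff.

    Expects a hypothetical CSV with `day` (ISO date) and `count` columns.
    """
    before, after = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            bucket = before if date.fromisoformat(row["day"]) < CUTOFF else after
            bucket.append(int(row["count"]))
    return sum(before) / len(before), sum(after) / len(after)
```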
Continue reading How much did shutting down McColo help?