A few days ago, BBC2’s Newsnight approached me to have a look inside what might have been some kind of smartcard, but had long been suspected to be part of a simple-minded and dangerous fraud that may already have cost lives.
Encoding integers in the EMV protocol
On the 1st of January 2010, many German bank customers found that their banking smart cards had stopped working. Details of why are still unclear, but indications are that the cards believed that the date was 2016, rather than 2010, and so refused to process a transaction supposedly after their expiry dates. This problem could turn out to be quite expensive for the cards’ manufacturer, Gemalto: their shares dropped almost 4%, and they have booked a €10 m charge to handle the consequences.
These cards implement the EMV protocol (the same one used for Chip and PIN in the UK). Here, the card is sent the current date as a 3-byte YYMMDD value in binary-coded decimal (BCD) format, i.e. “100101” on 1 January 2010. If, however, this is interpreted as hexadecimal, the card will think the year is 2016 (in hexadecimal, 1 January 2010 should actually have been “0a0101”). Since the numbers 0–9 are the same in both BCD and hexadecimal, we can see why this problem only occurred in 2010*.
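To make the failure concrete, here is a small Python sketch that decodes the 3-byte date both correctly as BCD and, mistakenly, as plain binary. The function names are hypothetical, and adding the two-digit year to 2000 is a simplification for illustration, not a claim about how any particular card handles the century.

```python
from __future__ import annotations


def bcd_byte_to_int(b: int) -> int:
    """Decode one packed-BCD byte: the high and low nibbles each hold a decimal digit."""
    high, low = b >> 4, b & 0x0F
    if high > 9 or low > 9:
        raise ValueError(f"0x{b:02x} is not a valid BCD byte")
    return high * 10 + low


def int_to_bcd_byte(n: int) -> int:
    """Encode 0..99 as one packed-BCD byte, e.g. 10 -> 0x10."""
    if not 0 <= n <= 99:
        raise ValueError("a BCD byte holds only 0..99")
    return (n // 10) << 4 | (n % 10)


def decode_yymmdd(data: bytes, as_bcd: bool = True) -> tuple[int, int, int]:
    """Decode a 3-byte YYMMDD date, either correctly as BCD or (wrongly) as plain binary."""
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    convert = bcd_byte_to_int if as_bcd else int
    yy, mm, dd = (convert(b) for b in data)
    return 2000 + yy, mm, dd


date_2010 = bytes.fromhex("100101")  # 1 January 2010, as the terminal sends it
print(decode_yymmdd(date_2010))                # (2010, 1, 1)
print(decode_yymmdd(date_2010, as_bcd=False))  # (2016, 1, 1)

# The first year whose BCD year byte differs from the same number written in binary:
first_mismatch = next(2000 + yy for yy in range(100) if int_to_bcd_byte(yy) != yy)
print(first_mismatch)  # 2010
```

For years 2000 to 2009 the year byte is 0x00 to 0x09 either way, so the two readings agree; 0x10 is the first value where they diverge.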
In one sense, this looks like a foolish error, and should have been caught in testing. However, before criticizing too harshly, one should remember that EMV is almost impossible to implement perfectly. I have written a fairly complete implementation of the protocol and frequently find edge cases which are insufficiently documented, and so are error-prone to handle. Not only is the specification vague, but it is also long: the first public version in 1996 was 201 pages, and it had grown to 765 pages by 2008. Moreover, much of the complexity is unnecessary. In this article I will give just one example of this: the fact that there are nine different ways to encode integers.
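As an illustration of how differently the same number can look on the wire, the sketch below encodes one integer in three of the data-element formats that EMV defines, as I understand them: “n” (BCD, right-justified and padded on the left with zero digits), “cn” (compressed numeric: BCD, left-justified and padded on the right with hex F nibbles) and “b” (binary, taken here as unsigned big-endian). The function names and field lengths are arbitrary choices for the example, and it makes no attempt to cover the other encodings.

```python
def encode_n(value: int, length: int) -> bytes:
    """Format 'n': BCD digits, right-justified, left-padded with zero nibbles."""
    if value < 0 or len(str(value)) > 2 * length:
        raise ValueError("value does not fit")
    return bytes.fromhex(str(value).rjust(2 * length, "0"))


def encode_cn(value: int, length: int) -> bytes:
    """Format 'cn': BCD digits, left-justified, right-padded with 0xF nibbles."""
    if value < 0 or len(str(value)) > 2 * length:
        raise ValueError("value does not fit")
    return bytes.fromhex(str(value).ljust(2 * length, "F"))


def encode_b(value: int, length: int) -> bytes:
    """Format 'b' for a number: unsigned big-endian binary (OverflowError if too big)."""
    return value.to_bytes(length, "big")


for encoder, length in ((encode_n, 6), (encode_cn, 4), (encode_b, 4)):
    print(encoder.__name__, encoder(1234, length).hex())
# encode_n 000000001234
# encode_cn 1234ffff
# encode_b 000004d2
```

A parser therefore has to know, field by field, which of these conventions applies before it can even read a simple amount or number.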
Mobile Internet access data retention (not!)
In the first article in this series I discussed why massive use of Network Address Translation (NAT) means that traceability for mobile Internet access requires the use of source port numbers. In the second article I explained how, in practice, the NAT logging records, which record the mapping from IP address to customer, are available for only a short time, or may not exist at all.
This might seem a little surprising because within the EU a “data retention” regime has been in place since the Spring of 2009. So surely the mobile phone companies have to keep the NAT records of Internet access, even though this will be horribly expensive?
They don’t!
The reason is that instead of the EU Directive (and hence UK and other European laws) saying what was to be achieved — “we want traceability to work” — the bureaucrats decided to say what they wanted done — “we want logs of IP address allocation to be kept”. For most ISPs the two requirements are equivalent. For the mobile companies, with their massive use of NAT, they are not equivalent at all.
The EU Directive (Article 5) requires an ISP to retain for all Internet access events (the mobile call itself will require other info to be retained):
(a)(i) the user ID(s) allocated;
(a)(iii) the name and address of the subscriber or registered user to whom an Internet Protocol (IP) address, user ID or telephone number was allocated at the time of the communication;
(c)(i) the date and time of the log-in and log-off of the Internet access service, based on a certain time zone, together with the IP address, whether dynamic or static, allocated by the Internet access service provider to a communication, and the user ID of the subscriber or registered user;
(e)(ii) the digital subscriber line (DSL) or other end point of the originator of the communication;
That is, the company must record which IP address was given to the user, but there is no requirement to record the source port number. As discussed in this series of articles, this makes traceability extremely problematic.
It’s also somewhat unclear (but then much more of the Directive is technically unclear) whether recording the “internal” IP address allocated to the user is sufficient, or whether the NAT records (without the port numbers) need to be kept as well. Fortunately, in the UK, the Regulations that implement the Directive make it clear that the rules only apply once a notice has been served on an ISP, and that notice must say to what extent the rules apply. So in principle, all should be clear to the mobile telcos!
By the way … this bureaucratic insistence on saying what is to be done, rather than what is to be achieved, can also be found in the Digital Economy Bill which is currently before the House of Lords. It keeps on mentioning “IP addresses” being required, with no mention of source port numbers.
But perhaps that particular problem will turn out OK? Apple will not let anyone with an iPhone download music without permission!
Practical mobile Internet access traceability
In an earlier article I explained how the mobile phone companies are using Network Address Translation on a massive scale to allow hundreds of Internet access customers to share a single IP address. I pointed out how it was now necessary to record the source port as well as the IP address if you wanted to track somebody down.
Having talked in detail about this with one of the UK’s major mobile phone companies, I can now describe some further practical issues (caveat: other companies may differ in the details of their implementations, but all of them are doing something very similar).
The good news, first, is that things are not as bad as they might be!
By design, NAT systems provide a constant mapping to a single IP address for any given user (at least until they next disconnect). This means that, for example, a website that is tracking visitors by IP address will not serve the wrong content; and their log analysis program will see constant IP addresses when the user changes page or fetches an image, so that audience measurements will remain valid. From a security point of view, it means that provided you have at least one logging record with IP address + port number + timestamp, then you will have sufficient data to be able to seek to make an identification.
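The following toy model, a hypothetical sketch rather than any telco’s real system, shows both halves of this: each subscriber is pinned to one external IP address for the session, each new flow gets a fresh source port on that address, and a log record is written only when a port starts being used. One record of IP address, port and timestamp is then enough to name the subscriber, even though several subscribers share the address. The addresses are from the 192.0.2.0/24 documentation range, and the toy never reuses ports.

```python
from itertools import cycle
from typing import Dict, List, Optional, Tuple


class ToyNat:
    """Hypothetical carrier NAT: each subscriber is pinned to one external IP for the
    session, each new flow gets the next port on that IP, and a log record is written
    only when a port starts being used. (Ports are never reused in this toy.)"""

    def __init__(self, external_ips: List[str]):
        self._pool = cycle(external_ips)
        self._ip_for: Dict[str, str] = {}
        self._next_port = {ip: 1024 for ip in external_ips}
        self.log: List[Tuple[int, str, int, str]] = []  # (time, ext_ip, ext_port, subscriber)

    def open_flow(self, subscriber: str, when: int) -> Tuple[str, int]:
        if subscriber not in self._ip_for:          # first flow of this session
            self._ip_for[subscriber] = next(self._pool)
        ip = self._ip_for[subscriber]               # same external IP for every flow
        port = self._next_port[ip]
        self._next_port[ip] += 1
        self.log.append((when, ip, port, subscriber))
        return ip, port

    def disconnect(self, subscriber: str) -> None:
        self._ip_for.pop(subscriber, None)          # next session may get a different IP

    def who(self, ip: str, port: int, when: int) -> Optional[str]:
        """Subscriber named by the latest log record for ip:port at or before `when`
        (assumes open_flow was called with non-decreasing times)."""
        holder = None
        for ts, rec_ip, rec_port, subscriber in self.log:
            if rec_ip == ip and rec_port == port and ts <= when:
                holder = subscriber
        return holder


nat = ToyNat(["192.0.2.1", "192.0.2.2"])
flows = [nat.open_flow(s, t) for t, s in enumerate(["alice", "bob", "alice", "carol", "alice"])]
print(flows)
# [('192.0.2.1', 1024), ('192.0.2.2', 1024), ('192.0.2.1', 1025), ('192.0.2.1', 1026), ('192.0.2.1', 1027)]
print(nat.who("192.0.2.1", 1026, when=10))  # carol
```

Alice and Carol both appear as 192.0.2.1; only the port separates them.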
As a quick aside, you may be thinking that you could mount an “inference attack” to identify someone without using a source port number. Suppose that you can link several bad events together over a period of time, but only have the IP address for each. Although the telco has several hundred people using each IP address at each relevant instant, only one user might be implicated on every occasion. Viewers of The Wire will recall a similar scheme being used to identify Stringer Bell’s second SIM card number!
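In code, the inference attack is just a set intersection. In this hypothetical sketch, each set holds the (made-up) subscribers who were behind the shared IP address at the moment of one linked event; real sets would hold hundreds of names, and producing them from the telco’s records is where the cost lies.

```python
from typing import Iterable, Set


def suspects(users_per_event: Iterable[Set[str]]) -> Set[str]:
    """Subscribers who shared the IP address at *every* linked event."""
    sets = list(users_per_event)
    if not sets:
        raise ValueError("need at least one event")
    common = set(sets[0])
    for users in sets[1:]:
        common &= users  # keep only those also present at this event
    return common


events = [
    {"u1", "u2", "u3", "u4"},  # users behind the IP at event 1
    {"u2", "u4", "u5", "u6"},  # ... at event 2
    {"u4", "u7", "u8"},        # ... at event 3
]
print(suspects(events))  # {'u4'}
```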
Although this inference approach would work fine in theory, the telco I spoke with does not keep its records in a form that would make it at all efficient. So, even supposing that one could draft the appropriate legal request (a s22 notice, as prescribed by the UK’s Regulation of Investigatory Powers Act), the cost of doing the searches and collating the results (costs that are borne by the investigators) would be prohibitive.
But now it’s time for the bad news.
Traditional ISP IP address usage records (in RADIUS or similar systems) have both a “start” and “stop” record. The consistency of these records within the logging system gives considerable assurance that the data is valid and complete. The NAT logging only records an event when the source port starts to be used — so if records go missing (and classical syslog systems can lose records when network errors occur), then there is no consistency check to show that the wrong account has been identified.
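A toy comparison, with invented timestamps and names, of why the missing “stop” record matters: if one “start” record is lost, a start-only log silently attributes the traffic to the previous user of that IP address and port, whereas paired start/stop records leave a visible inconsistency. This is a simplified model, not a description of how RADIUS or any NAT product actually stores its records.

```python
from typing import Dict, List, Optional, Tuple


def start_only_lookup(starts: List[Tuple[int, str]], when: int) -> Optional[str]:
    """NAT-style log: the holder is whoever most recently *started* using the ip:port."""
    holder = None
    for ts, subscriber in sorted(starts):
        if ts <= when:
            holder = subscriber
    return holder


def pair_sessions(events: List[Tuple[int, str, str]]):
    """RADIUS-style log: pair (ts, 'start'|'stop', subscriber) events into sessions.
    Returns (sessions, problems); a session is (start, stop or None, subscriber)."""
    open_sessions: Dict[str, int] = {}
    sessions, problems = [], []
    for ts, kind, subscriber in sorted(events):
        if kind == "start":
            if subscriber in open_sessions:
                problems.append(f"second start for {subscriber} at {ts}")
            open_sessions[subscriber] = ts
        elif subscriber in open_sessions:
            sessions.append((open_sessions.pop(subscriber), ts, subscriber))
        else:
            problems.append(f"stop without start for {subscriber} at {ts}")
    sessions += [(start, None, sub) for sub, start in open_sessions.items()]
    return sessions, problems


def covering(sessions, when: int) -> List[str]:
    """Subscribers whose [start, stop) session interval contains `when`."""
    return [sub for start, stop, sub in sessions
            if start <= when and (stop is None or when < stop)]


# Alice uses the ip:port from t=1000 to 1150; Carol from 1200 to 1400.
# Suppose Carol's start record (t=1200) is lost.
nat_log = [(1000, "alice")]
print(start_only_lookup(nat_log, 1300))  # alice (wrong, and nothing flags it)

radius_log = [(1000, "start", "alice"), (1150, "stop", "alice"), (1400, "stop", "carol")]
sessions, problems = pair_sessions(radius_log)
print(covering(sessions, 1300))  # [] (nobody is wrongly named)
print(problems)                  # ['stop without start for carol at 1400']
```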
The logging records that show which customer was using which IP address (and source port) are extremely large — dozens of records can be generated by viewing just one web page. They also provide sensitive information about customer habits, and so if they are retained at all, it will only be for a short period of time. This means that if you want traceability then you need to get a move on. ISPs typically keep logs of IP address usage for a few weeks. At the mobile companies (because of the volume) you will in practice have to consult the records within just a few days.
Furthermore, even when the company intends to hold the data for a short period, it turns out that under heavy load the NAT equipment struggles to do what it’s supposed to be doing, let alone generate gigabytes of logging. So the logging is often turned off for long periods for service-protection reasons.
Clearly there’s a reputational risk in not having any records at all. To take an example that has nothing to do with policing: being unable to track down the sources of email spam would lower the mobile company’s standing with other ISPs (which in practice would show up as ever more aggressive filtering of all of its email). However, that risk is rather long-term; keeping the system running “now” is rather more important; and there is a lot that a mobile company can do to block and detect spam within its own network, so it does not need to rely on being able to process external abuse reports.
In the third and final article of this little series I will consider the question of “data retention”. Surely the mobile phone company has a legal duty to keep traceability records? It turns out that the regulators screwed it up — and they don’t!
The Real Hustler
Paul Wilson, my esteemed coauthor on that paper on the psychology of scam victims that is currently attracting quite a bit of attention, has just started an entertaining and instructive new blog, The Real Hustler. If you liked our paper, you’ll probably enjoy Paul’s blog.
Well worth a bookmark and repeat visits, both for fans of the BBC TV series The Real Hustle and for researchers who recognize the importance of the exciting new field of security psychology.
Extending the requirements for traceability
A large chunk of my 2005 PhD thesis explained how “traceability” works; how we can attempt to establish “who did that?” on the Internet.
The basics are that you record an IP address and a timestamp; use the Regional Internet Registry records (RIPE, ARIN etc) to determine which ISP has been allocated the IP address; and then ask the ISP to use their internal records to determine which customer account was allocated the IP address at the relevant instant. All very simple in concept, but hung about — as the thesis explained — by considerable caveats as to whether the simple assumptions involved are actually true in a particular case.
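The three steps can be sketched in a few lines of Python. Everything here is hypothetical: the prefixes are the RFC 5737 documentation ranges, the ISP names and account records are invented, and a real registry lookup would go through the RIRs’ whois services rather than a dictionary. The timestamps are timezone-aware, since a clock or timezone mismatch is one obvious way the simple assumptions can fail.

```python
import ipaddress
from datetime import datetime, timezone
from typing import Optional, Tuple

UTC = timezone.utc

# Stand-in for the registry: which ISP holds each prefix (invented data).
REGISTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "ExampleNet",
    ipaddress.ip_network("198.51.100.0/24"): "OtherISP",
}

# Stand-in for one ISP's internal allocation records: (ip, start, end, account).
ISP_RECORDS = {
    "ExampleNet": [
        ("192.0.2.10",
         datetime(2009, 12, 1, 9, 0, tzinfo=UTC),
         datetime(2009, 12, 1, 17, 0, tzinfo=UTC),
         "account-1234"),
    ],
}


def which_isp(ip: str) -> Optional[str]:
    """Registry step: find the ISP whose allocated prefix contains the address."""
    addr = ipaddress.ip_address(ip)
    for network, isp in REGISTRY.items():
        if addr in network:
            return isp
    return None


def trace(ip: str, when: datetime) -> Tuple[Optional[str], Optional[str]]:
    """Given (ip, timestamp), return (isp, account), with None where the trail goes cold."""
    isp = which_isp(ip)
    if isp is None:
        return None, None
    for rec_ip, start, end, account in ISP_RECORDS.get(isp, []):
        if rec_ip == ip and start <= when < end:
            return isp, account
    return isp, None


print(trace("192.0.2.10", datetime(2009, 12, 1, 12, 0, tzinfo=UTC)))
# ('ExampleNet', 'account-1234')
```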
One of the caveats concerned the use of Network Address Translation (NAT), whereby the IP addresses used by internal machines are mapped back and forth to external IP addresses that are visible on the global Internet. The most familiar NAT arrangement is that used by a great many home broadband users, who have one externally facing IP address, yet run multiple machines within the household.
Companies also use NAT. If they own sufficient IP addresses they may map one-to-one between internal and external addresses (usually for security reasons), or they may only have 4 or 8 external IP addresses, and will use some or all of them in parallel for dozens of internal machines.
Where NAT is in use, as my thesis explained, traceability becomes problematic because it is rare for the NAT equipment to generate logs to record the internal/external mapping, and even rarer for those logs to be preserved for any length of time. Without these logs, it is impossible to work out which internal user was responsible for the event being traced. However, in practice, all is not lost because law enforcement is usually able to use other clues to tell them which member of the household, or which employee, they wish to interview first.
Treating NAT with this degree of equanimity is no longer possible, and that’s because of the way in which the mobile telephone companies are providing Internet access.
The shortage of IPv4 addresses has meant that the mobile telcos have not been able to obtain huge blocks of address space to dish out one IP address per connected customer — the way in which ISPs have always worked. Instead, they are using relatively small address blocks and a NAT system, so that the same IP address is being simultaneously used by a large number of customers; often hundreds at a time.
This means that the only way in which they can offer a traceability service is if they are provided with an IP address and a timestamp AND ALSO with the TCP (or UDP) source port number. Without that source port value, the mobile firm can only narrow down the account being used to the extent that it must be one out of several hundred — and since those several hundred will have nothing in common, apart from their choice of phone company, law enforcement (or anyone else who cares) will be unable to go much further.
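A hypothetical sketch of the difference the port makes: 300 invented subscribers share one external IP address for the same hour, each with its own source port. Querying by IP address and time alone returns all of them; adding the port narrows it to one. The binding table is a simplification, since real NAT bindings come and go per flow.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Binding(NamedTuple):
    """One NAT binding: external ip:port used by a subscriber during [start, end) seconds."""
    ext_ip: str
    ext_port: int
    start: int
    end: int
    subscriber: str


def candidates(bindings: Iterable[Binding], ip: str, when: int,
               port: Optional[int] = None) -> Set[str]:
    """Subscribers consistent with an (ip, time) observation, optionally with a port."""
    return {b.subscriber for b in bindings
            if b.ext_ip == ip and b.start <= when < b.end
            and (port is None or b.ext_port == port)}


bindings = [Binding("192.0.2.7", 1024 + i, 0, 3600, f"sub{i:03d}") for i in range(300)]
print(len(candidates(bindings, "192.0.2.7", 1800)))             # 300
print(candidates(bindings, "192.0.2.7", 1800, port=1024 + 42))  # {'sub042'}
```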
So, the lesson here is clear — if you are creating logs of activity for security purposes — because you might want to use the information to track someone down — then you must record not only the IP address, but also the source port number.
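For a server you run yourself, this can be as simple as logging the peer’s port alongside its address. A minimal Python sketch, assuming a plain TCP socket and a log-line format of my own invention; timestamps are written in UTC with an explicit offset so that they can later be matched against another party’s records.

```python
import socket
from datetime import datetime, timezone
from typing import Optional


def connection_log_line(conn: socket.socket, when: Optional[datetime] = None) -> str:
    """One log line per accepted connection: UTC timestamp, source IP and source port."""
    src_ip, src_port = conn.getpeername()[:2]  # 2-tuple for IPv4, 4-tuple for IPv6
    when = when or datetime.now(timezone.utc)
    return f"{when.isoformat(timespec='seconds')} src_ip={src_ip} src_port={src_port}"


if __name__ == "__main__":
    # Local demonstration: accept one loopback connection and log it.
    with socket.create_server(("127.0.0.1", 0)) as server:
        client = socket.create_connection(server.getsockname())
        conn, _ = server.accept()
        with client, conn:
            print(connection_log_line(conn))
```

What the server sees is the address and port on the far side of any NAT; where a carrier NAT is involved, that should be the external address and port pair that appears in the carrier’s own NAT logs.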
This will soon be a necessity not just for connections from mobile companies, but for many other Internet access providers as well, because of the expected rise of “Carrier Grade NAT” (or “Large Scale NAT”) as one way of staving off the advent of the different sort of world we will enter when we “run out” of IPv4 addresses, sometime in the next two or three years.
There is currently an “Internet Draft” (a document that might become an RFC one day) that sets out a number of other issues which arise when addresses are routinely shared … though the authors appear unaware that this isn’t something to worry about in 2011, but something that has already been happening for some time, and at considerable “scale”.
In my next article, I’ll discuss what this massive use of NAT means in practice when traceability leads you to a mobile phone user.
Relay attack featured on Dutch TV
Yesterday, the Dutch TV programme “Goudzoekers” featured Saar Drimer and me demonstrating a relay attack against the recently introduced Chip and PIN system in the Netherlands. The video is available online in Windows Media, Silverlight and Flash formats. The production team have published a synopsis (translated version) on their blog, and today there have been some follow-ups in the press, for example De Telegraaf (translated version).
When is a leak not a leak?
WikiLeaks have decided to save other people some bandwidth and make some of my PowerPoint slides available on their site. Since they usually publish censored or confidential information, they’re presumably completely unaware that these slides have been available to the public from two different websites since the day of the talk.
Remarkably similar slides (I’m often lazy!) were also presented in talks I have given this year to INEX (the Irish Internet Exchange Point) [slides here], to EuroBSDCon [slides here] and the BCS (Herts branch) [slides here].
These talks have covered various technical aspects of the blocking of child sexual abuse images on sites that appear on the IWF list. I’ve been talking about the blocking of Wikipedia just over a year ago, and about the blocking of archive.org up to last February. However, I’ve also thrown in a couple of slides about some more recent research, yet to be published, which explores a different way of determining what is on the IWF list. That seems to be what has interested WikiLeaks.
Facebook tosses graph privacy into the bin
Facebook has been rolling out new privacy settings in the past 24 hours along with a “privacy transition” tool that is supposed to help users update their settings. Ostensibly, Facebook’s changes are the result of pressure from the Canadian privacy commissioner, and in Facebook’s own words the changes are meant to be “new tools to control your experience.” The changes have been harshly criticized in a number of high-profile places (the New York Times, Wired, CNET, TechCrunch, Valleywag and ReadWriteWeb) and by the EFF and the ACLU. The ACLU has the most detailed technical summary of the changes: essentially, there are more granular controls, but many more things will default to “open to everyone.” It’s most telling to check the blogs used by Facebook developers and marketers with a business interest in the matter. Their take is simple: a lot more information is about to be shared, and developers need to find out how to use it.
The most discussed issue is the automatic change to more open settings, which will lead to privacy breaches of the socially awkward variety, as users accidentally post something that the wrong person can read. This will assuredly happen more frequently as a direct result of these changes: even though Facebook is trying to force users to read about the new settings, it’s a safe bet that users won’t read any of it. Many people learn how Facebook works by experience and expect it to keep working that way, and it’s a bad precedent to change that when it’s not necessary. The fact that Facebook’s “transition wizard” includes one column of radio buttons for “keep my old settings” and a pre-selected column for “switch to the new settings Facebook wants me to have” shows that either they don’t get it or they really don’t respect their users. Most of this isn’t surprising, though: I wrote in June that Facebook would be automatically changing user settings to be more open, and TechCrunch also saw this coming in July.
There’s a much more surprising change which has been mostly overlooked: it’s now impossible for any user to hide their friend list from the Internet at large. Facebook has a few shameful cop-out statements about this, stating that you can remove it from your default profile view if you wish, but since (in their opinion) it’s “publicly available information” you can’t hide it from people who really want to see it. It has never worked this way before: hiding one’s friend list was always an option, and there have been many research papers, including a few by me and colleagues in Cambridge, concluding that the social graph is actually the most important information to keep private. The threats here are more fundamental and dangerous: unexpected inference of sensitive information, cross-network de-anonymisation, and socially targeted phishing and scams.
It’s incredibly disappointing to see Facebook ignoring a growing body of scientific evidence and putting its social graph up for grabs. It will likely be completely crawled fairly soon by professional data aggregators, and probably by enterprising researchers soon after. The social graph is a powerful view into who we are (Mark Zuckerberg said so himself), and it’s a sad day to see Facebook cynically telling us we can’t decide for ourselves whether or not to share it.
UPDATE 2009-12-11: Less than 12 hours after this post was published, Facebook backed down, citing criticism, and made it possible to hide one’s friend list. They’ve done this in a laughably ham-handed way: friend-list visibility is now all-or-nothing, while you can set complex ACLs on most other profile items. It’s still bizarre that they’ve messed with this at all; for years the default was in fact to show your friend list only to other friends. One can only conclude that they really want all users sharing their friend list, while trying to appear privacy-concerned: this is precisely the “privacy communication game” which Sören Preibusch and I wrote of in June. This remains an ignoble moment for Facebook. The social graph will still become mostly public, as they will be changing overnight the visibility of the friend lists of hundreds of millions of users who don’t find this well-hidden opt-out.
What does Detica detect?
There has been considerable interest in a recent announcement by Detica of “CView” which their press release claims is “a powerful tool to measure copyright infringement on the internet”. The press release continues by saying that it will provide “a measure of the total volume of unauthorised file sharing”.
Commentators are divided as to whether these claims are nonsense or whether the system must be deeply intrusive. The main reason for this is that when peer-to-peer file-sharing flows are encrypted, it is impossible for a passive observer to know what is being transferred.
I met with Detica last Friday, at their suggestion, to discuss what their system actually did (they’ve read some of my work on Phorm’s system, so meeting me was probably not entirely random). With their permission, I can now explain the basics of what they are actually doing. A more detailed account should appear at some later date.