What does Detica detect? – Light Blue Touchpaper

There has been considerable interest in a recent announcement by Detica of “CView”, which their press release claims is “a powerful tool to measure copyright infringement on the internet”. The press release continues by saying that it will provide “a measure of the total volume of unauthorised file sharing”.

Commentators have divided as to whether these claims are nonsense, or whether the system must be deeply intrusive. The main reason for this is that when peer-to-peer file sharing flows are encrypted, it is impossible for a passive observer to know what is being transferred.

I met with Detica last Friday, at their suggestion, to discuss what their system actually did (they’ve read some of my work on Phorm’s system, so meeting me was probably not entirely random). With their permission, I can now explain the basics of what they are actually doing. A more detailed account should appear at some later date.

Their system starts by using fibre taps to pick off traffic from an appropriate part of the ISP network. They use a fibre tap rather than “port mirroring” to make it easier for the ISP to be sure that they won’t disrupt any traffic. The links that they monitor need not be carrying all of the ISP’s traffic — they merely hope that it will be a statistically significant sample.

The raw traffic is then sent to the CView box, which can handle multiple 10Gbit links. The first stage of processing is done in hardware (FPGAs), after which software takes over. The “external” endpoint identity is discarded and the “internal” identity is encrypted using a key that is not made available outside the box (i.e. the intent is to make the customer “anonymous” while still being able to link different activity from the same source).
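Detica didn't tell me which cipher they use, so the following is only a minimal sketch of the idea, assuming a keyed hash (HMAC-SHA256) as a stand-in for their “encryption”. It has the property that matters here: the same customer address always maps to the same opaque token, so activity can be linked, but without the key the token cannot be turned back into an address. The names `_BOX_KEY`, `pseudonymise` and `strip_endpoints` are hypothetical.

```python
import hashlib
import hmac
import ipaddress
import secrets

# Hypothetical per-box secret: generated inside the box and never exported.
# HMAC-SHA256 is an assumption for illustration; CView's actual scheme is not public.
_BOX_KEY = secrets.token_bytes(32)


def pseudonymise(internal_ip: str, key: bytes = _BOX_KEY) -> str:
    """Map a customer-side address to a stable, opaque token.

    Same address in, same token out (so activity can be linked), but the
    token cannot be mapped back to the address without the key.
    """
    packed = ipaddress.ip_address(internal_ip).packed
    return hmac.new(key, packed, hashlib.sha256).hexdigest()


def strip_endpoints(src_ip: str, dst_ip: str, src_is_internal: bool) -> str:
    """Discard the external endpoint; keep only the pseudonymised internal one."""
    internal = src_ip if src_is_internal else dst_ip
    return pseudonymise(internal)
```

Because the key is created fresh inside the box, two boxes (or one box after a restart) would produce unrelated tokens for the same customer, which is one plausible way of keeping the linkage strictly local.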

It should be carefully noted that this anonymity means that this system is intentionally useless for (and has nothing whatsoever to do with) any schemes for writing letters to, slowing down, or disconnecting, people who unlawfully share copyrighted materials. It’s all about measurements, not identification.

The content of the traffic is inspected to try to recognise whether it is peer-to-peer (P2P) traffic and, if so, which particular protocol is being used. Most protocols are easy to recognise if you see the whole datastream: even encrypted traffic can be preceded by cleartext messages that are easy to distinguish.
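As a minimal sketch of the simplest form of recognition, matching a cleartext preamble at the start of a stream: Detica didn't say which three protocols they detect, so BitTorrent and Gnutella are used here purely because their opening handshakes are well documented. The table and function names are hypothetical.

```python
from typing import Optional

# Hypothetical signature table: cleartext bytes that open each protocol's handshake.
_SIGNATURES = {
    "bittorrent": b"\x13BitTorrent protocol",  # length byte 19, then the protocol name
    "gnutella": b"GNUTELLA CONNECT/",          # Gnutella connection request line
}


def classify_stream(first_bytes: bytes) -> Optional[str]:
    """Return the protocol whose cleartext preamble opens the stream, or None."""
    for name, prefix in _SIGNATURES.items():
        if first_bytes.startswith(prefix):
            return name
    return None
```

A stream that opens with random-looking bytes, as some obfuscated protocol variants deliberately do, simply falls through to `None`; a real classifier would presumably need statistical or behavioural heuristics to catch those.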

If what is being seen is P2P traffic with unencrypted content, a unique identifier is extracted that indicates which file is being shared. This is much easier than you might at first imagine — most of the P2P protocols identify content via unique identifiers (usually a cryptographic hash of the file) and then pass this identifier around with every block, in easy-to-locate fields. The CView box then spits out a record containing:

the encrypted (and thus anonymised) customer identity
the type of P2P protocol
the content identifier value
the file size
a timestamp

Where the content of the P2P communication was encrypted, the content identifier is unavailable, so the record generated is as above, but that field cannot be filled in.
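To make the record concrete, here is a sketch for BitTorrent only, assuming the cleartext handshake is visible. In BitTorrent the content identifier is the 20-byte `info_hash` (a SHA-1 hash of the torrent's metadata), which sits at a fixed offset in the handshake. The record type and function names are hypothetical, and `file_size` is left to be supplied from elsewhere because the handshake itself does not carry it.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Cleartext BitTorrent handshake layout (BEP 3):
# 1-byte length (19) | "BitTorrent protocol" | 8 reserved | 20 info_hash | 20 peer_id
_BT_PREFIX = b"\x13BitTorrent protocol"
_HANDSHAKE_LEN = 68


@dataclass(frozen=True)
class CViewRecord:
    customer: str              # pseudonymised customer identity
    protocol: str              # the P2P protocol that was recognised
    content_id: Optional[str]  # None when the payload was encrypted
    file_size: Optional[int]   # hypothetical: supplied from elsewhere, if known
    timestamp: float


def bittorrent_info_hash(handshake: bytes) -> Optional[str]:
    """Return the info_hash from a cleartext handshake as hex, or None."""
    if len(handshake) < _HANDSHAKE_LEN or not handshake.startswith(_BT_PREFIX):
        return None
    return handshake[28:48].hex()  # bytes 28..47 hold the info_hash


def make_bittorrent_record(customer: str, payload: bytes, encrypted: bool,
                           file_size: Optional[int] = None) -> CViewRecord:
    """Build one output record; the content identifier stays empty if encrypted."""
    content_id = None if encrypted else bittorrent_info_hash(payload)
    return CViewRecord(customer, "bittorrent", content_id, file_size, time.time())
```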

The records from the CView box are now passed to a statistics system. This looks up the content identifier (where known) in a database to see if it is copyrighted material that should not be seen on P2P networks. The statistics system then scales up its numbers (to adjust for any sampling at the earlier stages) and generates reports and graphs that give information such as the total amount of P2P traffic; what proportion is encrypted; what proportion of the unencrypted traffic appears to be a copyright infringement; the total number of customer accounts that are doing any file sharing; and so on.
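A minimal sketch of that aggregation step, assuming the monitored links carry a known fraction of customers whose traffic is seen in full, so that volumes and customer counts both scale by one over that fraction. The `volume` field (bytes observed per record) and all names are hypothetical; the post does not say how Detica weights or scales its figures.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class Observation:
    customer: str              # pseudonymised identity
    content_id: Optional[str]  # None if the payload was encrypted
    volume: int                # hypothetical: bytes observed for this record


def summarise(obs: Iterable[Observation], sampled_fraction: float,
              infringing_ids: Set[str]) -> Dict[str, float]:
    """Scale sampled records up to whole-network estimates."""
    if not 0 < sampled_fraction <= 1:
        raise ValueError("sampled_fraction must be in (0, 1]")
    obs = list(obs)
    total = sum(o.volume for o in obs)
    encrypted = sum(o.volume for o in obs if o.content_id is None)
    clear = total - encrypted
    # Only unencrypted records can be looked up in the copyright database.
    infringing = sum(o.volume for o in obs
                     if o.content_id is not None and o.content_id in infringing_ids)
    scale = 1 / sampled_fraction
    return {
        "total_p2p_bytes": total * scale,
        "encrypted_share": encrypted / total if total else 0.0,
        "infringing_share_of_clear": infringing / clear if clear else 0.0,
        "file_sharing_customers": len({o.customer for o in obs}) * scale,
    }
```

Note that the proportions need no scaling at all; only the absolute totals do, and those are only as good as the assumption that the sample is representative.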

As can be seen from this description, the claims in the press release are a little wide of the mark: if a substantial amount of traffic is encrypted (as is widely believed to be the case), then the proportion that is “unlawful file sharing” can only be guessed at. Also, the system cannot even be totally sure that the transfer of a copyrighted file is in fact unlawful (it might be covered by one of the statutory exemptions); however, the inaccuracy from this is likely to be very small!
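To put a rough shape on “can only be guessed at”: suppose a fraction e of the observed P2P volume is encrypted, and a fraction p of the unencrypted remainder matches the copyright database. The infringing fraction q of the encrypted traffic is unknown and could be anywhere from 0 to 1, so the overall infringing fraction F is only bounded:

```latex
F = (1-e)\,p + e\,q, \qquad 0 \le q \le 1
\quad\Longrightarrow\quad (1-e)\,p \;\le\; F \;\le\; (1-e)\,p + e
```

The width of that interval is exactly e, so the more traffic is encrypted, the less a headline figure means. With purely illustrative values e = 0.5 and p = 0.8, all one can honestly say is 0.4 ≤ F ≤ 0.9. Assuming q = p is a natural guess, but nothing in the measurement supports it.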

The other potential flaw with the whole system is that there may be inaccuracies in detecting P2P protocols. Detica view their current system as a trial, and their system currently only attempts to detect the top three P2P protocols. New protocols, or developments of existing protocols, might well not be recognised, or may look too much like something else, such as “https” traffic. So if the statistics machine says that there is less file sharing going forward, then for quite a number of reasons, this may not quite reflect reality.

There’s also a wider issue as to whether a reduction in P2P traffic means less file sharing overall, since users may migrate back to Usenet, fetch their files from online repositories, move all their traffic over encrypted tunnels such as VPNs or Tor, or just swap multi-gigabyte “thumb drives” at the pub or in the playground.

Detica are giving the impression that ISPs will be happy to see their new product. I’m less sure that ISPs actually want to measure this traffic quite so exactly. They’re keen purchasers of “traffic shaping” kit, that detects P2P and slows it down; and the statistics from these boxes may be quite sufficient already for their traffic management purposes.

However, ISPs who want to collaborate with the media industries might wish to have an “industry standard” measurement tool so that accurate numbers can inform their discussions. That presupposes, though, that they’re prepared to admit how much P2P traffic they’re carrying, which might be a bit of a hostage to fortune. I strongly suspect the ISPs would like the option of keeping any embarrassing statistics to themselves, but still have Hollywood share in paying the Detica invoices (as if)!

I should also address (especially given the huge fuss over Phorm) the rather important question of whether the system is lawful to operate. Please note that IANAL, but I’ve studied their writings in this area a fair bit…

The design as explained above seems to address issues of privacy and data protection (amalgamating statistics and discarding identifiers is a sound technique for jumping these hurdles). But there is then the vexed question of illegal interception. The system does “wire-tapping”, that’s obvious, but the criminal offence is called “interception” and that is carefully defined within the Regulation of Investigatory Powers Act 2000. I expect that Detica would wish to argue that there is no interception because no content is seen by any humans… however, spitting out the file identifier might in itself be sufficient to infringe. It may take some case law before anyone can say for sure.

It seems that Virgin (reported to be deploying this Detica system) are taking the view that they’d rather not argue about whether it’s interception, but have indicated that they intend to rely instead upon using it for “network management”, or more formally, the s3(3) statutory exemption that permits interception if “it takes place for purposes connected with the provision or operation of that service.”

Knowing how much of your traffic is file sharing is something that network engineers would wish to know. However, knowing how much of the traffic is unlawful (and getting a list of all the material that is being shared unencrypted) is a bit more of a stretch — but perhaps the marketing people can claim that they need this knowledge to provide a service, and Virgin have announced that they are going to be providing a music service of their own.

Finally, the paranoid will observe that minor tweaks to the software would deliver up a first-class monitoring system, one that could generate reports about unlawful activity by individual users, so that anyone whose P2P activity is unencrypted (and who actually gets sampled) would be immediately detected.
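One reason the tweaks could be minor: keyed pseudonyms protect identity only while the key stays inside the box. A minimal sketch, again assuming an HMAC-style scheme as a stand-in for CView’s unspecified one: whoever holds the key can hash the ISP’s own (small) pool of customer addresses in advance and turn every pseudonym back into an address with a dictionary lookup. The function names and the example address range are hypothetical.

```python
import hashlib
import hmac
import ipaddress
from typing import Dict, Iterable


def pseudonymise(ip: str, key: bytes) -> str:
    """Keyed hash of an address (stand-in for CView's unspecified scheme)."""
    return hmac.new(key, ipaddress.ip_address(ip).packed, hashlib.sha256).hexdigest()


def build_reverse_map(key: bytes, customer_ips: Iterable[str]) -> Dict[str, str]:
    """Precompute pseudonym -> address for every address in the pool.

    Only possible with the key; without it the tokens stay opaque.
    """
    return {pseudonymise(ip, key): ip for ip in customer_ips}
```

Combined with the ISP’s own records of which subscriber held which address when, that lookup is all it would take to turn “anonymous” statistics into named individuals.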

Applying to the courts for an injunction requiring these tweaks to be made does not seem out of line with other media industry legal initiatives in Belgium and Ireland. It’s hard to say whether such an injunction would be granted in the UK, and the media industries have shown no previous signs of taking this route here. Nonetheless, a cautious ISP that is concerned about the wider PR aspects of deploying this system might think carefully about the likely benefits before giving the nice chaps at Detica (full disclosure: they paid for my lunch) a call.