What do we mean by scale?

Online booking is now open for the annual Lovelace lecture, which I’m giving in London on March 15th. I’ll be using the event to try to develop a cross-cutting theme that arises from my research but is of wider interest. What do we mean by scale?

Back when computing was expensive, computer science started out worrying about how computations scaled – such as which sorting algorithms ran as n², n log n or even n. Once software development became the bottleneck, we started off by trying the same approach (measuring code entropy or control graph complexity), but moved on to distinguishing what types of complexity could be dealt with by suitable tools. Mass-market computing and communications brought network effects, and scalability started to depend on context (this is where security economics came in). Now we move to “Big Data” the dependency on people becomes more explicit. Few people have stopped to think of human factors in scaling terms. Do we make information about many people available to many, or to few? What about the complexity of the things that can be done with personal data? What about costs now versus in the future, and the elasticity of demand associated with such costs? Do you just count the data subjects, do you count the attackers too, or do you add the cops as well?

I’ve been quoted as saying “You can have security, functionality, scale – choose any two” or words to that effect. I’ll discuss this and try to sketch the likely boundaries, as well as future research directions. The discussion will cross over from science and engineering to economics and politics; recent proposed legislation in the UK, and court cases in the USA, would impose compliance burdens on people trying to scale systems up from one country to many.

The students we’re training to be the next generation of developers and entrepreneurs will need a broader understanding of what’s involved in scaling systems up, and in this talk I’ll try to explore what that means. Maybe I’m taking a risk with this talk, as I’m trying to assemble into a row a lot of facts that are usually found in different columns. But I do hope it will not be boring.

My Yahoo! password histograms are now available (with differential privacy!)

5 years ago, I compiled a dataset of password histograms representing roughly 70 million Yahoo! users. It was the largest password dataset ever compiled for research purposes. The data was a key component of my PhD dissertation the next year and motivated new statistical methods for which I received the 2013 NSA Cybersecurity Award.

I had always hoped to share the data publicly. It consists only of password histograms, not passwords themselves, so it seemed reasonably safe to publish. But without a formal privacy model, Yahoo! didn’t agree. Given the history of deanonymization work, caution is certainly in order. Today, thanks to new differential privacy methods described in a paper published at NDSS 2016 with colleagues Jeremiah Blocki and Anupam Datta, a sanitized version of the data is publicly available.

Continue reading My Yahoo! password histograms are now available (with differential privacy!)

“Do you see what I see?” ask Tor users, as a large number of websites reject them but accept non-Tor users

If you use an anonymity network such as Tor on a regular basis, you are probably familiar with various annoyances in your web browsing experience, ranging from pages saying “Access denied” to having to solve CAPTCHAs before continuing. Interestingly, these hurdles disappear if the same website is accessed without Tor. The growing trend of websites extending this kind of “differential treatment” to anonymous users undermines Tor’s overall utility, and adds a new dimension to the traditional threats to Tor (attacks on user privacy, or governments blocking access to Tor). There is plenty of anecdotal evidence about Tor users experiencing difficulties in browsing the web, for example the user-reported catalog of services blocking Tor. However, we don’t have sufficient detail about the problem to answer deeper questions like: how prevalent is differential treatment of Tor on the web; are there any centralized players with Tor-unfriendly policies that have a magnified effect on the browsing experience of Tor users; can we identify patterns in where these Tor-unfriendly websites are hosted (or located), and so forth.

Today we present our paper on this topic: “Do You See What I See? Differential Treatment of Anonymous Users” at the Network and Distributed System Security Symposium (NDSS). Together with researchers from the University of Cambridge, University College London, University of California, Berkeley and International Computer Science Institute (Berkeley), we conducted comprehensive network measurements to shed light on websites that block Tor. At the network layer, we scanned the entire IPv4 address space on port 80 from Tor exit nodes. At the application layer, we fetched the homepage from the most popular 1,000 websites (according to Alexa) from all Tor exit nodes. We compared these measurements with a baseline from non-Tor control measurements, and uncover significant evidence of Tor blocking. We estimate that at least 1.3 million IP addresses that would otherwise allow a TCP handshake on port 80 block the handshake if it originates from a Tor exit node. We also show that at least 3.67% of the most popular 1,000 websites block Tor users at the application layer.

Continue reading “Do you see what I see?” ask Tor users, as a large number of websites reject them but accept non-Tor users

Financial Cryptography 2016

I will be trying to liveblog Financial Cryptography 2016, which is the twentieth anniversary of the conference. The opening keynote was by David Chaum, who invented digital cash over thirty years ago. From then until the first FC people believed that cryptography could enable commerce and also protect privacy; since then pessimism has slowly set in, and sometimes it seems that although we’re still fighting tactical battles, we’ve lost the war. Since Snowden people have little faith in online privacy, and now we see Tim Cook in a position to decide which seventy phones to open. Is there a way to fight back against a global adversary whose policy is “full take”, and where traffic data can be taken with no legal restraint whatsoever? That is now the threat model for designers of anonymity systems. He argues that in addition to a large anonymity set, a future social media system will need a fixed set of servers in order to keep end-to-end latency within what chat users expect. As with DNS we should have servers operated by (say ten) different principals; unlike in that case we don’t want to have most of the independent parties financed by the US government. The root servers could be implemented as unattended seismic observatories, as reported by Simmons in the arms control context; such devices are fairly easy to tamper-proof.

The crypto problem is how to do multi-jurisdiction message processing that protects not just content but also metadata. Systems like Tor cost latency, while multi-party computation costs a lot of cycles. His new design, PrivaTegrity, takes low-latency crypto building blocks then layers on top of them transaction protocols with large anonymity sets. The key component is c-Mix, whose spec up as an eprint here. There’s a precomputation using homomorphic encryption to set up paths and keys; in real-time operations each participating phone has a shared secret with each mix server so things can run at chat speed. A PrivaTegrity message is four c-Mix batches that use the same permutation. Message models supported include not just chat but publishing short anonymous messages, providing an untraceable return address so people can contact you anonymously, group chat, and limiting sybils by preventing more than one pseudonym being used. (There are enduring pseudonyms with valuable credentials.) It can handle large payloads using private information retrieval, and also do pseudonymous digital transactions with a latency of two seconds rather than the hour or so that bitcoin takes. The anonymous payment system has the property that the payer has proof of what he paid to whom, while the recipient has no proof of who paid him; that’s exactly what corrupt officials, money launderers and the like don’t want, but exactly what we do want from the viewpoint of consumer protection. He sees PrivaTegrity as the foundation of a “polyculture” of secure computing from multiple vendors that could be outside the control of governments once more. In questions, Adi Shamir questioned whether such an ecosystem was consistent with the reality of pervasive software vulnerabilities, regardless of the strength of the cryptography.

I will try to liveblog later sessions as followups to this post.

Arresting development?

There have been no arrests or charges for cybercrime events in the UK for almost two months. I do not believe that this apparent lack of law enforcement action is the result of any recent reduction in cybercrime. Instead, I predict that a multitude of coordinated arrests is being planned, to take place nationally over a short period of time.

My observations arise from the Cambridge Computer Crime Database (CCCD), which I have been maintaining for some time now. The database contains over 400 entries dating back to January 2010, detailing arrests, charges, and prosecutions for computer crime in the UK.

Since the beginning of 2016, there have been no arrests or charges for incidents that fit within the scope of the CCCD that I have picked up using various public source data collection methods. The last arrest was in mid-December, when a male was arrested on suspicion of offences under sections 1 and 2 of the Computer Misuse Act. Press coverage of this arrest linked it to the VTech data breach.

A coordinated ‘cyber crime strike week’ took place in early March 2015. In just one week, 57 suspects were arrested for a range of offences, including denial of service attacks, cyber-enabled fraud, network intrusion and data theft, and malware development.

Coordinated law enforcement action to address particular crime problems is not uncommon. A large number of arrests is ‘newsworthy’, capturing national headlines and sending the message that law enforcement take these matters seriously and wrongdoers will be caught. What is less clear is whether one week of news coverage would have a greater effect than 52 weeks of more sustained levels of arrest.

Furthermore, many of the outcomes of the 2015 arrests are unknown (possibly indicating no further action has been taken), or pending. This indicates that large numbers of simultaneous arrests may place pressure on the rest of the criminal justice system, particularly for offences with complex evidentiary requirements.

Report on the IP Bill

This morning at 0930 the Joint Committee on the IP Bill is launching its report. As one of the witnesses who appeared before it, I got an embargoed copy yesterday.

The report s deeply disappointing; even that of the Intelligence and Security Committee (whom we tended to dismiss as government catspaws) is more vigorous. The MPs and peers on the Joint Committee have given the spooks all they wanted, while recommending tweaks and polishes here and there to some of the more obvious hooks and sharp edges.

The committee supports comms data retention, despite acknowledging that multiple courts have found this contrary to EU and human-rights law, and the fact that there are cases in the pipeline. It supports extending retention from big telcos offering a public service to private operators and even coffee shops. It support greatly extending comms data to ICRs; although it does call for more clarity on the definition, it give the Home Office lots of wriggle room by saying that a clear definition is hard if you want to catch all the things that bad people might do in the future. (Presumably a coffee shop served with an ICR order will have no choice but to install a government-approved black box. or just pipe everything to Cheltenham.) It welcomes the government decision to build and operate a request filter – essentially the comms database for which the Home Office has been trying to get parliamentary approval since the days of Jacqui Smith (and which Snowden told us they just built anyway). It comes up with the rather startling justification that this will help privacy as the police may have access to less stuff (though of course the spooks, including our 5eyes partners and others, will have more). It wants end-to-end encrypted stuff to be made available unless it’s “not practicable to do so”, which presumably means that the Home Secretary can order Apple to add her public key quietly to your keyring to get at your Facetime video chats. That has been a key goal of the FBI in Crypto War 2; a Home Office witness openly acknowledged it.

The comparison with the USA is stark. There, all three branches of government realised they’d gone too far after Snowden. President Obama set up the NSA review group, and implemented most of its recommendations by executive order; the judiciary made changes to the procedures of the FISA Court; and Congress failed to renew the data retention provisions in the Patriot Act (aided by the judiciary). Yet here in Britain the response is just to take Henry VIII powers to legalise all the illegal things that GCHQ had been up to, and hope that the European courts won’t strike the law down yet again.

People concerned for freedom and privacy will just have to hope the contrary. The net effect of the minor amendments proposed by the joint committee will be to make it even harder to get any meaningful amendments as the Bill makes its way through Parliament, and we’ll end up having to rely on the European courts to trim it back.

For more, see Scrambling for Safety, a conference we held last month in London on the bill and whose video is now online, and last week’s Cambridge symposium for a more detailed analysis.

Can we crowdsource trust?

Your browser contains a few hundred root certificates. Many of them were put there by governments; two (Verisign and Comodo) are there because so many merchants trust them that they’ve become ‘too big to fail’. This is a bit like where people buy the platform with the most software – a pattern of behaviour that let IBM and then Microsoft dominate our industry in turn. But this is not how trust should work; it leads to many failures, some of them invisible.

What’s missing is a mechanism where trust derives from users, rather than from vendors, merchants or states. After all, the power of a religion stems from the people who believe in it, not from the government. Entities with godlike powers that are foisted on us by others and can work silently against us are not gods, but demons. What can we do to exorcise them?

Do You Believe in Tinker Bell? The Social Externalities of Trust explores how we can crowdsource trust. Tor bridges help censorship victims access the Internet freely, and there are not enough of them. We want to motivate lots of people to provide them, and the best providers are simply those who help the most victims. So trust should flow from the support of the users, and it should be hard for powerful third parties to pervert. Perhaps a useful mascot is Tinker Bell, the fairy in Peter Pan, whose power waxes and wanes with the number of children who believe in her.

Three exciting job openings in security usability

We are looking for three more people to join the Cambridge security group. Two job adverts, intended for postgrads or postdocs, are already out now. A third one, specifically aimed at a final year undergraduate or master student, strong on programming but with no significant work experience, is currently making its way through the HR pipeline and should appear soon. Please pass this on to anyone potentially interested.

With the Pico project (see website for videos, papers and more) we wish to liberate humanity from the usability and security problems of passwords. We are looking for a UX designer to help us in our quest to produce a user-centred, effective and pleasant to use solution and for two software engineers with a security mindset to help us build it and make it robust against attacks. Would you like to join us and contribute to eliminating the annoyance and frustration of passwords from the daily experience of billions of computer users?
  1. User experience (UX) designer
    Research Associate or Assistant (with/without PhD)
    Start date: ASAP
    Details and link to application form: http://www.jobs.cam.ac.uk/job/9244/
  2. Senior software engineer / software engineer
    Research Associate or Assistant (with/without PhD)
    Start date: ASAP
    Details and link to application form: http://www.jobs.cam.ac.uk/job/9245/
  3. Software engineer
    Research assistant (having just completed a bachelor or master in CS/EE)
    Start date: June 2016
    Watch this space: the ad should go live within a week or so
    https://www.mypico.org/jobs/