Mark Zuckerberg tried to blame Cambridge University in his recent testimony before the US Senate, saying “We do need to understand whether there was something bad going on in Cambridge University overall, that will require a stronger action from us.”
The New Scientist invited me to write a rebuttal piece, and here it is.
Dr Kogan tried to get approval to use the data his company had collected from Facebook users in academic research. The psychology ethics committee refused permission, and when he appealed to the University Ethics Committee (declaration: I’m a member) this refusal was upheld. Although he’d got consent from the people who ran his app, the same could not be said of their Facebook “friends” from whom most of the data were collected.
The deceptive behaviour here has been by Facebook, which creates the illusion of privacy in order to get its users to share more data. There has been a lot of work on the economics and psychology of privacy over the past decade and we now understand the dynamics of advertising markets better than we used to.
One big question is the “privacy paradox”. Why do people say they care about privacy, yet behave otherwise? Part of the answer is about context; and part of it is about learning. Over time, more and more people are starting to pay attention to online privacy settings, despite attempts by Facebook and other online advertising firms to keep changing privacy settings to confuse people.
With luck, the Facebook scandal will be a “flashbulb moment” that will drive lots more people to start caring about their privacy online. It will certainly provide interesting new data to privacy researchers.
What Goes Around Comes Around is a chapter I wrote for a book by EPIC. What are America’s long-term national policy interests (and ours for that matter) in surveillance and privacy? The election of a president with a very short-term view makes this ever more important.
While Britain was top dog in the 19th century, we gave the world both technology (steamships, railways, telegraphs) and values (the abolition of slavery and child labour, not to mention universal education). America has given us the motor car, the Internet, and a rules-based international trading system – and may have perhaps one generation left in which to make a difference.
Lessig taught us that code is law. Similarly, architecture is policy. The architecture of the Internet, and the moral norms embedded in it, will be a huge part of America’s legacy, and the network effects that dominate the information industries could give that architecture great longevity.
So if America re-engineers the Internet so that US firms can microtarget foreign customers cheaply, so that US telcos can extract rents from foreign firms via service quality, and so that the NSA can more easily spy on people in places like Pakistan and Yemen, then in 50 years’ time the Chinese will use it to manipulate, tax and snoop on Americans. In 100 years’ time it might be India in pole position, and in 200 years the United States of Africa.
My book chapter explores this topic. What do the architecture of the Internet, and the network effects of the information industries, mean for politics in the longer term, and for human rights? Although the chapter appeared in 2015, I forgot to put it online at the time. So here it is now.
The Economist features face recognition on its front page, reporting that deep neural networks can now tell whether you’re straight or gay better than humans can just by looking at your face. The research they cite is a preprint, available here.
Its authors Kosinski and Wang downloaded thousands of photos from a dating site, ran them through a standard feature-extraction program, then classified gay vs straight using a standard statistical classifier, which they found could tell the men seeking men from the men seeking women. My students pretty well instantly called this out as selection bias; if gay men consider boyish faces to be cuter, then they will upload their most boyish photo. The paper authors suggest their finding may support a theory that sexuality is influenced by fetal testosterone levels, but when you don’t control for such biases your results may say more about social norms than about phenotypes.
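The selection-bias objection can be made concrete with a toy simulation. Everything here is hypothetical and is not the paper's data or method. Each user has several photos, and every photo's "boyishness" score is drawn from the same distribution for both groups, so there is no phenotype difference at all. The only difference is that group A uploads its most boyish photo while group B uploads a random one. A trivial threshold classifier still separates the groups well above chance.

```python
import random
import statistics


def simulate_uploads(n_users, photos_per_user, pick_most_boyish, rng):
    """Each user owns several photos whose 'boyishness' scores come from the
    SAME distribution for everyone; only the choice of uploaded photo differs."""
    uploads = []
    for _ in range(n_users):
        scores = [rng.gauss(0.0, 1.0) for _ in range(photos_per_user)]
        uploads.append(max(scores) if pick_most_boyish else rng.choice(scores))
    return uploads


def threshold_accuracy(group_a, group_b):
    """Label a photo as group A if its score exceeds the midpoint of the two means."""
    threshold = (statistics.mean(group_a) + statistics.mean(group_b)) / 2
    correct = sum(s > threshold for s in group_a) + sum(s <= threshold for s in group_b)
    return correct / (len(group_a) + len(group_b))


rng = random.Random(42)
a = simulate_uploads(2000, 5, pick_most_boyish=True, rng=rng)   # hypothetical group A
b = simulate_uploads(2000, 5, pick_most_boyish=False, rng=rng)  # hypothetical group B
print(f"accuracy: {threshold_accuracy(a, b):.2f}")  # comfortably above the 0.5 of chance
```

The threshold is fitted on the same data it is scored on, which is fine for illustration. The point is that the classifier has learned which photo people chose to upload, not anything about their faces.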
Quite apart from the scientific value of the research, which is perhaps best assessed by specialists, I’m concerned with the ethics and privacy aspects. I am surprised that the paper doesn’t report having been through ethical review; the authors consider that photos on a dating website are public information and appear to assume that privacy issues simply do not arise.
Yet UK courts decided, in Campbell v Mirror, that privacy could be violated even by photos taken on the public street, and European courts have come to similar conclusions in I v Finland and elsewhere. For example, a Catholic woman is entitled to object to the use of her medical record in research on abortifacients and contraceptives even if the proposed use is fully anonymised and presents no privacy risk whatsoever. The dating site users would be similarly entitled to object to their photos being used in research to which they might have an ethical objection, even if they could not be identified from their photos. There are surely going to be people who object to research in any nature vs nurture debate, especially on a charged topic such as sexuality. And the whole point of the Economist’s coverage is that face-recognition technology is now good enough to work at population scale.
Now that everyone’s distracted with the Supreme Court case on Brexit, you can expect the government to sneak out something it’s ashamed of. Health secretary Jeremy Hunt has decided to ignore the wishes of over a million people who opted out of having their hospital records given to third parties such as drug companies, and the ICO has decided to pretend that the anonymisation mechanisms he says he’ll use instead are sufficient. One gently smoking gun is the fifth bullet in a new webpage here, where the Department of Health claims that when it says the data are anonymous, your wishes will be ignored. The news was broken in an article in the Health Services Journal (it’s behind a paywall, as a splendid example of transparency), with the Wellcome Trust praising the ICO’s decision not to take action against the Department. We are assured that “the data is seen as crucial for vital research projects”. The exchange of letters with privacy campaigners that led up to this decision can be found here, here, here, here, here, here, and here.
An early portent of this U-turn was reported here in 2014, when officials reckoned that the only way they could still do administrative tasks such as calculating doctors’ bonuses was to just pretend that the data are anonymous, even though they know they aren’t really. Then, after the care.data scandal showed that a billion records had been sold to over a thousand purchasers, we reported here how HES data had also been sold and how the minister seemed to have misled parliament about this.
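The reason "anonymous" is a pretence can be shown with a small example. The post doesn't say exactly which mechanism the Department uses. This sketch assumes simple pseudonymisation, where the patient identifier is replaced by a salted hash and the rest of the record is kept. The records, field names and salt are all made up for illustration. Hashing the identifier does nothing about quasi-identifiers such as date of birth, postcode and sex. Any record whose combination of these is unique in the dataset (k = 1 in k-anonymity terms) can be re-identified by anyone who knows those three facts about a person.

```python
import hashlib
from collections import Counter

# Hypothetical choice of quasi-identifiers kept in the released records.
QUASI_IDENTIFIERS = ("birth_date", "postcode", "sex")


def pseudonymise(record, salt):
    """Replace the direct identifier with a truncated salted SHA-256; keep everything else."""
    out = dict(record)
    out["patient_id"] = hashlib.sha256((salt + record["patient_id"]).encode()).hexdigest()[:12]
    return out


def unique_on_quasi_identifiers(records):
    """Count records whose quasi-identifier combination occurs exactly once (k = 1)."""
    key = lambda r: tuple(r[k] for k in QUASI_IDENTIFIERS)
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1)


# Made-up records with placeholder postcodes.
RECORDS = [
    {"patient_id": "P001", "birth_date": "1961-03-02", "postcode": "XX1 1AA", "sex": "F"},
    {"patient_id": "P002", "birth_date": "1961-03-02", "postcode": "XX1 1AA", "sex": "F"},
    {"patient_id": "P003", "birth_date": "1975-11-20", "postcode": "XX2 2BB", "sex": "M"},
    {"patient_id": "P004", "birth_date": "1990-07-14", "postcode": "XX3 3CC", "sex": "F"},
]

released = [pseudonymise(r, salt="hypothetical-salt") for r in RECORDS]
print(unique_on_quasi_identifiers(released))  # 2: records P003 and P004 stand out
```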
I will be talking about ethics of all this on Thursday. Even if ministers claim that stolen medical records are OK to use, researchers must not act as if this is true; if patients end up trusting doctors as little as we trust politicians, then medical research will be in serious trouble. There is a video of a previous version of this talk here.
Meanwhile, if you’re annoyed that Jeremy Hunt proposes to ignore not just your privacy rights but your express wishes, you can send him a notice under Section 10 of the Data Protection Act forbidding him from disclosing your data. The Department has complied with such notices in the past, albeit with bad grace as they have no automated way to do it. If thousands of people serve such notices, they may finally have to stand up to the drug company lobbyists and write the missing software. For more, see here.
At our security group meeting on the 19th August, Sergei Skorobogatov demonstrated a NAND backup attack on an iPhone 5c. I typed in six wrong PINs and it locked; he removed the flash chip (which he’d desoldered and led out to a socket); he erased and restored the changed pages; he put it back in the phone; and I was able to enter a further six wrong PINs.
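The logic of the attack is simple enough to model in a few lines. This is a toy, not the iPhone's real firmware. It assumes only that the failed-PIN counter lives in flash that can be imaged and written back. The six-try limit matches the demo, and the class names and PIN are hypothetical. The attacker images the flash once, guesses until the phone locks, restores the image and carries on.

```python
import copy

MAX_ATTEMPTS = 6  # hypothetical lockout threshold, matching the six tries in the demo


class Flash:
    """Toy NAND chip: a dict of named 'pages' holding persistent state."""

    def __init__(self):
        self.pages = {"failed_attempts": 0}


class Phone:
    """Toy phone that keeps its failed-PIN counter in removable flash."""

    def __init__(self, flash, pin):
        self.flash = flash
        self._pin = pin

    def try_pin(self, guess):
        if self.flash.pages["failed_attempts"] >= MAX_ATTEMPTS:
            return "locked"
        if guess == self._pin:
            self.flash.pages["failed_attempts"] = 0
            return "unlocked"
        self.flash.pages["failed_attempts"] += 1
        return "wrong"


def nand_mirror_attack(phone, candidates):
    """Guess PINs, restoring the saved flash image whenever the phone locks."""
    backup = copy.deepcopy(phone.flash.pages)  # image taken before any guesses
    restores = 0
    for guess in candidates:
        result = phone.try_pin(guess)
        if result == "locked":
            phone.flash.pages = copy.deepcopy(backup)  # erase and restore the changed pages
            restores += 1
            result = phone.try_pin(guess)
        if result == "unlocked":
            return guess, restores
    return None, restores


phone = Phone(Flash(), pin="0013")  # hypothetical PIN
candidates = [f"{i:04d}" for i in range(10_000)]
print(nand_mirror_attack(phone, candidates))  # ('0013', 2)
```

One restore is needed per six guesses, so a four-digit PIN falls after at most a few thousand restore cycles.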
I really like the simplicity of the original assumption. The starting point of the research was that different crypto/RSA libraries use slightly different elimination methods and “cut-off” thresholds to find suitable prime numbers. The researchers thought these differences should be sufficient to detect a particular cryptographic implementation, and that all that was needed were public keys. Petr et al confirmed this assumption. The best paper award is well-deserved recognition, and I say that having worked with Petr and followed his activities closely.
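The assumption can be illustrated with two hypothetical prime generators. They stand in for real libraries and are not any actual library's rules. "Library A" accepts any random prime. "Library B" also rejects primes with p ≡ 1 (mod 3), i.e. primes for which 3 divides p − 1. That one extra filter forces every B prime to be ≡ 2 (mod 3), so every B modulus satisfies N = pq ≡ 4 ≡ 1 (mod 3). Under A, by contrast, roughly half the moduli are ≡ 2 (mod 3). The feature N mod 3 is computable from the public key alone. Key sizes here are kept tiny so the sketch runs fast.

```python
import random
from collections import Counter


def is_probable_prime(n, rng, rounds=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(rng.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def random_prime(bits, rng, reject_p_1_mod_3):
    while True:
        # Set the top bit (exact length) and bottom bit (odd).
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if reject_p_1_mod_3 and candidate % 3 == 1:  # hypothetical 'library B' filter
            continue
        if is_probable_prime(candidate, rng):
            return candidate


def modulus(bits, rng, reject_p_1_mod_3):
    p = random_prime(bits // 2, rng, reject_p_1_mod_3)
    q = random_prime(bits // 2, rng, reject_p_1_mod_3)
    return p * q


rng = random.Random(2016)
keys_a = [modulus(128, rng, reject_p_1_mod_3=False) for _ in range(200)]
keys_b = [modulus(128, rng, reject_p_1_mod_3=True) for _ in range(200)]
print("library A, N mod 3:", Counter(n % 3 for n in keys_a))  # a mix of 1s and 2s
print("library B, N mod 3:", Counter(n % 3 for n in keys_b))  # only 1s
```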
The authors created a method for efficiently identifying the source (software library or hardware device) of RSA public keys. It classifies keys into more than a dozen categories. This classification can be used as a fingerprint that reduces the anonymity of users of Tor and of other privacy-enhancing mailers or operators.
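Turning such biases into a category label can be sketched as maximum-likelihood classification over independent features, which is essentially naive Bayes with a uniform prior. This is a standard technique swapped in for illustration, not necessarily the paper's method. The profiles below are made up and reuse the single feature N mod 3. A real fingerprint would combine several features. The sketch also shows why a batch of keys from one source is far more revealing than a single key.

```python
import math

EPSILON = 1e-9  # floor for feature values a profile has never produced


def log_likelihood(observations, profile):
    """Sum of log-probabilities of the observed feature values under one profile,
    treating keys as independent."""
    return sum(math.log(profile.get(value, EPSILON)) for value in observations)


def classify(observations, profiles):
    """Return source categories ordered from most to least likely."""
    return sorted(profiles, key=lambda name: log_likelihood(observations, profiles[name]), reverse=True)


# Hypothetical reference profiles: distribution of N mod 3 per source category.
PROFILES = {
    "category A": {1: 0.5, 2: 0.5},
    "category B": {1: 1.0},
}

print(classify([1] * 10, PROFILES)[0])      # category B
print(classify([1, 1, 2, 1], PROFILES)[0])  # category A
```

A single value of 1 only weakly favours category B. Ten in a row make A about a thousand times less likely, and one value of 2 rules B out almost entirely.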
All this is the result of an analysis of over 60 million freshly generated keys from 22 open- and closed-source libraries and from 16 different smartcards. While the findings are fairly theoretical, they are demonstrated with a series of easy-to-understand graphs (see above).
I can’t see an easy way to exploit the results for immediate cyber attacks. However, we started looking into practical applications. There are interesting opportunities for enterprise compliance audits, as the classification only requires access to datasets of public keys – often created as a by-product of internal network vulnerability scanning.
We found that software on your smartphone can infer words you type in other apps by monitoring the aggregate number of context switches and the number of hardware interrupts. These are readable by permissionless apps within the virtual procfs filesystem (mounted under /proc). Three previous research groups had found that other files under procfs support side channels. But the files they used contained information about individual apps, e.g. the file /proc/uid_stat/victimapp/tcp_snd contains the number of bytes sent by “victimapp”. These files are no longer readable in the latest Android version.
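Reading such global counters needs no special API, just file reads. This minimal sketch assumes the two counters are the `intr` line (whose first number is the total interrupt count since boot) and the `ctxt` line (total context switches) of `/proc/stat` on a Linux kernel. The post doesn't name the exact files the attack used, and whether an app can read them depends on the Android version. The sampler turns the cumulative counters into per-interval increments, which is the time series an observer would analyse.

```python
import time


def parse_proc_stat(text):
    """Return (total_interrupts, total_context_switches) from /proc/stat contents."""
    interrupts = switches = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "intr":
            interrupts = int(fields[1])  # first number is the total across all IRQs
        elif fields[0] == "ctxt":
            switches = int(fields[1])
    return interrupts, switches


def sample_deltas(path="/proc/stat", samples=5, interval=0.05):
    """Poll the global counters and return (interrupts, context switches) increments."""
    with open(path) as f:
        prev = parse_proc_stat(f.read())
    deltas = []
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = parse_proc_stat(f.read())
        deltas.append((cur[0] - prev[0], cur[1] - prev[1]))
        prev = cur
    return deltas
```

On an ordinary Linux machine, `sample_deltas()` returns a list of small positive pairs. Touch activity shows up as bursts in both columns, even though nothing identifies which app caused them.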
We found that the “global” files – those that contain aggregate information about the system – also leak. So a curious app can monitor these global files as a user types on the phone and try to work out the words. We looked at smartphone keyboards that support “gesture typing”: a novel input mechanism democratized by SwiftKey, whereby a user drags their finger from letter to letter to enter words.
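Why gesture typing leaks words can be shown with a deliberately crude toy model, which is not the method used in the research. Assume the number of touch interrupts during a gesture grows with the length of the finger's path. Then an observed burst gives an estimate of path length, and dictionary words can be ranked by how well their path length matches. The key coordinates below approximate a QWERTY layout in key widths and are an assumption.

```python
import math

# Approximate key centres on a QWERTY layout, in key widths (assumed offsets).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.5, 1.5]
KEY_POS = {
    ch: (col + ROW_OFFSETS[row], float(row))
    for row, letters in enumerate(ROWS)
    for col, ch in enumerate(letters)
}


def gesture_length(word):
    """Length of the finger path that traces the word letter by letter."""
    points = [KEY_POS[c] for c in word.lower()]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def rank_candidates(observed_length, dictionary):
    """Rank dictionary words by how closely their path length matches an observed one."""
    return sorted(dictionary, key=lambda w: abs(gesture_length(w) - observed_length))


print(rank_candidates(gesture_length("cat"), ["the", "privacy", "cat"])[0])  # cat
```

A real attack has noisier measurements and a far larger dictionary, so it would combine path length with other features such as the timing of each stroke. Even this toy shows how a single aggregate counter can narrow down the word.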
This work shows once again how difficult it is to prevent side channels: they come up in all sorts of interesting and unexpected ways. Fortunately, we think there is an easy fix: Google should simply disable access to all procfs files, rather than just the files that leak information about individual apps. Meanwhile, if you’re developing apps for privacy or anonymity, you should be aware that these risks exist.
I am at the Privacy Enhancing Technologies Symposium (PETS 2016) in Darmstadt until Friday, and will try to liveblog some of the sessions in followups to this post. (I can’t do them all as there are some parallel sessions.)