Ethical issues in research using datasets of illicit origin

On Friday at IMC I presented our paper “Ethical issues in research using datasets of illicit origin” by Daniel R. Thomas, Sergio Pastrana, Alice Hutchings, Richard Clayton, and Alastair R. Beresford. We conducted this research after thinking about some of these issues in the context of our previous work on UDP reflection DDoS attacks.

Data of illicit origin is data obtained by illicit means, such as exploiting a vulnerability or an unauthorised disclosure; in our previous work this was leaked databases from booter services. We analysed existing guidance on ethics, and papers that used data of illicit origin, to see which issues researchers are encouraged to discuss and which issues they actually did discuss. We find wide variation in current practice. We encourage researchers using data of illicit origin to include an ethics section in their paper, explaining why the work was ethical, so that the research community can learn from it. At present, in many cases, the positive benefits as well as the potential harms of research remain entirely unidentified. Few papers record explicit Research Ethics Board (REB) (aka IRB/Ethics Committee) approval for the activity described, and the justifications given for exemption from REB approval suggest deficiencies in the REB process. It is also important to focus on the “human participants” of research rather than the narrower “human subjects” definition, as not all the humans that might be harmed by research are its direct subjects.

The paper and the slides are available.

Is this research ethical?

The Economist features face recognition on its front page, reporting that deep neural networks can now tell whether you’re straight or gay better than humans can just by looking at your face. The research they cite is a preprint, available here.

Its authors Kosinski and Wang downloaded thousands of photos from a dating site, ran them through a standard feature-extraction program, then classified gay vs straight using a standard statistical classifier, which they found could tell the men seeking men from the men seeking women. My students pretty well instantly called this out as selection bias; if gay men consider boyish faces to be cuter, then they will upload their most boyish photo. The paper authors suggest their finding may support a theory that sexuality is influenced by fetal testosterone levels, but when you don’t control for such biases your results may say more about social norms than about phenotypes.
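To make the pipeline concrete, here is a minimal sketch of the kind of analysis described, in Python with scikit-learn; the feature vectors are synthetic stand-ins for the output of a face feature-extraction network, and this illustrates the general technique, not the authors’ actual code.

```python
# Illustrative sketch only: a standard classifier over face-embedding
# features. The embeddings and class labels below are hypothetical
# stand-ins; this is not the paper's code or data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Pretend each photo has already been reduced to a 128-dimensional
# embedding by some standard feature-extraction network.
n_per_class, dim = 500, 128
features_a = rng.normal(0.0, 1.0, (n_per_class, dim))  # one photo pool
features_b = rng.normal(0.2, 1.0, (n_per_class, dim))  # the other pool

X = np.vstack([features_a, features_b])
y = np.array([0] * n_per_class + [1] * n_per_class)

# A standard statistical classifier: logistic regression, with
# cross-validated accuracy (chance level here is 0.5).
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())
```

Any above-chance accuracy in such a pipeline reflects whatever systematically separates the two photo pools; nothing in it distinguishes phenotype from self-presentation, which is why the selection-bias objection bites.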

Quite apart from the scientific value of the research, which is perhaps best assessed by specialists, I’m concerned with the ethics and privacy aspects. I am surprised that the paper doesn’t report having been through ethical review; the authors consider that photos on a dating website are public information and appear to assume that privacy issues simply do not arise.

Yet UK courts decided, in Campbell v Mirror, that privacy could be violated even by photos taken on the public street, and European courts have come to similar conclusions in I v Finland and elsewhere. For example, a Catholic woman is entitled to object to the use of her medical record in research on abortifacients and contraceptives even if the proposed use is fully anonymised and presents no privacy risk whatsoever. The dating site users would be similarly entitled to object to their photos being used in research to which they might have an ethical objection, even if they could not be identified from their photos. There are surely going to be people who object to research in any nature vs nurture debate, especially on a charged topic such as sexuality. And the whole point of the Economist’s coverage is that face-recognition technology is now good enough to work at population scale.

What do LBT readers think?

Is the City force corrupt, or just clueless?

This week brought an announcement from a banking association that “identity fraud” is soaring to new levels, with 89,000 cases reported in the first six months of 2017 and 56% of all fraud reported by its members now classed as “identity fraud”.

So what is “identity fraud”? The announcement helpfully clarifies the concept:

“The vast majority of identity fraud happens when a fraudster pretends to be an innocent individual to buy a product or take out a loan in their name. Often victims do not even realise that they have been targeted until a bill arrives for something they did not buy or they experience problems with their credit rating. To carry out this kind of fraud successfully, fraudsters need access to their victim’s personal information such as name, date of birth, address, their bank and who they hold accounts with. Fraudsters get hold of this in a variety of ways, from stealing mail through to hacking; obtaining data on the ‘dark web’; exploiting personal information on social media, or through ‘social engineering’ where innocent parties are persuaded to give up personal information to someone pretending to be from their bank, the police or a trusted retailer.”

Now back when I worked in banking, if someone went to Barclays, pretended to be me, borrowed £10,000 and legged it, that was “impersonation”, and it was the bank’s money that had been stolen, not my identity. How did things change?

The members of this association are banks and credit card issuers. In their narrative, those impersonated are treated as targets, when the targets are actually those banks on whom the impersonation is practised. This is a precursor to refusing bank customers a “remedy” for “their loss” because “they failed to protect themselves.”

Now “dishonestly making a false representation” is an offence under s2 Fraud Act 2006. Yet what is the police response?

The Head of the City of London Police’s Economic Crime Directorate does not see the banks’ narrative as dishonest. Instead he goes along with it: “It has become normal for people to publish personal details about themselves on social media and on other online platforms which makes it easier than ever for a fraudster to steal someone’s identity.” He continues: “Be careful who you give your information to, always consider whether it is necessary to part with those details.” This is reinforced with a link to a police website with supposedly scary statistics: 55% of people use open public wifi and 40% of people don’t have antivirus software (like many security researchers, I’m guilty on both counts). This police website has a quote from the Head’s own boss, a Commander who is the National Police Coordinator for Economic Crime.

How are we to rate their conduct? Given that the costs of the City force’s Dedicated Card and Payment Crime Unit are borne by the banks, perhaps they feel obliged to sing from the banks’ hymn sheet. Just as the Macpherson report criticised the Met for being institutionally racist, we might perhaps describe the City force as institutionally corrupt. There is a wide literature on regulatory capture, and many other examples of regulators keen to do the banks’ bidding. And it’s not just the City force. There are disgraceful examples of the Metropolitan Police Commissioner and GCHQ endorsing the banks’ false narrative. However, people are starting to notice, including the National Audit Office.

Or perhaps the police are just clueless?

History of the Crypto Wars in Britain

Back in March I gave an invited talk to the Cambridge University Ethics in Mathematics Society on the Crypto Wars. They have just put the video online here.

We spent much of the 1990s pushing back against attempts by the intelligence agencies to seize control of cryptography. From the Clipper Chip through the regulation of trusted third parties to export control, the agencies tried one trick after another to make us all less secure online, claiming that thanks to cryptography the world of intelligence was “going dark”. Quite the opposite was true; with communications moving online, with people starting to carry mobile phones everywhere, and with our communications and traffic data mostly handled by big firms who respond to warrants, law enforcement has never had it so good. Twenty years ago it cost over a thousand pounds a day to follow a suspect around, and weeks of work to map his contacts; Ed Snowden told us how nowadays an officer can get your location history with one click and your address book with another. In fact, searches through the contact patterns of whole populations are now routine.

The checks and balances that we thought had been built into the RIP Act in 2000, after all our lobbying during the 1990s, turned out to be ineffective. GCHQ simply broke the law and, after Snowden exposed them, Parliament passed the IP Act to declare that what they did was all right now. The Act allows the Home Secretary to give secret orders to tech companies to do anything they physically can to facilitate surveillance, thereby delighting our foreign competitors. And Brexit means the government thinks it can ignore the European Court of Justice, which has already ruled against some of the Act’s provisions. (Or perhaps Theresa May chose a hard Brexit because she doesn’t want the pesky court in the way.)

Yet we now see the Home Secretary, along with law enforcement officials on both sides of the Atlantic, repeating the old nonsense about decent people not needing privacy. Why doesn’t she just sign the technical capability notices she deems necessary and serve them?

In these fraught times it might be useful to recall how we got here. My talk to the Ethics in Mathematics Society was a personal memoir; there are many links on my web page to relevant documents.

Compartmentation is hard, but the Big Data playbook makes it harder still

A new study of Palantir’s systems and business methods makes sobering reading for people interested in what big data means for privacy.

Privacy scales badly. It’s OK for the twenty staff at a medical practice to have access to the records of the ten thousand patients registered there, but when you build a centralised system that lets every doctor and nurse in the country see every patient’s record, things go wrong. There are even sharper concerns in the world of intelligence, which agencies try to manage using compartmentation: really sensitive information is often put in a compartment that’s restricted to a handful of staff. But such systems are hard to build and maintain. Readers of my book chapter on the subject will recall that while US Naval Intelligence struggled to manage millions of compartments, the CIA let more of their staff see more stuff – whereupon Aldrich Ames betrayed their agents to the Russians.
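As a toy illustration of what compartmentation means in code (my own sketch, not any agency’s real system): a subject may read a document only if their clearance includes every compartment on its label. The check itself is trivial; the operational burden of granting, tracking and revoking millions of such labels is what breaks down.

```python
# Toy sketch of compartment-based ("need to know") access control:
# a reader needs every compartment on a document's label.
from dataclasses import dataclass, field

@dataclass
class Subject:
    name: str
    clearances: frozenset = field(default_factory=frozenset)

@dataclass
class Document:
    title: str
    compartments: frozenset = field(default_factory=frozenset)

def may_read(subject: Subject, doc: Document) -> bool:
    # Dominance check: the subject's clearance set must contain
    # (be a superset of) the document's compartment set.
    return doc.compartments <= subject.clearances

analyst = Subject("analyst", frozenset({"CRYPTO"}))
report = Document("agent report", frozenset({"CRYPTO", "HUMINT"}))

print(may_read(analyst, report))  # False: the analyst lacks HUMINT
```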

After 9/11, the intelligence community moved towards the CIA model, in the hope that with fewer compartments they’d be better able to prevent future attacks. We predicted trouble, and Snowden duly came along. As for civilian agencies such as Britain’s NHS and police, no serious effort was made to protect personal privacy by compartmentation, with multiple consequences.

Palantir’s systems were developed to help the intelligence community link, fuse and visualise data from multiple sources, and are now sold to police forces too. It should surprise no-one to learn that they do not compartment information properly, whether within a single force or even between forces. The organised crime squad’s secret informants can thus become visible to traffic cops, and even to cops in other forces, with tragically predictable consequences. Fixing this is hard, as Palantir’s market advantage comes from network effects and the resulting scale. The more police forces they sign up the more data they have, and the larger they grow the more third-party databases they integrate, leaving private-sector competitors even further behind.

This much we could have predicted from first principles, but the details of how Palantir operates, and what police forces dislike about it, are worth studying.

What might be the appropriate public-policy response? Well, the best analysis of competition policy in the presence of network effects is probably Lina Khan’s, and her analysis would suggest in this case that police intelligence should be a regulated utility. We should develop those capabilities that are actually needed, and the right place for them is the Police National Database. The public sector is better placed to commit the engineering effort to do compartmentation properly, both there and in other applications where it’s needed, such as the NHS. Good engineering is expensive – but as the Los Angeles Police Department found, engaging Palantir can be more expensive still.

Cambridge2Cambridge 2017

Following on from various other similar events we organised over the past few years, last week we hosted our largest ethical hacking competition yet, Cambridge2Cambridge 2017, with over 100 students from some of the best universities in the US and UK working together over three days. Cambridge2Cambridge was founded jointly by MIT CSAIL (in Cambridge Massachusetts) and the University of Cambridge Computer Laboratory (in the original Cambridge) and was first run at MIT in 2016 as a competition involving only students from these two universities. This year it was hosted in Cambridge UK and we broadened the participation to many more universities in the two countries. We hope in the future to broaden participation to more countries as well.

Cambridge 2 Cambridge 2017 from Frank Stajano Explains on Vimeo.

We assigned the competitors to teams that were mixed in terms of both provenance and experience. Each team had competitors from both the US and the UK, and no two people from the same university; each team also mixed experienced and less experienced players, based on the qualifier scores. We mixed experience levels so that even those who only started learning about ethical hacking when they heard about this competition would have an equal chance of being on the team that won gold, and we mixed provenance so that, during these three days, students collaborated with people they didn’t already know.
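For the curious, this kind of balancing can be approximated by greedily assigning players, in descending order of qualifier score, to the smallest and weakest team with no university clash; the Python below is a simplified reconstruction of the idea, not our actual allocation script.

```python
# Simplified sketch of balanced team assignment (hypothetical, not
# the real allocation code): assign strongest players first, always
# to the smallest, then weakest, team without a university clash.
def assign_teams(players, n_teams):
    # players: list of (name, university, qualifier_score)
    teams = [[] for _ in range(n_teams)]
    for player in sorted(players, key=lambda p: -p[2]):
        # Eligible teams have no member from this player's university;
        # if none exist, relax the constraint rather than drop the player.
        eligible = [t for t in teams
                    if all(m[1] != player[1] for m in t)] or teams
        # Prefer the smallest team, then the lowest total score, so
        # team sizes and experience stay balanced.
        target = min(eligible,
                     key=lambda t: (len(t), sum(m[2] for m in t)))
        target.append(player)
    return teams

players = [("A", "MIT", 95), ("B", "Cambridge", 90),
           ("C", "MIT", 60), ("D", "Imperial", 55),
           ("E", "CMU", 80), ("F", "Southampton", 40)]
for team in assign_teams(players, 2):
    print(team)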

Despite their different backgrounds, what the attendees had in common was that they were all pretty smart and had an interest in cyber security. It’s a safe bet that, ten or twenty years from now, a number of them will be Security Specialists, Licensed Ethical Hackers, Chief Security Officers, National Security Advisors or other high-calibre security professionals. When their institution or country is under attack, they will be able to get in touch with the other smart people they met here in Cambridge in 2017, and they’ll be in a position to help each other. That’s why the defining feature of the event was collaboration, making new friends and having fun together. Unlike your standard one-day hacking contest, the ambitious three-day programme of C2C 2017 allowed for social activities including punting on the river Cam, pub crawling and a Harry Potter style gala dinner in Trinity College.

In between competition sessions we had a lively and inspirational “women in cyber” panel, another panel on “securing the future digital society”, one on “real world pentesting” and a careers advice session. On the second day we hosted several groups of bright teenagers who had been finalists in the national CyberFirst Girls Competition. We hope to inspire many more women to take up a career path that has so far been very male-dominated. More broadly, we wish to inspire many young kids, girls or boys, to engage in the thrilling challenge of unravelling how computers work (and how they fail to work) in a high-stakes mental chess game of adversarial attack and defense.

Our platinum sponsors Leidos and NCC Group endowed the competition with over £20,000 of cash prizes, awarded to the best three teams and the best three individuals. Besides the main attack-defense CTF, fought on the Leidos CyberNEXS cyber range, our other sponsors offered additional competitions, whose results were combined to generate the overall team and individual scores. Here is the leaderboard, showing how our contestants performed. Special congratulations to Bo Robert Xiao of Carnegie Mellon University who, besides winning first place in both the team and individual rankings, went on to win at DEF CON with team PPP a couple of days later.

We are grateful to our supporters, our sponsors, our panelists, our guests, our staff and, above all, our 110 competitors for making this event a success. It was particularly pleasing to see several students who had already taken part in some of our previous competitions (special mention for Luke Granger-Brown from Imperial, who earned medals at every visit). Chase Lucas from Dakota State University, having passed the qualifier but not having been picked in the initial random selection, was on the reserve list in case we got funding to fly additional students; he then promptly offered to pay for his own airfare in order to be able to attend! Inter-ACE 2017 winner Io Swift Wolf from Southampton deserted her own graduation ceremony in order to participate in C2C (!), and then donated precious time during the competition to the CyberFirst girls, who listened to her rapturously. Accumulating all that good karma could not go unrewarded, and indeed you can once again find her name in the leaderboard above. And I’ve only singled out a few of many amazing, dynamic and enthusiastic young people. Watch out for them: they are the ones who will defend the future digital society, including you and your family, from the cyber attacks we keep reading about in the media. We need many more like them, and we need to put them in touch with each other. The bad guys are organised, so we have to be organised too.

The event was covered by Sky News, ITV, BBC World Service and a variety of other media, which the official website and twitter page will undoubtedly collect in due course.

AlphaBay and Hansa Market takedowns

Yesterday the FBI announced the takedown of the AlphaBay marketplace, a hidden service facilitating the sale of drugs, as well as other illicit products and services. The takedown had actually occurred weeks earlier, and had been staged to appear like an exit scam, where the operators take off with the money.

What was particularly interesting about the FBI’s takedown was that it was coordinated with the activities of the Dutch police, who had previously taken over the Hansa Market, another leading blackmarket. As the investigators were by then controlling this marketplace, they were able to monitor the activities of traders who had been using AlphaBay and then moved to Hansa Market.

I’ve been interested in online blackmarkets for some time, particularly those that relate to the stolen data economy. In fact, last year Professor Thomas Holt and I published a paper outlining a number of intervention approaches, including disrupting the actual marketplaces where trade takes place.

Among our numerous suggestions are three that have been used, in combination, by this international police effort. We suggest that law enforcement promote distrust, which they did by making AlphaBay appear to have been an exit scam. We also suggest that law enforcement take over and take down marketplaces. Neither of these police approaches is new, and we point to previous examples where this has happened. In our conclusion, we stated:

Multiple interventions coordinated across different guardians, nationally and internationally, incorporating different bodies (investigative, regulatory, strategic, non-government organisations and the private sector) that have ownership of the crime prevention problem may reduce duplication of effort, as well as provide a more systematic approach with the greatest disruption effect.

The Hansa Market and AlphaBay approach demonstrates how this can be achieved. By co-ordinating their approaches and working together, the agencies are likely to achieve much greater disruption than if they had acted alone. It’s likely we’ll see arrests of traders and further disruption to the online drug trade.

Work by Soska and Christin found that after the Silk Road takedown, more online blackmarkets emerged and evolved. I think this evolution will continue, but perhaps marketplace administrators will have to work harder in order to earn the trust of their users.

Testing the usability of offline mobile payments

Last September we spent some time in Nairobi figuring out whether we could make offline phone payments usable. Phone payments have greatly improved the lives of millions of poor people in countries like Kenya and Bangladesh, who previously didn’t have bank accounts at all but who can now send and receive money using their phones. That’s great for the 80% who have mobile phone coverage, but what about the others?

Last year I described how we designed and built a prototype system to support offline payments, with the help of a grant from the Bill and Melinda Gates Foundation, and took it to Africa to test it. Offline payments require both the sender and the receiver to enter some extra digits to ensure that the payer and the payee agree on who’s paying whom how much. We worked as hard as we could to minimise the number of digits and to integrate them into the familiar transaction flow. Would this be good enough?
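The idea behind those extra digits can be sketched as a short confirmation code that both handsets derive from the transaction details under a shared key; the Python below illustrates the general technique with a truncated HMAC, though DigiTally’s actual code lengths and key handling differ (see the paper for the real protocol).

```python
# Illustrative sketch of an offline-payment confirmation code: both
# phones derive a few digits from (payer, payee, amount) under a
# shared key, and the users compare or re-enter those digits. This
# is a simplified stand-in, not the DigiTally protocol itself.
import hmac, hashlib

def confirmation_code(key: bytes, payer: str, payee: str,
                      amount: int, digits: int = 4) -> str:
    msg = f"{payer}|{payee}|{amount}".encode()
    mac = hmac.new(key, msg, hashlib.sha256).digest()
    # Truncate the MAC to a short decimal code the user can type.
    value = int.from_bytes(mac[:4], "big") % (10 ** digits)
    return f"{value:0{digits}d}"

key = b"shared-key-provisioned-on-both-SIMs"  # hypothetical key
print(confirmation_code(key, "+254700000001", "+254700000002", 250))
```

Each extra digit makes it ten times harder to misbind the payer, payee or amount, but costs the user another keypress; that is the kind of trade-off the Nairobi study set out to measure.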

Our paper setting out the results was accepted to the Symposium on Usable Privacy and Security (SOUPS), the leading security usability event. This has now started and the paper’s online; the lead author, Khaled Baqer, will be presenting it tomorrow. As we noted last year, the DigiTally pilot was a success. For the data and the detailed analysis, please see our paper:

DigiTally: Piloting Offline Payments for Phones, Khaled Baqer, Ross Anderson, Jeunese Adrienne Payne, Lorna Mutegi, Joseph Sevilla, 13th Symposium on Usable Privacy & Security (SOUPS 2017), pp 131–143