I’m at the fourteenth workshop on the economics of information security at TU Delft. I’ll be liveblogging the sessions in followups to this post.
In late April 2010 users of the Yahoo and Microsoft IM systems started to get messages from their buddies which said, for example:
foto ☺ http://firstname.lastname@example.org
where the email address was theirs and the URL was for some malware.
Naturally, since the message was from their buddy a lot of folks clicked on the link and when the Windows warning pop-up said “you cannot see this photo until you press OK” they pressed OK and (since the Windows message was in fact a warning about executing unknown programs downloaded from the Internet) they too became infected with the malware. Hence they sent
foto ☺ messages to all their buddies and the worm spread at increasing speed.
By late May 2010 I had determined how the malware was controlled (it resolved hostnames to locate IRC servers then joined particular channels where the topic was the message to be sent to buddies) and built a Perl program to join in and monitor what was going on. I also determined that the criminals were often hosting their malware on hosting sites with world-readable Apache weblogs so we could get exact counts of malware downloads (how many people clicked on the links).
Full details, and the story of a number of related worms that spread over the next two years can be found in the academic paper (and are summarised in the slides I used for a very short talk in Barcelona and a longer version I presented a week earlier in Luxembourg).
The key results are:
- Thanks to some sloppiness by the criminals we had some brief snapshots of activity from an IRC channel used when the spreading phase was complete and infected machines were being forced to download new malware — this showed that 95% of people had clicked OK to dismiss the Microsoft warning message.
- We had sufficient download data to estimate that around 3 million users were infected by the initial worm and we have records of over 14 million distinct downloads over all of the different worms (having ignored events caused by security monitoring, multiple clicks by the same user, etc.). That is — this was a large scale event.
- We were able to compare the number of clicks during periods where the criminals vacillated between using URL shorteners in their URLs and when they used hostnames that (vaguely resembled) brands such as Facebook, MySpace, Orkut and so on. We found that when shorteners were used this reduced the number of clicks by almost half — presumably because it made users more cautious.
- From early 2011 the worms were mainly affecting Brazil — and the simple “foto ☺” had long been replaced by other textual lures. We found that when the criminals used lures in Portuguese (e.g. “eu acho que é você na”, which has, I was told in Barcelona, a distinctive Brazilian feel to it) they were far more successful in getting people to click than when they used ‘language independent’ lures such as “hahha foto”
There’s nothing here which is super-surprising, but it is useful to see our preconceptions borne out not in a laboratory experiment (where it is hard to ensure that the experimental subjects are behaving quite the way that they would ‘in the wild’) but by large scale measurements from real events.
I’m at Princeton where Ed Snowden is due to speak by live video link in a few minutes, and have a discussion with Bart Gellmann.
Yesterday he spent four hours with a group of cryptographers from industry and academia, of which I was privileged to be one. The topic was the possible and likely countermeasures, both legal and technical, against state surveillance. Ed attended as the “Snobot”, a telepresence robot that let him speak to us, listen and move round the room, from a studio in Moscow. As well as over a dozen cryptographers there was at least one lawyer and at least one journalist familiar with the leaked documents. Yesterday’s meeting was under the Chatham House rule, so I may not say who said what; any new disclosures may have been made by Snowden, or by one of the journalists, or by one of the cryptographers who has assisted journalists with the material. Although most of what was discussed has probably appeared already in one place or another, as a matter of prudence I’m publishing these notes on the blog while I’m enjoying US first-amendment rights, and will sanitise them from my laptop before coming back through UK customs.
The problem of state surveillance is a global one rather than an NSA issue, and has been growing for years, along with public awareness of it. But we learned a lot from the leaks; for example, wiretaps on the communications between data centres were something nobody thought of; and it might do no harm to think a bit more about the backhaul in CDNs. (A website that runs TLS to a CDN and then bareback to the main server is actually worse than nothing, as we lose the ability to shame them.) Of course the agencies will go for the low-hanging fruit. Second, we also got some reassurance; for example, TLS works, unless the agencies have managed to steal or coerce the private keys, or hack the end systems. (This is a complex discussion given CDNs, problems with the CA ecology and bugs like Heartbleed.) And it’s a matter of record that Ed trusted his life to Tor, because he saw from the other side that it worked.
Third, the leaks give us a clear view of an intelligence analyst’s workflow. She will mainly look in Xkeyscore which is the Google of 5eyes comint; it’s a federated system hoovering up masses of stuff not just from 5eyes own assets but from other countries where the NSA cooperates or pays for access. Data are “ingested” into a vast rolling buffer; an analyst can run a federated search, using a selector (such as an IP address) or fingerprint (something that can be matched against the traffic). There are other such systems: “Dancing oasis” is the middle eastern version. Some xkeyscore assets are actually compromised third-party systems; there are multiple cases of rooted SMS servers that are queried in place and the results exfiltrated. Others involve vast infrastructure, like Tempora. If data in Xkeyscore are marked as of interest, they’re moved to Pinwale to be memorialised for 5+ years. This is one function of the MDRs (massive data repositories, now more tactfully renamed mission data repositories) like Utah. At present storage is behind ingestion. Xkeyscore buffer times just depend on volumes and what storage they managed to install, plus what they manage to filter out.
As for crypto capabilities, a lot of stuff is decrypted automatically on ingest (e.g. using a “stolen cert”, presumably a private key obtained through hacking). Else the analyst sends the ciphertext to CES and they either decrypt it or say they can’t. There’s no evidence of a “wow” cryptanalysis; it was key theft, or an implant, or a predicted RNG or supply-chain interference. Cryptanalysis has been seen of RC4, but not of elliptic curve crypto, and there’s no sign of exploits against other commonly used algorithms. Of course, the vendors of some products have been coopted, notably skype. Homegrown crypto is routinely problematic, but properly implemented crypto keeps the agency out; gpg ciphertexts with RSA 1024 were returned as fails.
With IKE the NSA were interested in getting the original handshakes, harvesting them all systematically worldwide. These are databased and indexed. The quantum type attacks were common against non-crypto traffic; it’s easy to spam a poisoned link. However there is no evidence at all of active attacks on cryptographic protocols, or of any break-and-poison attack on crypto links. It is however possible that the hacking crew can use your cryptography to go after your end system rather than the content, if for example your crypto software has a buffer overflow.
What else might we learn from the disclosures when designing and implementing crypto? Well, read the disclosures and use your brain. Why did GCHQ bother stealing all the SIM card keys for Iceland from Gemalto, unless they have access to the local GSM radio links? Just look at the roof panels on US or UK embassies, that look like concrete but are actually transparent to RF. So when designing a protocol ask yourself whether a local listener is a serious consideration.
In addition to the Gemalto case, Belgacom is another case of hacking X to get at Y. The kind of attack here is now completely routine: you look for the HR spreadsheet in corporate email traffic, use this to identify the sysadmins, then chain your way in. Companies need to have some clue if they’re to stop attacks like this succeeding almost trivially. By routinely hacking companies of interest, the agencies are comprehensively undermining the security of critical infrastructure, and claim it’s a “nobody but us” capability. however that’s not going to last; other countries will catch up.
Would opportunistic encryption help, such as using unauthenticated Diffie-Hellman everwhere? Quite probably; but governments might then simply compel the big service forms to make the seeds predictable. At present, key theft is probably more common than key compulsion in US operations (though other countries may be different). If the US government ever does use compelled certs, it’s more likely to be the FBI than the NSA, because of the latter’s focus on foreign targets. The FBI will occasionally buy hacked servers to run in place as honeypots, but Stuxnet and Flame used stolen certs. Bear in mind that anyone outside the USA has zero rights under US law.
Is it sensible to use medium-security systems such as Skype to hide traffic, even though they will give law enforcement access? For example, an NGO contacting people in one of the Stans might not want to incriminate them by using cryptography. The problem with this is that systems like Skype will give access not just to the FBI but to all sorts of really unsavoury police forces.
FBI operations can be opaque because of the care they take with parallel construction; the Lavabit case was maybe an example. It could have been easy to steal the key, but then how would the intercepted content have been used in court? In practice, there are tons of convictions made on the basis of cargo manifests, travel plans, calendars and other such plaintext data about which a suitable story can be told. The FBI considers it to be good practice to just grab all traffic data and memorialise it forever.
The NSA is even more cautious than the FBI, and won’t use top exploits against clueful targets unless it really matters. Intelligence services are at least aware of the risk of losing a capability, unlike vanilla law enforcement, who once they have a tool will use it against absolutely everybody.
Using network intrusion detection against bad actors is very much like the attack / defence evolution seen in the anti-virus business. A system called Tutelage uses Xkeyscore infrastructure and matches network traffic against signatures, just like AV, but it has the same weaknesses. Script kiddies are easily identifiable from their script signatures via Xkeyscore, but the real bad actors know how to change network signatures, just as modern malware uses packers to become highly polymorphic.
Cooperation with companies on network intrusion detection is tied up with liability games. DDoS attacks from Iran spooked US banks, which invited the government in to snoop on their networks, but above all wanted liability protection.
Usability is critical. Lots of good crypto never got widely adopted as it was too hard to use; think of PGP. On the other hand, Tails is horrifically vulnerable to traditional endpoint attacks, but you can give it as a package to journalists to use so they won’t make so many mistakes. The source has to think “How can I protect myself?” which makes it really hard, especially for a source without a crypto and security background. You just can’t trust random journalists to be clueful about everything from scripting to airgaps. Come to think of it, a naive source shouldn’t trust their life to securedrop; he should use gpg before he sends stuff to it but he won’t figure out that it’s a good idea to suppress key IDs. Engineers who design stuff for whistleblowers and journalists must be really thoughtful and careful if they want to ensure their users won’t die when they screw up. The goal should be that no single error should be fatal, and so long as their failures aren’t compounded the users will stay alive. Bear in mind that non-roman-language countries use numeric passwords, and often just 8 digits. And being a target can really change the way you operate. For example, password managers are great, but not for someone like Ed, as they put too many of the eggs in one basket. If you’re a target, create a memory castle, or a token that can be destroyed on short notice. If you’re a target like Ed, you have to compartmentalise.
On the policy front, one of the eye-openers was the scale of intelligence sharing – it’s not just 5 eyes, but 15 or 35 or even 65 once you count all the countries sharing stuff with the NSA. So how does governance work? Quite simply, the NSA doesn’t care about policy. Their OGC has 100 lawyers whose job is to “enable the mission”; to figure out loopholes or new interpretations of the law that let stuff get done. How do you restrain this? Could you use courts in other countries, that have stronger human-rights law? The precedents are not encouraging. New Zealand’s GCSB was sharing intel with Bangladesh agencies while the NZ government was investigating them for human-rights abuses. Ramstein in Germany is involved in all the drone killings, as fibre is needed to keep latency down low enough for remote vehicle pilots. The problem is that the intelligence agencies figure out ways to shield the authorities from culpability, and this should not happen.
Jurisdiction is a big soft spot. When will CDNs get tapped on the shoulder by local law enforcement in dodgy countries? Can you lock stuff out of particular jurisdictions, so your stuff doesn’t end up in Egypt just for load-balancing reasons? Can the NSA force data to be rehomed in a friendly jurisdiction, e.g. by a light DoS? Then they “request” stuff from a partner rather than “collecting” it.
The spooks’ lawyers play games saying for example that they dumped content, but if you know IP address and file size you often have it; and IP address is a good enough pseudonym for most intel / LE use. They deny that they outsource to do legal arbitrage (e.g. NSA spies on Brits and GCHQ returns the favour by spying on Americans). Are they telling the truth? In theory there will be an MOU between NSA and the partner agency stipulating respect for each others’ laws, but there can be caveats, such as a classified version which says “this is not a binding legal document”. The sad fact is that law and legislators are losing the capability to hold people in the intelligence world to account, and also losing the appetite for it.
The deepest problem is that the system architecture that has evolved in recent years holds masses of information on many people with no intelligence value, but with vast potential for political abuse.
Traditional law enforcement worked on individualised suspicion; end-system compromise is better than mass search. Ed is on the record as leaving to the journalists all decisions about what targeted attacks to talk about, as many of them are against real bad people, and as a matter of principle we don’t want to stop targeted attacks.
Interference with crypto in academia and industry is longstanding. People who intern with a clearance get a “lifetime obligation” when they go through indoctrination (yes, that’s what it’s called), and this includes pre-publication review of anything relevant they write. The prepublication review board (PRB) at the CIA is notoriously unresponsive and you have to litigate to write a book. There are also specific programmes to recruit cryptographers, with a view to having friendly insiders in companies that might use or deploy crypto.
The export control mechanisms are also used as an early warning mechanism, to tip off the agency that kit X will be shipped to country Y on date Z. Then the technicians can insert an implant without anyone at the exporting company knowing a thing. This is usually much better than getting stuff Trojanned by the vendor.
Western governments are foolish to think they can develop NOBUS (no-one but us) technology and press the stop button when things go wrong, as this might not be true for ever. Stuxnet was highly targeted and carefully delivered but it ended up in Indonesia too. Developing countries talk of our first-mover advantage in carbon industrialisation, and push back when we ask them to burn less coal. They will make the same security arguments as our governments and use the same techniques, but without the same standards of care. Bear in mind, on the equities issue, that attack is way way easier than defence. So is cyber-war plausible? Politically no, but at the expert level it might eventually be so. Eventually something scary will happen, and then infrastructure companies will care more, but it’s doubtful that anyone will do a sufficiently coordinated attack on enough diverse plant through different firewalls and so on to pose a major threat to life.
How can we push back on the poisoning of the crypto/security community? We have to accept that some people are pro-NSA while others are pro-humanity. Some researchers do responsible disclosure while others devise zero-days and sell them to the NSA or Vupen. We can push back a bit by blocking papers from conferences or otherwise denying academic credit where researchers prefer cash or patriotism to responsible disclosure, but that only goes so far. People who can pay for a new kitchen with their first exploit sale can get very patriotic; NSA contractors have a higher standard of living than academics. It’s best to develop a culture where people with and without clearances agree that crypto must be open and robust. The FREAK attack was based on export crypto of the 1990s.
We must also strengthen post-national norms in academia, while in the software world we need transparency, not just in the sense of open source but of business relationships too. Open source makes it harder for security companies to sell different versions of the product to people we like and people we hate. And the NSA may have thought dual-EC was OK because they were so close to RSA; a sceptical purchaser should have observed how many government speakers help them out at the RSA conference!
Secret laws are pure poison; government lawyers claim authority and act on it, and we don’t know about it. Transparency about what governments can and can’t do is vital.
On the technical front, we can’t replace the existing infrastructure, so it won’t be possible in the short term to give people mobile phones that can’t be tracked. However it is possible to layer new communications systems on top of what already exists, as with the new generation of messaging apps that support end-to-end crypto with no key escrow. As for whether such systems take off on a large enough scale to make a difference, ultimately it will all be about incentives.
The FBI overstated forensic hair matches in nearly all trials up till 2000. 26 of their 28 examiners overstated forensic matches in ways that favoured prosecutors in more than 95 percent of the 268 trials reviewed so far. 32 defendants were sentenced to death, of whom 14 were executed or died in prison.
In the District of Columbia, the only jurisdiction where defenders and prosecutors have re-investigated all FBI hair convictions, three of seven defendants whose trials included flawed FBI testimony have been exonerated through DNA testing since 2009, and courts have cleared two more. All five served 20 to 30 years in prison for rape or murder. The FBI examiners in question also taught 500 to 1,000 state and local crime lab analysts to testify in the same ways.
Systematically flawed forensic evidence should be familiar enough to readers of this blog. In four previous posts here I’ve described problems with the curfew tags that are used to monitor the movements of parolees and terrorism suspects in the UK. We have also written extensively on the unreliability of card payment evidence, particularly in banking disputes. However, payment evidence can also be relevant to serious criminal trials, of which the most shocking cases are probably those described here and here. Hundreds, perhaps thousands, of men were arrested after being wrongly suspected of buying indecent images of children, when in fact they were victims of credit card fraud. Having been an expert witness in one of those cases, I wrote to the former DPP Kier Starmer on his appointment asking him to open a formal inquiry into the police failure to understand credit card fraud, and to review cases as appropriate. My letter was ignored.
The Washington Post article argues cogently that the USA lacks, and needs, a mechanism to deal with systematic failures of the justice system, particularly when these are related to its inability to cope with technology. The same holds here too. In addition to the hundreds of men wrongly arrested for child porn offences in Operation Ore, there have been over two hundred prosecutions for curfew tag tampering, no doubt with evidence similar to that offered in cases where we secured acquittals. There have been scandals in the past over DNA and fingerprints, as I describe in my book. How many more scandals are waiting to break? And as everything goes online, digital evidence will play an ever larger role, leading to more systematic failures in future. How should we try to forestall them?
I’m at the 23rd Security Protocols Workshop, whose theme this year is is information security in fiction and in fact. Engineering is often inspired by fiction, and vice versa; what might we learn from this?
I will try to liveblog the talks in followups to this post.
Today at 5pm I’ll be giving the Bellwether Lecture at the Oxford Internet Institute. My topic is Big Conflicts: the ethics and economics of privacy in a world of Big Data.
I’ll be discussing a recent Nuffield Bioethics Council report of which I was one of the authors. In it, we asked what medical ethics should look like in a world of ‘Big Data’ and pervasive genomics. It will take the law some time to catch up with what’s going on, so how should researchers behave meanwhile so that the people whose data we use don’t get annoyed or surprised, and so that we can defend our actions if challenged? We came up with four principles, which I’ll discuss. I’ll also talk about how they might apply more generally, for example to my own field of security research.
This morning I received a request to review a manuscript for the “Journal of Internet and Information Systems“. That’s standard for academics — you regularly get requests to do some work for the community for free!
However this was a little out of the ordinary in that the title of the manuscript was “THE ASSESSING CYBER CRIME AND IT IMPACT ON INFORMATION TECHNOLOGY IN NIGERIA” which is not, I feel, particularly grammatical English. I’d expect an editor to have done something about that before I was sent the manuscript…
I stared hard at the email headers (after all I’d just been sent some .docx files out of the blue) and it seems that the Journals Review Department of academicjournals.org uses Microsoft’s platform for their email (so no smoking gun from a spear-fishing point of view). So I took some appropriate precautions and opened the manuscript file.
It was dreadful … and read like it had been copied from somewhere else and patched together — indeed one page appeared twice! However, closer examination suggested it had been scanned rather than copy-typed.
The primary maturation of malicious agents attacking information system has changed over time from pride and prestige to financial again.
Which, some searches will show you comes from page 22 of Policing Cyber Crime written by Petter Gottschalk in 2010 — a book I haven’t read so I’ve no idea how good it is. Clearly “maturation” should be “motivation”, “system” should “systems” and “again” should be “gain”.
Much of the rest of the material (I didn’t spend a long time on it) was from the same source. Since the book is widely available for download in PDF format (though I do wonder how many versions were authorised), it’s pretty odd to have scanned it.
I then looked harder at the Journal itself — which is one of a group of 107 open-access journals. According to this report they were at one time misleadingly indicating an association with Elsevier, although they didn’t do that on the email they sent me.
The journals appear on “Beall’s list“: a compendium of questionable, scholarly open-access publishers and journals. That is, publishing your article in one of these venues is likely to make your CV look worse rather than better.
In traditional academic publishing the author gets their paper published for free and libraries pay (quite substantial amounts) to receive the journal, which the library users can then read for free, but the article may not be available to non-library users. The business model of “open-access” is that the author pays for having their paper published, and then it is freely available to everyone. There is now much pressure to ensure that academic work is widely available and so open-access is very much in vogue.
There are lots of entirely legitimate open-access journals with exceedingly high standards — but also some very dubious journals which are perceived of as accepting most anything and just collecting the money to keep the publisher in the style to which they have become accustomed (as an indication of the money involved, the fee charged by the Journal of Internet and Information Systems is $550).
I sent back an email to the Journal saying “Even a journal with your reputation should not accept this item“.
What does puzzle me is why anyone would submit a plagiarised article to an open-access journal with a poor reputation. Paying money to get your ripped-off material published in a dubious journal doesn’t seem to be good tactics for anyone. Perhaps it’s just that the journal wants to list me (enrolling my reputation) as one of their reviewers? Or perhaps I was spear-phished after all? Time will tell!
Today sees the publication of a report I helped to write for the Nuffield Bioethics Council on what happens to medical ethics in a world of cloud-based medical records and pervasive genomics.
As the information we gave to our doctors in private to help them treat us is now collected and treated as an industrial raw material, there has been scandal after scandal. From failures of anonymisation through unethical sales to the care.data catastrophe, things just seem to get worse. Where is it all going, and what must a medical data user do to behave ethically?
We put forward four principles. First, respect persons; do not treat their confidential data like were coal or bauxite. Second, respect established human-rights and data-protection law, rather than trying to find ways round it. Third, consult people who’ll be affected or who have morally relevant interests. And fourth, tell them what you’ve done – including errors and security breaches.
The collection, linking and use of data in biomedical research and health care: ethical issues took over a year to write. Our working group came from the medical profession, academics, insurers and drug companies. We had lots of arguments. But it taught us a lot, and we hope it will lead to a more informed debate on some very important issues. And since medicine is the canary in the mine, we hope that the privacy lessons can be of value elsewhere – from consumer data to law enforcement and human rights.
I will be trying to liveblog Financial Cryptography 2015.
The opening keynote was by Gavin Andresen, chief scientist of the Bitcoin Foundation, and his title was “What Satoshi didn’t know.” The main unknown six years ago when bitcoin launched was whether it would bootstrap; Satoshi thought it might be used as a spam filter or a practical hashcash. In reality it was someone buying a couple of pizzas for 10,000 bitcoins. Another unknown when Gavin got involved in 2010 was whether it was legal; if you’d asked the SEC then they might have classified it as a Ponzi scheme, but now their alerts are about bitcoin being used in Ponzi schemes. The third thing was how annoying people can be on the Internet; people will abuse your system for fun if it’s popular. An example was penny flooding, where you send coins back and forth between your sybils all day long. Gavin invented “proof of stake”; in its early form it meant prioritising payers who turn over coins less frequently. The idea was that scarcity plus utility equals value; in addition to the bitcoins themselves, another scarce resources emerges as the old, unspent transaction outputs (UTXOs). Perhaps these could be used for further DoS attack prevention or a pseudonymous identity anchor.
It’s not even clear that Satoshi is or was a cryptographer; he used only ECC / ECDSA, hashes and SSL (naively), he didn’t bother compressing public keys, and comments suggest he wasn’t up on the latest crypto research. In addition, the rules for letting transactions into the chain are simple; there’s no subtlety about transaction meaning, which is mixed up with validation and transaction fees; a programming-languages guru would have done things differently. Bitcoin now allows hashes of redemption scripts, so that the script doesn’t have to be disclosed upfront. Another recent innovation is using invertible Bloom lookup tables (IBLTs) to transmit expected differences rather than transmitting all transactions over the network twice. Also, since 2009 we have FHE, NIZLPs and SNARKs from the crypto research folks; the things on which we still need more research include pseudonymous identity, practical privacy, mining scalability, probabilistic transaction checking, and whether we can use streaming algorithms. In questions, Gavin remarked that regulators rather like the idea that there was a public record of all transactions; they might be more negative if it were completely anonymous. In the future, only recent transactions will be universally available; if you want the old stuff you’ll have to store it. Upgrading is hard though; Gavin’s big task this year is to increase the block size. Getting everyone in the world to update their software at once is not trivial. People say: “Why do you have to fix the software? Isn’t bitcoin done?”
I’ll try to blog the refereed talks in comments to this post.
TU Delft has just launched a massively open online course on security economics to which three current group members (Sophie van der Zee, David Modoc and I) have contributed lectures, along with one alumnus (Tyler Moore). Michel van Eeten of Delft is running the course (Delft does MOOCs while Cambridge doesn’t yet), and there are also talks from Rainer Boehme. This was pre-announced here by Tyler in November.
The videos will be available for free in April; if you want to take the course now, I’m afraid it costs $250. The deal is that EdX paid for the production and will sell it as a professional course to security managers in industry and government; once that’s happened we’ll make it free to all. This is the same basic approach as with my book: rope in a commercial publisher to help produce first-class content that then becomes free to all. But if your employer is thinking of giving you some security education, you could do a lot worse than to support the project and enrol here.