Category Archives: Security engineering

Bad security, good security, case studies, lessons learned

Talking Trojan: Analyzing an Industry-Wide Disclosure

Talking Trojan: Analyzing an Industry-Wide Disclosure tells the story of what happened after we discovered the Trojan Source vulnerability, which broke almost all computer languages, and the Bad Characters vulnerability, which broke almost all large NLP tools. This provided a unique opportunity to measure software maintenance in action. Who patched quickly, reluctantly, or not at all? Who paid bug bounties, and who dodged liability? What parts of the disclosure ecosystem work well, which are limping along, and which are broken?

Security papers typically describe a vulnerability but say little about how it was disclosed and patched. And while disclosing one vulnerability to a single vendor can be hard enough, modern supply chains multiply the number of affected parties leading to an exponential increase in the complexity of the disclosure. One vendor will want an in-house web form, another will use an outsourced bug bounty platform, still others will prefer emails, and *nix OS maintainers will use a very particular PGP mailing list. Governments sort-of want to assist with disclosures but prefer to use yet another platform. Many open-source projects lack an embargoed disclosure process, but it is often in the interest of commercial operating system maintainers to write embargoed patches – if you can get hold of the right people.

A vulnerability that affected many different products at the same time and in similar ways gave us a unique chance to observe the finite-impulse response of this whole complex system. Our observations reveal a number of weaknesses, such as a potentially dangerous misalignment of incentives between commercially sponsored bug bounty programs and multi-vendor coordinated disclosure platforms. We suggest tangible changes that could strengthen coordinated disclosure globally.

We also hope to inspire other researchers to publish the mechanics of individual disclosures, so that we can continue to measure and improve the critical ecosystem on which we rely as our main defense against growing supply chain threats. In the meantime, our paper can be found here, and will appear in SCORED ‘22 this November.

The Dynamics of Industry-wide Disclosure

Last year, we disclosed two related vulnerabilities that broke a wide range of systems. In our Bad Characters paper, we showed how to use Unicode tricks – such as homoglyphs and bidi characters – to mislead NLP systems. Our Trojan Source paper showed how similar tricks could be used to make source code look one way to a human reviewer, and another way to a compiler, opening up a wide range of supply-chain attacks on critical software. Prior to publication, we disclosed our findings to four suppliers of large NLP systems, and nineteen suppliers of software development tools. So how did industry respond?

We were invited to give the keynote talk this year at LangSec, and the video is now available. In it we describe not just the Bad Characters and Trojan Source vulnerabilities, but the large natural experiment created by their disclosure. The Trojan Source vulnerability affected most compilers, interpreters, code editors and code repositories; this enabled us to compare responses by firms versus nonprofits and by firms that managed their own response versus those who outsourced it. The interaction between bug bounty programs, government disclosure assistance, peer review and press coverage was interesting. Most of the affected development teams took action, though some required a bit of prodding.

The response by the NLP maintainers was much less enthusiastic. By the time we gave this talk, only Google had done anything – though we now hear that Microsoft is now also working on a fix. The reasons for this responsibility gap need to be understood better. They may include differences in culture between C coders and data scientists; the greater costs and delays in the build-test-deploy cycle for large ML models; and the relative lack of press interest in attacks on ML systems. If many of our critical systems start to include ML components that are less maintainable, will the ML end up being the weakest link?

Morello chip on board

Formal CHERI: rigorous engineering and design-time proof of full-scale architecture security properties

Memory safety bugs continue to be a major source of security vulnerabilities, with their root causes ingrained in the industry:

  • the C and C++ systems programming languages that do not enforce memory protection, and the huge legacy codebase written in them that we depend on;
  • the legacy design choices of hardware that provides only coarse-grain protection mechanisms, based on virtual memory; and
  • test-and-debug development methods, in which only a tiny fraction of all possible execution paths can be checked, leaving ample unexplored corners for exploitable bugs.

Over the last twelve years, the CHERI project has been working on addressing the first two of these problems by extending conventional hardware Instruction-Set Architectures (ISAs) with new architectural features to enable fine-grained memory protection and highly scalable software compartmentalisation, prototyped first as CHERI-MIPS and CHERI-RISC-V architecture designs and FPGA implementations, with an extensive software stack ported to run above them.

The academic experimental results are very promising, but achieving widespread adoption of CHERI needs an industry-scale evaluation of a high-performance silicon processor implementation and software stack. To that end, Arm have developed Morello, a CHERI-enabled prototype architecture (extending Armv8.2-A), processor (adapting the high-performance Neoverse N1 design), system-on-chip (SoC), and development board, within the UKRI Digital Security by Design (DSbD) Programme (see our earlier blog post on Morello). Morello is now being evaluated in a range of academic and industry projects.

Morello desktopMorello chip on board

However, how do we ensure that such a new architecture actually provides the security guarantees it aims to provide? This is crucial: any security flaw in the architecture will be present in any conforming hardware implementation, quite likely impossible to fix or work around after deployment.

In this blog post, we describe how we used rigorous engineering methods to provide high assurance of key security properties of CHERI architectures, with machine-checked mathematical proof, as well as to complement and support traditional design and development workflows, e.g. by automatically generating test suites. This is addressing the third problem, showing that, by judicious use of rigorous semantics at design time, we can do much better than test-and-debug development.
Continue reading Formal CHERI: rigorous engineering and design-time proof of full-scale architecture security properties

Text mining is harder than you think

Following last year’s row about Apple’s proposal to scan all the photos on your iPhone camera roll, EU Commissioner Johansson proposed a child sex abuse regulation that would compel providers of end-to-end encrypted messaging services to scan all messages in the client, and not just for historical abuse images but for new abuse images and for text messages containing evidence of grooming.

Now that journalists are distracted by the imminent downfall of our great leader, the Home Office seems to think this is a good time to propose some amendments to the Online Safety Bill that will have a similar effect. And while the EU planned to win the argument against the pedophiles first and then expand the scope to terrorist radicalisation and recruitment too, Priti Patel goes for the terrorists from day one. There’s some press coverage in the Guardian and the BBC.

We explained last year why client-side scanning is a bad idea. However, the shift of focus from historical abuse images to text scanning makes the government story even less plausible.

Detecting online wickedness from text messages alone is hard. Since 2016, we have collected over 99m messages from cybercrime forums and over 49m from extremist forums, and these corpora are used by 179 licensees in 55 groups from 42 universities in 18 countries worldwide. Detecting hate speech is a good proxy for terrorist radicalisation. In 2018, we thought we could detect hate speech with a precision of typically 92%, which would mean a false-alarm rate of 8%. But the more complex models of 2022, based on Google’s BERT, when tested on the better collections we have now, don’t do significantly better; indeed, now that we understand the problem in more detail, they often do worse. Do read that paper if you want to understand why hate-speech detection is an interesting scientific problem. With some specific kinds of hate speech it’s even harder; an example is anti-semitism, thanks to the large number of synonyms for Jewish people. So if we were to scan 10bn messages a day in Europe there would be maybe a billion false alarms for Europol to look at.

We’ve been scanning the Internet for wickedness for over fifteen years now, and looking at various kinds of filters for everything from spam to malware. Filtering requires very low false positive rates to be feasible at Internet scale, which means either looking for very specific things (such as indicators of compromise by a specific piece of malware) or by having rich metadata (such as a big spam run from some IP address space you know to be compromised). Whatever filtering Facebook can do on Messenger given its rich social context, there will be much less that a WhatsApp client can do by scanning each text on its way through.

So if you really wish to believe that either the EU’s CSA Regulation or the UK’s Online Harms Bill is an honest attempt to protect kids or catch terrorists, good luck.

Arm releases experimental CHERI-enabled Morello board as part of £187M UKRI Digital Security by Design programme

Professor Robert N. M. Watson (Cambridge), Professor Simon W. Moore (Cambridge), Professor Peter Sewell (Cambridge), Dr Jonathan Woodruff (Cambridge), Brooks Davis (SRI), and Dr Peter G. Neumann (SRI)

After over a decade of research creating the CHERI protection model, hardware, software, and formal models and proofs, developed over three DARPA research programmes, we are at a truly exciting moment. Today, Arm announced first availability of its experimental CHERI-enabled Morello processor, System-on-Chip, and development board – an industrial quality and industrial scale demonstrator of CHERI merged into a high-performance processor design. Not only does Morello fully incorporate the features described in our CHERI ISAv8 specification to provide fine-grained memory protection and scalable software compartmentalisation, but it also implements an Instruction-Set Architecture (ISA) with formally verified security properties. The Arm Morello Program is supported by the £187M UKRI Digital Security by Design (DSbD) research programme, a UK government and industry-funded effort to transition CHERI towards mainstream use.

Continue reading Arm releases experimental CHERI-enabled Morello board as part of £187M UKRI Digital Security by Design programme

Security engineering course

This week sees the start of a course on security engineering that Sam Ainsworth and I are teaching. It’s based on the third edition of my Security Engineering book, and is a first cut at a ‘film of the book’.

Each week we will put two lectures online, and here are the first two. Lecture 1 discusses our adversaries, from nation states through cyber-crooks to personal abuse, and the vulnerability life cycle that underlies the ecosystem of attacks. Lecture 2 abstracts this empirical experience into more formal threat models and security policies.

Although our course is designed for masters students and fourth-year undergrads in Edinburgh, we’re making the lectures available to everyone. I’ll link the rest of the videos in followups here, and eventually on the book’s web page.

Electhical 2021

Electhical is an industry forum whose focus is achieving a low total footprint for electronics. It is being held on Friday December 10th at Churchill College, Cambridge. The speakers are from government, industry and academia; they include executives and experts on technology policy, consumer electronics, manufacturing, security and privacy. It’s sponsored by ARM, IEEE, IEEE CAS and Churchill College; registration is free.

Rollercoaster: Communicating Efficiently and Anonymously in Large Groups

End-to-end (E2E) encryption is now widely deployed in messaging apps such as WhatsApp and Signal and billions of people around the world have the contents of their message protected against strong adversaries. However, while the message contents are encrypted, their metadata still leaks sensitive information. For example, it is easy for an infrastructure provider to tell which customers are communicating, with whom and when.

Anonymous communication hides this metadata. This is crucial for the protection of individuals such as whistleblowers who expose criminal wrongdoing, activists organising a protest, or embassies coordinating a response to a diplomatic incident. All these face powerful adversaries for whom the communication metadata alone (without knowing the specific message text) can result in harm for the individuals concerned.

Tor is a popular tool that achieves anonymous communication by forwarding messages through multiple intermediate nodes or relays. At each relay the outermost layer of the message is decrypted and the inner message is forwarded to the next relay. An adversary who wants to figure out where A’s messages are finally delivered can attempt to follow a message as it passes through each relay. Alternatively, an adversary might confirm a suspicion that user A talks to user B by observing traffic patterns at A’s and B’s access points to the network instead. If indeed A and B are talking to each other, there will be a correlation between their traffic patterns. For instance, if an adversary observes that A sends three messages and three messages arrive at B shortly afterwards, this provides some evidence that A talks to B. The adversary can increase their certainty by collecting traffic over a longer period of time.

Mix networks such as Loopix use a different design, which defends against such traffic analysis attacks by using (i) traffic shaping and (ii) more intermediate nodes, so called mix nodes. In a simple mix network, each client only sends packets of a fixed length and at predefined intervals (e.g. 1 KiB every 5 seconds). When there is no payload to send, a cover packet is crafted that is indistinguishable to the adversary from a payload packet. If there is more than one payload packet to be sent, packets are queued and sent one by one on the predefined schedule. This traffic shaping ensures that an observer cannot gain any information from observing outgoing network packets. Moreover, mix nodes typically delay each incoming message by a random amount of time before forwarding it (with the delay chosen independently for each message), making it harder for an adversary to correlate a mix node’s incoming and outgoing messages, since they are likely to be reordered. In contrast, Tor relays forward messages as soon as possible in order to minimise latency.

Mix Networks work well for pairwise communication, but we found that group communication creates a unique challenge. Such group communication encompasses both traditional chat groups (e.g. WhatsApp groups or IRC) and collaborative editing (e.g. Google Docs, calendar sync, todo lists) where updates need to be disseminated to all other participants who are viewing or editing the content. There are many scenarios where anonymity requirements meet group communication, such as coordination between activists, diplomatic correspondence between embassies, and organisation of political campaigns.

The traffic shaping of mix networks makes efficient group communication difficult. The limited rate of outgoing messages means that sequentially sending a message to each group member can take a long time. For instance, assuming that the outgoing rate is 1 message every 5 seconds, it will take more than 8 minutes to send the message to all members in a group of size 100. During this process the sender’s output queue is blocked and they cannot send any other messages.

In our paper we propose a scheme named Rollercoaster that greatly improves the latency for group communication in mix networks. The basic idea is that group members who have already received a message can help distribute it to other members of the group. Like a chain reaction, the distribution of the message gains momentum as the number of recipients grows. In an ideal execution of this scheme, the number of users who have received a message doubles with every round, leading to substantially more efficient message delivery across the group.

Rollercoaster works well because there is typically plenty of spare capacity in the network. At any given time most clients will not be actively communicating and they are therefore mostly sending cover traffic. As a result, Rollercoaster actually improves the efficiency of the network and reduces the rate of cover traffic, which in turn reduces the overall required network bandwidth. At the same time, Rollercoaster does not require any changes to the existing Mix network protocol and can benefit from the existing user base and anonymity set.

The basic idea requires more careful consideration in a realistic environment where clients are offline or do not behave faithfully. A fault-tolerant version of our Rollercoaster scheme addresses these concerns by waiting for acknowledgement messages from recipients. If those acknowledgement messages are not received by the sender in a fixed period of time, forwarding roles are reassigned and another delivery attempt is made via a new route. We also show how a single number can seed the generation of a deterministic forwarding schedule. This allows efficient communication of different forwarding schedules and balances individual workloads within the group.

We presented our paper at USENIX Security ‘21 (paper, slides, and recording). It contains more extensions and optimisations than we can summarise here. There is also an extended version available as a tech report with more detailed security arguments in the appendices. The paper reference is:
Daniel Hugenroth, Martin Kleppmann, and Alastair R. Beresford. Rollercoaster: An Efficient Group-Multicast Scheme for Mix Networks. Proceedings of the 30th USENIX Security Symposium (USENIX Security), 2021.

Trojan Source: Invisible Vulnerabilities

Today we are releasing Trojan Source: Invisible Vulnerabilities, a paper describing cool new tricks for crafting targeted vulnerabilities that are invisible to human code reviewers.

Until now, an adversary wanting to smuggle a vulnerability into software could try inserting an unobtrusive bug in an obscure piece of code. Critical open-source projects such as operating systems depend on human review of all new code to detect malicious contributions by volunteers. So how might wicked code evade human eyes?

We have discovered ways of manipulating the encoding of source code files so that human viewers and compilers see different logic. One particularly pernicious method uses Unicode directionality override characters to display code as an anagram of its true logic. We’ve verified that this attack works against C, C++, C#, JavaScript, Java, Rust, Go, and Python, and suspect that it will work against most other modern languages.

This potentially devastating attack is tracked as CVE-2021-42574, while a related attack that uses homoglyphs – visually similar characters – is tracked as CVE-2021-42694. This work has been under embargo for a 99-day period, giving time for a major coordinated disclosure effort in which many compilers, interpreters, code editors, and repositories have implemented defenses.

This attack was inspired by our recent work on Imperceptible Perturbations, where we use directionality overrides, homoglyphs, and other Unicode features to break the text-based machine learning systems used for toxic content filtering, machine translation, and many other NLP tasks.

More information about the Trojan Source attack can be found at trojansource.codes, and proofs of concept can also be found on GitHub. The full paper can be found here.

Is Apple’s NeuralMatch searching for abuse, or for people?

Apple stunned the tech industry on Thursday by announcing that the next version of iOS and macOS will contain a neural network to scan photos for sex abuse. Each photo will get an encrypted ‘safety voucher’ saying whether or not it’s suspect, and if more than about ten suspect photos are backed up to iCloud, then a clever cryptographic scheme will unlock the keys used to encrypt them. Apple staff or contractors can then look at the suspect photos and report them.

We’re told that the neural network was trained on 200,000 images of child sex abuse provided by the US National Center for Missing and Exploited Children. Neural networks are good at spotting images “similar” to those in their training set, and people unfamiliar with machine learning may assume that Apple’s network will recognise criminal acts. The police might even be happy if it recognises a sofa on which a number of acts took place. (You might be less happy, if you own a similar sofa.) Then again, it might learn to recognise naked children, and flag up a snap of your three-year-old child on the beach. So what the new software in your iPhone actually recognises is really important.

Now the neural network described in Apple’s documentation appears very similar to the networks used in face recognition (hat tip to Nicko van Someren for spotting this). So it seems a fair bet that the new software will recognise people whose faces appear in the abuse dataset on which it was trained.

So what will happen when someone’s iPhone flags ten pictures as suspect, and the Apple contractor who looks at them sees an adult with their clothes on? There’s a real chance that they’re either a criminal or a witness, so they’ll have to be reported to the police. In the case of a survivor who was victimised ten or twenty years ago, and whose pictures still circulate in the underground, this could mean traumatic secondary victimisation. It might even be their twin sibling, or a genuine false positive in the form of someone who just looks very much like them. What processes will Apple use to manage this? Not all US police forces are known for their sensitivity, particularly towards minority suspects.

But that’s just the beginning. Apple’s algorithm, NeuralMatch, stores a fingerprint of each image in its training set as a short string called a NeuralHash, so new pictures can easily be added to the list. Once the tech is built into your iPhone, your MacBook and your Apple Watch, and can scan billions of photos a day, there will be pressure to use it for other purposes. The other part of NCMEC’s mission is missing children. Can Apple resist demands to help find runaways? Could Tim Cook possibly be so cold-hearted as to refuse at add Madeleine McCann to the watch list?

After that, your guess is as good as mine. Depending on where you are, you might find your photos scanned for dissidents, religious leaders or the FBI’s most wanted. It also reminds me of the Rasterfahndung in 1970s Germany – the dragnet search of all digital data in the country for clues to the Baader-Meinhof gang. Only now it can be done at scale, and not just for the most serious crimes either.

Finally, there’s adversarial machine learning. Neural networks are fairly easy to fool in that an adversary can tweak images so they’re misclassified. Expect to see pictures of cats (and of Tim Cook) that get flagged as abuse, and gangs finding ways to get real abuse past the system. Apple’s new tech may end up being a distributed person-search machine, rather than a sex-abuse prevention machine.

Such a technology requires public scrutiny, and as the possession of child sex abuse images is a strict-liability offence, academics cannot work with them. While the crooks will dig out NeuralMatch from their devices and play with it, we cannot. It is possible in theory for Apple to get NeuralMatch to ignore faces; for example, it could blur all the faces in the training data, as Google does for photos in Street View. But they haven’t claimed they did that, and if they did, how could we check? Apple should therefore publish full details of NeuralMatch plus a set of NeuralHash values trained on a public dataset with which we can legally work. It also needs to explain how the system it deploys was tuned and tested; and how dragnet searches of people’s photo libraries will be restricted to those conducted by court order so that they are proportionate, necessary and in accordance with the law. If that cannot be done, the technology must be abandoned.