Report: Assessing the Viability of an Open-Source CHERI Desktop Software Ecosystem

CHERI (Capability Hardware Enhanced RISC Instructions) is an architectural extension to processor Instruction-Set Architectures (ISAs) adding efficient support for fine-grained C/C++-language memory protection as well as scalable software compartmentalisation. Developed over the last 11 years at SRI International and the University of Cambridge, CHERI is now the subject of a £187M UK Industrial Strategy Challenge Fund (ISCF) transition initiative, which is developing the experimental CHERI-enabled Arm Morello processor (shipping in 2022). In early 2021, UKRI funded a pilot study at Capabilities Limited (a Lab spinout led by Ben Laurie and me) to explore potential uses of CHERI and Morello as the foundation for a more secure desktop computer system. CHERI use case studies to date have focused on server and mobile scenarios, but desktop system security is essential as well, as it is frequently targeted in malware attacks (including ransomware) that also depend on plentiful software vulnerabilities. For this project, we were joined by Alex Richardson (previously a Senior Research Software Engineer at Cambridge, and now at Google), who led much of the development work described here.

In September 2021, we released our final report, Assessing the Viability of an Open-Source CHERI Desktop Software Ecosystem, which describes our three-staff-month effort to deploy CHERI within a substantive slice of an open-source desktop environment based on X11, Qt (and supporting libraries), and KDE. We adapted the software stack to run with memory-safe CHERI C/C++, performed a set of software compartmentalisation whiteboarding experiments, and concluded with a detailed 5-year retrospective vulnerability analysis to explore how memory safety and compartmentalisation would have affected past critical security vulnerabilities for a subset of that stack.

A key metric for us was ‘vulnerability mitigation’: 73.8% of past security advisories and patches (and a somewhat higher proportion of CVEs) would have been substantially mitigated by deploying CHERI. This number is not dissimilar to the Microsoft Security Response Center (MSRC)’s estimate that CHERI would have deterministically mitigated at least 67% of Microsoft’s 2019 critical memory-safety security vulnerabilities, although there were important differences in methodology (e.g., we also considered the impact of compartmentalisation on non-memory-safety vulnerabilities). One challenge in this area of the work was establishing de facto threat models for various open-source packages, as few open-source vendors provide a concrete definition of which bugs might (or might not) constitute vulnerabilities. We had to reconstruct a threat model for each project in order to assess whether we could consider a vulnerability mitigated or not.

At low levels of the stack (e.g., 90% of X11 vulnerabilities, and 100% of vulnerabilities in supporting libraries such as giflib), vulnerabilities were almost entirely memory-safety issues, with very high mitigation rates using CHERI C/C++. At higher levels of the stack improved software compartmentalisation (e.g., enabling more fine-grained sandboxing at acceptable overheads) impacted many KDE-level vulnerabilities (e.g., 82% of Qt security notices, and 43% of KDE security advisories). Of particular interest to us was the extent to which it was important to deploy both CHERI-based protection techniques: while memory protection prevents arbitrary code execution in the vast majority of affected cases, the potential outcome of software crashing then required better compartmentalisation (e.g., of image-processing libraries) to mitigate potential denial of service. Of course, some vulnerabilities, especially at higher levels of the stack, were out of scope for our architectural approach — e.g., if an application fails to encrypt an email despite the user indicating via the UI that they require encryption, we have little to say about it.

Compatibility is also an important consideration in contemplating CHERI deployment: We estimated that we had to modify 0.026% LoC relative to a 6-million-line C and C++ source code base to run the stack with CHERI C/C++ memory safety. This figure compares favourably with %LoC modification requirements we have published relating to operating-system changes (e.g., in our 2019 paper on CheriABI), and a number of factors contribute to that. Not least, we have substantially improved the compatibility properties of CHERI C/C++ over the last few years through improved language and compiler support: for example, our compiler can now better resolve provenance ambiguity for intptr_t expressions through static analysis (CHERI requires that all pointers have a single source of provenance), rather than requiring source-level annotation. Another is that these higher-level application layers typically had less use of assembly code, fewer custom memory allocators and linkers, and, more generally, less architectural awareness. Along the way we also made minor improvements to CHERI LLVM’s reporting of specific types of potential compatibility problems that might require changes, as well as introducing a new CHERI LLVM sanitiser to assist with potential problems requiring dynamic detection.

The study is subject to various limitations (explored in detail in the report), not least that we worked with a subset of a much larger stack due to the three-month project length, and that our ability to assess whether the stack was working properly was limited by the available test suites and our ability to exercise applications ourselves. Further, with the Arm Morello board becoming available next year, we have not yet been able to assess the performance impact of these changes, which is another key consideration in considering deployment of CHERI in this environment. All of our results should be reproducible using the open-source QEMU-CHERI emulator and cheribuild build system. We look forward to continuing this work once shipping Arm hardware is available in the spring!

Rollercoaster: Communicating Efficiently and Anonymously in Large Groups

End-to-end (E2E) encryption is now widely deployed in messaging apps such as WhatsApp and Signal, and billions of people around the world have the contents of their messages protected against strong adversaries. However, while the message contents are encrypted, their metadata still leaks sensitive information. For example, it is easy for an infrastructure provider to tell which customers are communicating, with whom and when.

Anonymous communication hides this metadata. This is crucial for the protection of individuals such as whistleblowers who expose criminal wrongdoing, activists organising a protest, or embassies coordinating a response to a diplomatic incident. All these face powerful adversaries for whom the communication metadata alone (without knowing the specific message text) can result in harm for the individuals concerned.

Tor is a popular tool that achieves anonymous communication by forwarding messages through multiple intermediate nodes or relays. At each relay the outermost layer of the message is decrypted and the inner message is forwarded to the next relay. An adversary who wants to figure out where A’s messages are finally delivered can attempt to follow a message as it passes through each relay. Alternatively, an adversary might confirm a suspicion that user A talks to user B by observing traffic patterns at A’s and B’s access points to the network instead. If indeed A and B are talking to each other, there will be a correlation between their traffic patterns. For instance, if an adversary observes that A sends three messages and three messages arrive at B shortly afterwards, this provides some evidence that A talks to B. The adversary can increase their certainty by collecting traffic over a longer period of time.

Mix networks such as Loopix use a different design, which defends against such traffic analysis attacks by using (i) traffic shaping and (ii) more intermediate nodes, so called mix nodes. In a simple mix network, each client only sends packets of a fixed length and at predefined intervals (e.g. 1 KiB every 5 seconds). When there is no payload to send, a cover packet is crafted that is indistinguishable to the adversary from a payload packet. If there is more than one payload packet to be sent, packets are queued and sent one by one on the predefined schedule. This traffic shaping ensures that an observer cannot gain any information from observing outgoing network packets. Moreover, mix nodes typically delay each incoming message by a random amount of time before forwarding it (with the delay chosen independently for each message), making it harder for an adversary to correlate a mix node’s incoming and outgoing messages, since they are likely to be reordered. In contrast, Tor relays forward messages as soon as possible in order to minimise latency.
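The traffic shaping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Loopix code: the class name ShapedSender, the zero-byte padding and the use of random bytes as a stand-in for an encrypted cover packet are all our assumptions, and the returned flag exists purely so the example can be inspected locally (an observer on the wire would never see it).

```python
import os
from collections import deque
from typing import Tuple

PACKET_SIZE = 1024  # 1 KiB, the example packet size used in the text


class ShapedSender:
    """Hypothetical client that emits exactly one fixed-size packet per tick."""

    def __init__(self) -> None:
        self.queue = deque()  # payload packets waiting for a send slot

    def submit(self, payload: bytes) -> None:
        """Queue a payload, padded to the fixed packet size."""
        if len(payload) > PACKET_SIZE:
            raise ValueError("payload does not fit in one packet")
        self.queue.append(payload.ljust(PACKET_SIZE, b"\x00"))

    def tick(self) -> Tuple[bytes, bool]:
        """Called once per interval (e.g. every 5 seconds).

        Returns (packet, is_payload). If nothing is queued, a cover packet
        of the same length is produced instead, so the send rate never
        depends on whether the user is actually talking.
        """
        if self.queue:
            return self.queue.popleft(), True
        return os.urandom(PACKET_SIZE), False
```

In a real mix network both kinds of packet would be encrypted in layers, which is what makes them indistinguishable; the sketch only shows the queueing discipline.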

Mix networks work well for pairwise communication, but we found that group communication creates a unique challenge. Such group communication encompasses both traditional chat groups (e.g. WhatsApp groups or IRC) and collaborative editing (e.g. Google Docs, calendar sync, todo lists) where updates need to be disseminated to all other participants who are viewing or editing the content. There are many scenarios where anonymity requirements meet group communication, such as coordination between activists, diplomatic correspondence between embassies, and organisation of political campaigns.

The traffic shaping of mix networks makes efficient group communication difficult. The limited rate of outgoing messages means that sequentially sending a message to each group member can take a long time. For instance, assuming that the outgoing rate is 1 message every 5 seconds, it will take more than 8 minutes to send the message to all members in a group of size 100. During this process the sender’s output queue is blocked and they cannot send any other messages.
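As a back-of-the-envelope check of that figure, here is the arithmetic as a small Python function. It assumes that the sender is one of the 100 members (so there are 99 recipients) and that each recipient costs exactly one send slot; the function name is ours.

```python
def sequential_delivery_seconds(group_size: int, interval_s: float = 5.0) -> float:
    """Time for one sender to message every other group member in turn,
    using one send slot per recipient."""
    recipients = group_size - 1  # the sender does not message itself
    return recipients * interval_s


seconds = sequential_delivery_seconds(100)
print(seconds, seconds / 60)  # 495.0 seconds, i.e. 8.25 minutes
```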

In our paper we propose a scheme named Rollercoaster that greatly improves the latency for group communication in mix networks. The basic idea is that group members who have already received a message can help distribute it to other members of the group. Like a chain reaction, the distribution of the message gains momentum as the number of recipients grows. In an ideal execution of this scheme, the number of users who have received a message doubles with every round, leading to substantially more efficient message delivery across the group.
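The doubling effect can be illustrated with a toy model in Python. It assumes an ideal execution in which every member who already holds the message forwards it to exactly one new member per round, and it ignores mix delays, offline clients and everything else that the real scheme has to deal with; the function name is ours.

```python
def doubling_rounds(group_size: int) -> int:
    """Number of rounds until everyone has the message, if every current
    holder forwards it to one new member per round (idealised model)."""
    holders = 1  # initially only the original sender has the message
    rounds = 0
    while holders < group_size:
        holders = min(group_size, holders * 2)
        rounds += 1
    return rounds


print(doubling_rounds(100))  # 7 rounds: 1, 2, 4, 8, 16, 32, 64, then 100
```

In this idealised model a group of 100 is covered after 7 send slots per participant, rather than 99 consecutive slots for a single sender.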

Rollercoaster works well because there is typically plenty of spare capacity in the network. At any given time most clients will not be actively communicating and they are therefore mostly sending cover traffic. As a result, Rollercoaster actually improves the efficiency of the network and reduces the rate of cover traffic, which in turn reduces the overall required network bandwidth. At the same time, Rollercoaster does not require any changes to the existing mix network protocol and can benefit from the existing user base and anonymity set.

The basic idea requires more careful consideration in a realistic environment where clients are offline or do not behave faithfully. A fault-tolerant version of our Rollercoaster scheme addresses these concerns by waiting for acknowledgement messages from recipients. If those acknowledgement messages are not received by the sender in a fixed period of time, forwarding roles are reassigned and another delivery attempt is made via a new route. We also show how a single number can seed the generation of a deterministic forwarding schedule. This allows efficient communication of different forwarding schedules and balances individual workloads within the group.
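To give a flavour of how a single number can stand for a whole forwarding schedule, here is a minimal Python sketch. It is not the construction from the paper: the function name, the use of Python's random.Random as the seeded generator and the simple "each holder forwards to one new member per round" layout are all assumptions made for illustration. The point is only that every member who knows the member list, the sender and the seed can recompute the same schedule locally, and that different seeds put different members into the early, busier forwarding roles.

```python
import random
from typing import List, Tuple


def forwarding_schedule(members: List[str], sender: str,
                        seed: int) -> List[List[Tuple[str, str]]]:
    """Return a list of rounds; each round is a list of (from, to) pairs."""
    rng = random.Random(seed)  # same seed gives the same shuffle
    # Sort first so the result does not depend on the caller's list order.
    pending = sorted(m for m in members if m != sender)
    rng.shuffle(pending)
    holders = [sender]
    rounds = []
    while pending:
        # Each current holder forwards to at most one new member.
        batch = pending[:len(holders)]
        pending = pending[len(holders):]
        rounds.append(list(zip(holders, batch)))
        holders = holders + batch
    return rounds


group = ["alice", "bob", "carol", "dave", "erin", "frank", "grace", "heidi"]
for number, pairs in enumerate(forwarding_schedule(group, "alice", seed=42), 1):
    print(number, pairs)
```

A fault-tolerant version would additionally track acknowledgements and reassign roles after a timeout, which this sketch leaves out.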

We presented our paper at USENIX Security ’21 (paper, slides, and recording). It contains more extensions and optimisations than we can summarise here. There is also an extended version available as a tech report with more detailed security arguments in the appendices. The paper reference is:
Daniel Hugenroth, Martin Kleppmann, and Alastair R. Beresford. Rollercoaster: An Efficient Group-Multicast Scheme for Mix Networks. Proceedings of the 30th USENIX Security Symposium (USENIX Security), 2021.

Trojan Source: Invisible Vulnerabilities

Today we are releasing Trojan Source: Invisible Vulnerabilities, a paper describing cool new tricks for crafting targeted vulnerabilities that are invisible to human code reviewers.

Until now, an adversary wanting to smuggle a vulnerability into software could try inserting an unobtrusive bug in an obscure piece of code. Critical open-source projects such as operating systems depend on human review of all new code to detect malicious contributions by volunteers. So how might wicked code evade human eyes?

We have discovered ways of manipulating the encoding of source code files so that human viewers and compilers see different logic. One particularly pernicious method uses Unicode directionality override characters to display code as an anagram of its true logic. We’ve verified that this attack works against C, C++, C#, JavaScript, Java, Rust, Go, and Python, and suspect that it will work against most other modern languages.
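One simple defensive check is to scan source files for the Unicode bidirectional control characters themselves. The Python sketch below is our own illustration rather than the tooling from the paper: the function name is hypothetical, and the table lists the explicit embedding, override and isolate controls (U+202A to U+202E and U+2066 to U+2069) using escape sequences so that the example itself contains no invisible characters.

```python
# Explicit bidirectional formatting characters and their short names.
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def find_bidi_controls(source: str):
    """Return (line, column, name) for every bidi control in the text.

    Lines and columns are 1-based; columns count code points.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, BIDI_CONTROLS[ch]))
    return hits


sample = 'x = 1\nif role != "user\u202e": pass\n'
for lineno, col, name in find_bidi_controls(sample):
    print(f"line {lineno}, column {col}: {name}")
```

Flagging every occurrence will produce false alarms in files that legitimately contain right-to-left text, so a real tool would need a policy for strings and comments; this sketch only shows the detection step.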

This potentially devastating attack is tracked as CVE-2021-42574, while a related attack that uses homoglyphs – visually similar characters – is tracked as CVE-2021-42694. This work has been under embargo for a 99-day period, giving time for a major coordinated disclosure effort in which many compilers, interpreters, code editors, and repositories have implemented defenses.
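For the homoglyph variant, a rough heuristic is to flag identifiers that mix letters from more than one script. Python's standard library does not expose the Unicode Script property, so the sketch below approximates it with the first word of each character's Unicode name (for example LATIN or CYRILLIC). That approximation, and both function names, are our assumptions, and the check will not catch an identifier written entirely in look-alike characters from a single script.

```python
import unicodedata


def scripts_in_identifier(ident: str) -> set:
    """Approximate the set of scripts used by the letters of an identifier."""
    scripts = set()
    for ch in ident:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])  # e.g. "LATIN", "CYRILLIC"
    return scripts


def looks_mixed_script(ident: str) -> bool:
    """True if the identifier's letters appear to come from several scripts."""
    return len(scripts_in_identifier(ident)) > 1


print(looks_mixed_script("payload"))        # all Latin letters
print(looks_mixed_script("p\u0430yload"))   # second letter is Cyrillic U+0430
```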

This attack was inspired by our recent work on Imperceptible Perturbations, where we use directionality overrides, homoglyphs, and other Unicode features to break the text-based machine learning systems used for toxic content filtering, machine translation, and many other NLP tasks.

More information about the Trojan Source attack can be found at trojansource.codes, and proofs of concept can also be found on GitHub. The full paper can be found here.

Bugs in our pockets?

In August, Apple announced a system to check all our iPhones for illegal images, then delayed its launch after widespread pushback. Yet some governments continue to press for just such a surveillance system, and the EU is due to announce a new child protection law at the start of December.

Now, in Bugs in our Pockets: The Risks of Client-Side Scanning, colleagues and I take a long hard look at the options for mass surveillance via software embedded in people’s devices, as opposed to the current practice of monitoring our communications. Client-side scanning, as the agencies’ new wet dream is called, has a range of possible missions. While Apple and the FBI talked about finding still images of sex abuse, the EU was talking last year about videos and text too, and of targeting terrorism once the argument had been won on child protection. It can also use a number of possible technologies; in addition to the perceptual hash functions in the Apple proposal, there’s talk of machine-learning models. And, as a leaked EU internal report made clear, the preferred outcome for governments may be a mix of client-side and server-side scanning.

In our report, we provide a detailed analysis of scanning capabilities at both the client and the server, the trade-offs between false positives and false negatives, and the side effects – such as the ways in which adding scanning systems to citizens’ devices will open them up to new types of attack.

We did not set out to praise Apple’s proposal, but we ended up concluding that it was probably about the best that could be done. Even so, it did not come close to providing a system that a rational person might consider trustworthy.

Even if the engineering on the phone were perfect, a scanner brings within the user’s trust perimeter all those involved in targeting it – in deciding which photos go on the naughty list, or how to train any machine-learning models that riffle through your texts or watch your videos. Even if it starts out trained on images of child abuse that all agree are illegal, it’s easy for both insiders and outsiders to manipulate images to create both false negatives and false positives. The more we look at the detail, the less attractive such a system becomes. The measures required to limit the obvious abuses so constrain the design space that you end up with something that could not be very effective as a policing tool; and if the European institutions were to mandate its use – and there have already been some legislative skirmishes – they would open up their citizens to quite a range of avoidable harms. And that’s before you stop to remember that the European Court of Justice struck down the Data Retention Directive on the grounds that such bulk surveillance, without warrant or suspicion, was a grossly disproportionate infringement on privacy, even in the fight against terrorism. A client-side scanning mandate would invite the same fate.

But ‘if you build it, they will come’. If device vendors are compelled to install remote surveillance, the demands will start to roll in. Who could possibly be so cold-hearted as to argue against the system being extended to search for missing children? Then President Xi will want to know who has photos of the Dalai Lama, or of men standing in front of tanks; and copyright lawyers will get court orders blocking whatever they claim infringes their clients’ rights. Our phones, which have grown into extensions of our intimate private space, will be ours no more; they will be private no more; and we will all be less secure.

EPSRC and InnovateUK launch £8M Digital Security by Design – CHERI/Morello Software Ecosystem funding call

For a bit over a decade, SRI International and the University of Cambridge have been working to develop CHERI (Capability Hardware Enhanced RISC Instructions), a set of processor-architecture security extensions targeting vulnerability mitigation through memory safety and software compartmentalisation. In 2019, the UK’s Industrial Strategy Challenge Fund announced the £187M Digital Security by Design (DSbD) programme, which is supporting the creation of Arm’s experimental CHERI-based Morello processor, System-on-Chip (SoC), and board shipping in early 2022, as well as dozens of industrial and academic projects to explore and develop CHERI-based software security. This week, UKRI will be launching an £8M funding call via EPSRC and InnovateUK to support UK-based academic and industrial CHERI/Morello software ecosystem development work. They are particularly interested in supporting work in the areas of OS and developer toolchain, libraries and packages, language runtimes, frameworks and middleware, and platform services on open-source operating systems, all of which are key areas to expand the breadth and maturity of CHERI-enabled software. There is a virtual briefing event taking place on 5 October 2021, with proposals due on 8 December 2021.

CHERI Software Release for Summer 2021

The CHERI project at SRI International and the University of Cambridge is pleased to announce our second CHERI reference software-stack release. The release supports the CHERI-RISC-V and Arm Morello architectures. A complete set of development tools, including compiler, OS, debugger, and emulator, is included in the release.

For this second release, the software has been packaged to be used in an easily accessible Docker image, hosted on Docker Hub. The focus of the release has been on CheriBSD, with the merging of pure capability kernels and Morello support to the development mainline. There are also many continued improvements to tooling, including updates to cheribuild (our unified build system), the CHERI Clang/LLVM compiler suite, the CHERI-extended GNU Debugger (gdb), and the QEMU full-system emulator (which now supports both the CHERI-RISC-V and Morello architectures).

The CHERI protection model provides architectural primitives to protect computer systems from widely-exploited security vulnerabilities. CHERI revises the hardware/software architectural interface with hardware support for capabilities that can be used for fine-grained memory protection and scalable software compartmentalization. Supported by DARPA (the US Defense Advanced Research Projects Agency) as well as UKRI (UK Research and Innovation) and its Digital Security by Design (DSbD) program, CHERI is the work of a large research team at the University of Cambridge, SRI International, Arm and many industrial and academic collaborators throughout the world.

The release, along with full release notes, is now available online at https://cheri-dist.cl.cam.ac.uk/

Is Apple’s NeuralMatch searching for abuse, or for people?

Apple stunned the tech industry on Thursday by announcing that the next version of iOS and macOS will contain a neural network to scan photos for sex abuse. Each photo will get an encrypted ‘safety voucher’ saying whether or not it’s suspect, and if more than about ten suspect photos are backed up to iCloud, then a clever cryptographic scheme will unlock the keys used to encrypt them. Apple staff or contractors can then look at the suspect photos and report them.

We’re told that the neural network was trained on 200,000 images of child sex abuse provided by the US National Center for Missing and Exploited Children. Neural networks are good at spotting images “similar” to those in their training set, and people unfamiliar with machine learning may assume that Apple’s network will recognise criminal acts. The police might even be happy if it recognises a sofa on which a number of acts took place. (You might be less happy, if you own a similar sofa.) Then again, it might learn to recognise naked children, and flag up a snap of your three-year-old child on the beach. So what the new software in your iPhone actually recognises is really important.

Now the neural network described in Apple’s documentation appears very similar to the networks used in face recognition (hat tip to Nicko van Someren for spotting this). So it seems a fair bet that the new software will recognise people whose faces appear in the abuse dataset on which it was trained.

So what will happen when someone’s iPhone flags ten pictures as suspect, and the Apple contractor who looks at them sees an adult with their clothes on? There’s a real chance that they’re either a criminal or a witness, so they’ll have to be reported to the police. In the case of a survivor who was victimised ten or twenty years ago, and whose pictures still circulate in the underground, this could mean traumatic secondary victimisation. It might even be their twin sibling, or a genuine false positive in the form of someone who just looks very much like them. What processes will Apple use to manage this? Not all US police forces are known for their sensitivity, particularly towards minority suspects.

But that’s just the beginning. Apple’s algorithm, NeuralMatch, stores a fingerprint of each image in its training set as a short string called a NeuralHash, so new pictures can easily be added to the list. Once the tech is built into your iPhone, your MacBook and your Apple Watch, and can scan billions of photos a day, there will be pressure to use it for other purposes. The other part of NCMEC’s mission is missing children. Can Apple resist demands to help find runaways? Could Tim Cook possibly be so cold-hearted as to refuse to add Madeleine McCann to the watch list?

After that, your guess is as good as mine. Depending on where you are, you might find your photos scanned for dissidents, religious leaders or the FBI’s most wanted. It also reminds me of the Rasterfahndung in 1970s Germany – the dragnet search of all digital data in the country for clues to the Baader-Meinhof gang. Only now it can be done at scale, and not just for the most serious crimes either.

Finally, there’s adversarial machine learning. Neural networks are fairly easy to fool in that an adversary can tweak images so they’re misclassified. Expect to see pictures of cats (and of Tim Cook) that get flagged as abuse, and gangs finding ways to get real abuse past the system. Apple’s new tech may end up being a distributed person-search machine, rather than a sex-abuse prevention machine.

Such a technology requires public scrutiny, and as the possession of child sex abuse images is a strict-liability offence, academics cannot work with them. While the crooks will dig out NeuralMatch from their devices and play with it, we cannot. It is possible in theory for Apple to get NeuralMatch to ignore faces; for example, it could blur all the faces in the training data, as Google does for photos in Street View. But they haven’t claimed they did that, and if they did, how could we check? Apple should therefore publish full details of NeuralMatch plus a set of NeuralHash values trained on a public dataset with which we can legally work. It also needs to explain how the system it deploys was tuned and tested; and how dragnet searches of people’s photo libraries will be restricted to those conducted by court order so that they are proportionate, necessary and in accordance with the law. If that cannot be done, the technology must be abandoned.

WEIS 2021 – Liveblog

I’ll be trying to liveblog the twentieth Workshop on the Economics of Information Security (WEIS), which is being held online today and tomorrow (June 28/29). The event was introduced by the co-chairs Dann Arce and Tyler Moore. 38 papers were submitted, and 15 accepted. My summaries of the sessions of accepted papers will appear as followups to this post; there will also be a panel session on the 29th, followed by a rump session for late-breaking results. (Added later: videos of the sessions are linked from the start of the followups that describe them.)

Cybercrime gangs as tech startups

In our latest paper, we propose a better way of analysing cybercrime.

Crime has been moving online, like everything else, for the past 25 years, and for the past decade or so it’s accounted for more than half of all property crimes in developed countries. Criminologists have tried to apply their traditional tools and methods to measure and understand it, yet even when these research teams include technologists, it always seems that there’s something missing. The people who phish your bank credentials are just not the same people who used to burgle your house. They have different backgrounds, different skills and different organisation.

We believe a missing factor is entrepreneurship. Cyber-crooks are running tech startups, and face the same problems as other tech entrepreneurs. There are preconditions that create the opportunity. There are barriers to entry to be overcome. There are pathways to scaling up, and bottlenecks that inhibit scaling. There are competitive factors, whether competing crooks or motivated defenders. And finally there may be saturation mechanisms that inhibit growth.

One difference with regular entrepreneurship is the lack of finance: a malware gang can’t raise VC to develop a cool new idea, or cash out by means of an IPO. They have to use their profits not just to pay themselves, but also to invest in new products and services. In effect, cybercrooks are trying to run a tech startup with the financial infrastructure of an ice-cream stall.

We have developed this framework from years of experience dealing with many types of cybercrime, and it appears to prove a useful way of analysing new scams, so we can spot those developments which, like ransomware, are capable of growing into a real problem.

Our paper Silicon Den: Cybercrime is Entrepreneurship will appear at WEIS on Monday.

Security engineering and machine learning

Last week I gave my first lecture in Edinburgh since becoming a professor there in February. It was also the first talk I’ve given in person to a live audience since February 2020.

My topic was the interaction between security engineering and machine learning. Many of the things that go wrong with machine-learning systems were already familiar in principle, as we’ve been using Bayesian techniques in spam filters and fraud engines for almost twenty years. Indeed, I warned about the risks of not being able to explain and justify the decisions of neural networks in the second edition of my book, back in 2008.

However, the deep neural network (DNN) revolution since 2012 has drawn in hundreds of thousands of engineers, most of them without this background. Many fielded systems are extremely easy to break, often using tricks that have been around for years. What’s more, new attacks specific to DNNs – adversarial samples – have been found to exist for pretty well all models. They’re easy to find, and often transferable from one model to another.

I describe a number of new attacks and defences that we’ve discovered in the past three years, including the Taboo Trap, sponge attacks, data ordering attacks and markpainting. I argue that we will usually have to think of defences at the system level, rather than at the level of individual components; and that situational awareness is likely to play an important role.

Here now is the video of my talk.