The Dynamics of Industry-wide Disclosure

Last year, we disclosed two related vulnerabilities that broke a wide range of systems. In our Bad Characters paper, we showed how to use Unicode tricks – such as homoglyphs and bidi characters – to mislead NLP systems. Our Trojan Source paper showed how similar tricks could be used to make source code look one way to a human reviewer, and another way to a compiler, opening up a wide range of supply-chain attacks on critical software. Prior to publication, we disclosed our findings to four suppliers of large NLP systems, and nineteen suppliers of software development tools. So how did industry respond?
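To make the trick concrete, here is a minimal Python sketch – ours, for illustration, not the papers’ actual payloads – of the two ingredients: an invisible bidi control character, and a homoglyph. In many editors and terminals the strings in each pair render identically, yet they compare unequal:

```python
# Minimal illustrative sketch of the Bad Characters / Trojan Source
# ingredients (not the papers' actual payloads).

# 1. Invisible bidi control: U+202E (RIGHT-TO-LEFT OVERRIDE) is rendered
#    invisibly by many editors, so these two strings can look identical.
visible = "admin"
trojaned = "admin\u202e"
print(visible == trojaned)           # False
print(len(visible), len(trojaned))   # 5 6

# 2. Homoglyph: Cyrillic 'а' (U+0430) looks just like Latin 'a' (U+0061).
print("p\u0430ypal" == "paypal")     # False
```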

We were invited to give the keynote talk this year at LangSec, and the video is now available. In it we describe not just the Bad Characters and Trojan Source vulnerabilities, but the large natural experiment created by their disclosure. The Trojan Source vulnerability affected most compilers, interpreters, code editors and code repositories; this enabled us to compare the responses of firms versus nonprofits, and of firms that managed their own response versus those that outsourced it. The interaction between bug bounty programs, government disclosure assistance, peer review and press coverage was interesting. Most of the affected development teams took action, though some required a bit of prodding.

The response by the NLP maintainers was much less enthusiastic. By the time we gave this talk, only Google had done anything – though we now hear that Microsoft is also working on a fix. The reasons for this responsibility gap need to be understood better. They may include differences in culture between C coders and data scientists; the greater costs and delays in the build-test-deploy cycle for large ML models; and the relative lack of press interest in attacks on ML systems. If many of our critical systems start to include ML components that are less maintainable, will the ML end up being the weakest link?


Formal CHERI: rigorous engineering and design-time proof of full-scale architecture security properties

Memory safety bugs continue to be a major source of security vulnerabilities, with their root causes ingrained in the industry:

  • the C and C++ systems programming languages, which do not enforce memory protection, and the huge legacy codebase written in them that we depend on;
  • the legacy design choices of hardware that provides only coarse-grained protection mechanisms, based on virtual memory; and
  • test-and-debug development methods, in which only a tiny fraction of all possible execution paths can be checked, leaving ample unexplored corners for exploitable bugs.

Over the last twelve years, the CHERI project has been addressing the first two of these problems by extending conventional hardware Instruction-Set Architectures (ISAs) with new architectural features that enable fine-grained memory protection and highly scalable software compartmentalisation. These were prototyped first as the CHERI-MIPS and CHERI-RISC-V architecture designs and FPGA implementations, with an extensive software stack ported to run above them.
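As a rough intuition for what fine-grained memory protection buys you, here is a toy Python model of a CHERI-style capability – our own sketch, emphatically not the real architecture: every pointer carries bounds and permissions, every access is checked, and an out-of-bounds access traps instead of silently corrupting a neighbour.

```python
# Toy model of a CHERI-style capability (illustrative sketch only).
class Capability:
    def __init__(self, memory, base, length, perms=frozenset({"r", "w"})):
        self.memory, self.base, self.length, self.perms = memory, base, length, perms

    def _check(self, offset, perm):
        if perm not in self.perms:
            raise PermissionError(f"capability lacks '{perm}' permission")
        if not 0 <= offset < self.length:
            raise IndexError("bounds violation -- hardware would trap here")

    def load(self, offset):
        self._check(offset, "r")
        return self.memory[self.base + offset]

    def store(self, offset, value):
        self._check(offset, "w")
        self.memory[self.base + offset] = value

memory = bytearray(64)
buf = Capability(memory, base=16, length=8)  # capability for an 8-byte buffer
buf.store(0, 0x41)                           # in bounds: fine
try:
    buf.store(8, 0x41)                       # one byte past the end
except IndexError as e:
    print("trapped:", e)
```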

The academic experimental results are very promising, but achieving widespread adoption of CHERI needs an industry-scale evaluation of a high-performance silicon processor implementation and software stack. To that end, Arm have developed Morello, a CHERI-enabled prototype architecture (extending Armv8.2-A), processor (adapting the high-performance Neoverse N1 design), system-on-chip (SoC), and development board, within the UKRI Digital Security by Design (DSbD) Programme (see our earlier blog post on Morello). Morello is now being evaluated in a range of academic and industry projects.

[Photos: the Morello desktop, and the Morello chip on its development board]

However, how do we ensure that such a new architecture actually provides the security guarantees it aims to provide? This is crucial: any security flaw in the architecture will be present in any conforming hardware implementation, quite likely impossible to fix or work around after deployment.

In this blog post, we describe how we used rigorous engineering methods to provide high assurance of key security properties of CHERI architectures, with machine-checked mathematical proof, as well as to complement and support traditional design and development workflows, e.g. by automatically generating test suites. This addresses the third problem, showing that, by judicious use of rigorous semantics at design time, we can do much better than test-and-debug development.

Text mining is harder than you think

Following last year’s row about Apple’s proposal to scan all the photos on your iPhone camera roll, EU Commissioner Johansson proposed a child sex abuse regulation that would compel providers of end-to-end encrypted messaging services to scan all messages in the client, and not just for historical abuse images but for new abuse images and for text messages containing evidence of grooming.

Now that journalists are distracted by the imminent downfall of our great leader, the Home Office seems to think this is a good time to propose some amendments to the Online Safety Bill that will have a similar effect. And while the EU planned to win the argument against the pedophiles first and then expand the scope to terrorist radicalisation and recruitment too, Priti Patel goes for the terrorists from day one. There’s some press coverage in the Guardian and the BBC.

We explained last year why client-side scanning is a bad idea. However, the shift of focus from historical abuse images to text scanning makes the government story even less plausible.

Detecting online wickedness from text messages alone is hard. Since 2016, we have collected over 99m messages from cybercrime forums and over 49m from extremist forums, and these corpora are used by 179 licensees in 55 groups from 42 universities in 18 countries worldwide. Detecting hate speech is a good proxy for detecting terrorist radicalisation. In 2018, we thought we could detect hate speech with a precision of typically 92%, which would mean a false-alarm rate of 8%. But the more complex models of 2022, based on Google’s BERT, when tested on the better collections we have now, don’t do significantly better; indeed, now that we understand the problem in more detail, they often do worse. Do read that paper if you want to understand why hate-speech detection is an interesting scientific problem. With some specific kinds of hate speech it’s even harder; an example is anti-semitism, thanks to the large number of synonyms for Jewish people. So if we were to scan 10bn messages a day in Europe, there would be maybe a billion false alarms a day for Europol to look at.
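The closing arithmetic, spelled out as a back-of-the-envelope sketch (the assumption here is that the 8% figure is read as the share of scanned messages that get wrongly flagged):

```python
# Back-of-the-envelope sketch of the claim above (assumption: 8% is
# treated as the share of scanned messages that are wrongly flagged).
messages_per_day = 10_000_000_000      # 10bn messages scanned daily in Europe
false_alarm_rate = 0.08                # complement of the ~92% precision figure

false_alarms_per_day = messages_per_day * false_alarm_rate
print(f"{false_alarms_per_day:,.0f} false alarms per day")  # 800,000,000
```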

We’ve been scanning the Internet for wickedness for over fifteen years now, and looking at various kinds of filters for everything from spam to malware. Filtering requires very low false-positive rates to be feasible at Internet scale, which means either looking for very specific things (such as indicators of compromise by a specific piece of malware) or having rich metadata (such as a big spam run from some IP address space you know to be compromised). Whatever filtering Facebook can do on Messenger given its rich social context, there will be much less that a WhatsApp client can do by scanning each text on its way through.
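The reason is the base-rate fallacy: when the bad stuff is rare, even a seemingly accurate classifier drowns you in false alarms. A quick sketch, with prevalence and classifier figures assumed purely for illustration:

```python
# Why Internet-scale filtering needs near-zero false-positive rates
# (prevalence and classifier figures are assumptions, for illustration).
prevalence = 1e-6   # suppose 1 message in a million is actually bad
fpr = 1e-3          # a seemingly good classifier: 0.1% false positives
tpr = 0.95          # ...which catches 95% of the bad messages

# Probability that a flagged message is actually bad (Bayes' rule):
precision = (tpr * prevalence) / (tpr * prevalence + fpr * (1 - prevalence))
print(f"precision = {precision:.4f}")  # ~0.0009: >99.9% of flags are false
```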

So if you really wish to believe that either the EU’s CSA Regulation or the UK’s Online Safety Bill is an honest attempt to protect kids or catch terrorists, good luck.


Reporting cybercrime is hard: NCA link to Action Fraud broken for 3 years

[Screenshot: the archived version of the Action Fraud website, as linked from the NCA’s contact-us page]

Yesterday I was asked for advice on anonymously reporting a new crypto scam that a potential victim had spotted before they lost money (hint: to a first approximation, all cryptocurrencies and cryptoassets are a scam). In the end they got fed up with the difficulty of finding someone to tell, and gave up. However, to give the advice, I thought I would check what the National Crime Agency’s National Cyber Crime Unit suggested, so I searched for “NCA NCCU report scam”; the first result was the NCA’s Contact us page. Sounds good. It has a “Fraud” section which (as expected) talks about Action Fraud. However, since 2019 this page has linked to the National Archives’ copy of an old version of the Action Fraud website. So for three years, anyone who followed the advice on the NCA’s website on how to report fraud would have been left very confused, until they worked out that they were on a (clearly labelled) archive rather than the live website – which is why none of the forms work.

I reported this problem yesterday, and I do not expect it to have been fixed by the time of writing; but a problem like this going unresolved for three years is a clear example of the difficulties faced by victims of cybercrime.

2019 is also the year that Police Scotland declined to pay for Action Fraud, as they did not consider it to provide value for money, choosing instead to handle fraud reporting internally.

I am PI of a PhD project on Improving Cybercrime Reporting, jointly supervised between the University of Strathclyde and the University of Edinburgh and funded by the Scottish Institute for Policing Research and the University of Strathclyde. Do get in touch if you have other stories of the difficulties of reporting cybercrime. The student, Juraj Sikra, has published a systematic literature review on Improving Cybercrime Reporting in Scotland. It is clear that there is a long way to go to provide person-centred cybercrime reporting for victims and potential victims. However, UK law enforcement in general, and Police Scotland in particular, know there is a problem and do want to fix it.

European Commission prefers breaking privacy to protecting kids

Today, May 11, EU Commissioner Ylva Johansson announced a new law to combat online child sex abuse. This has an overt purpose, and a covert purpose.

The overt purpose is to pressure tech companies to take down illegal material, and material that might possibly be illegal, more quickly. A new agency is to be set up in the Hague, modeled on and linked to Europol, to maintain an official database of illegal child sex-abuse images. National authorities will report abuse to this new agency, which will then require hosting providers and others to take suspect material down. The new law goes into great detail about the design of the takedown process, the forms to be used, and the redress that content providers will have if innocuous material is taken down by mistake. There are similar provisions for blocking URLs; censorship orders can be issued to ISPs in Member States.

The first problem is that this approach does not work. In our 2016 paper, Taking Down Websites to Prevent Crime, we analysed the takedown industry and found that private firms are much better at taking down websites than the police. We found that the specialist contractors who take down phishing websites for banks would typically take six hours to remove an offending website, while the Internet Watch Foundation – which has a legal monopoly on taking down child-abuse material in the UK – would often take six weeks.

We have a reasonably good understanding of why this is the case. Taking down websites means interacting with a great variety of registrars and hosting companies worldwide, and they have different ways of working. One firm expects an encrypted email; another wants you to open a ticket; yet another needs you to phone their call centre during Peking business hours and speak Mandarin. The specialist contractors have figured all this out, and have got good at it. However, police forces want to use their own forms, and expect everyone to follow police procedure. Once you’re outside your jurisdiction, this doesn’t work. Police forces also focus on process more than outcome; they have difficulty hiring and retaining staff to do detailed technical clerical work; and they’re not much good at dealing with foreigners.

Our takedown work was funded by the Home Office, and we recommended that they run a randomised controlled trial where they order a subset of UK police forces to use specialist contractors to take down criminal websites. We’re still waiting, six years later. And there’s nothing in UK law that would stop them running such a trial, or that would stop a Chief Constable outsourcing the work.

So it’s really stupid for the European Commission to mandate centralised takedown by a police agency for the whole of Europe. This will make everything really hard to fix once they find out that it doesn’t work, and it becomes obvious that child-abuse websites stay up longer, causing real harm.

Oh, and the covert purpose? That is to enable the new agency to undermine end-to-end encryption by mandating client-side scanning. This is not evident on the face of the bill but is evident in the impact assessment, which praises Apple’s 2021 proposal. Colleagues and I already wrote about that in detail, so I will not repeat the arguments here. I will merely note that Europol coordinates the exploitation of communications systems by law enforcement agencies, and the Dutch National High-Tech Crime Unit has developed world-class skills at exploiting mobile phones and chat services. The most recent case of continent-wide bulk interception was EncroChat; although reporting restrictions prevent me telling the story of that, there have been multiple similar cases in recent years.

So there we have it: an attack on cryptography, designed to circumvent EU laws against bulk surveillance by using a populist appeal to child protection, appears likely to harm children instead.

Hiring for iCrime

A Research Assistant/Associate position is available at the Department of Computer Science and Technology to work on the ERC-funded Interdisciplinary Cybercrime Project (iCrime). We are looking to appoint a computer scientist to join an interdisciplinary team reporting to Dr Alice Hutchings.

iCrime incorporates expertise from criminology and computer science to research cybercrime offenders, their crime types, the places where they operate (such as online black markets), and the responses to them. Within iCrime, we sustain robust data-collection infrastructure to gather unique, high-quality datasets, and design novel methodologies to identify and measure criminal infrastructure at scale. This is particularly important as cybercrime changes dynamically. Overall, our approach is evaluative, critical, and data-driven.

Successful applicants will work in a team to collect and analyse data, develop tools, and write research outputs. Desirable technical skills include:

– Familiarity with automated data collection (web crawling and scraping), and with techniques to sustain complex data collection in adversarial environments at scale.
– Excellent software engineering skills, including familiarity with Python, Bash scripting, and web development, particularly NodeJS and ReactJS.
– Experience in DevOps to integrate and migrate new tools within the existing ecosystem, and to automate data collection/transmission/backup pipelines.
– Working knowledge of Linux/Unix.
– Familiarity with large-scale databases, including relational databases and ElasticSearch.
– Practical knowledge of security and privacy to keep existing systems secure and protect against data leakage.
– Expertise in cybercrime research and data science/analysis is desirable, but not essential.

Please read the formal advertisement (at https://www.jobs.cam.ac.uk/job/34324/) for the details about exactly who and what we’re looking for and how to apply — and please pay special attention to our request for a covering letter!

A striking memoir by Gus Simmons

Gus Simmons is one of the pioneers of cryptography and computer security. His contributions to public-key cryptography, unconditional authentication, covert channels and information hiding earned him an honorary degree, fellowship of the IACR, and election to the Rothschild chair of mathematics when he visited us in Cambridge in 1996. And this was his hobby; his day job was as a mathematician at Sandia National Laboratories, where he worked on satellite imagery, arms-control treaty verification, and the command and control of nuclear weapons.

During lockdown, Gus wrote a book of stories about growing up in West Virginia during the depression years of the 1930s. After he circulated it privately to a few friends in the cryptographic community, we persuaded him to put it online so everyone can read it. In those desolate years, coal mines closed and fired their workers, who took over abandoned farms and survived as best they could. Gus’s memoir is a gripping oral history of a period when some parts of the U.S.A. were just as poor as rural Africa today.

Here it is: Another Time, Another Place, Another Story.

Security course at Cambridge

I have taken over the second-year Security course at Cambridge, which is traditionally taught in Easter term. From the end of April onwards I will be teaching three lectures per week. Taking advantage of the fact that Cambridge academics own the copyright and performance rights on their lectures, I am making all my undergraduate lectures available at no charge on my YouTube channel, frankstajanoexplains.com. My lecture courses on Algorithms and on Discrete Mathematics are already up, and I’ll be uploading videos of the Security lectures as I produce them, ahead of the official lecturing dates; I uploaded the opening lecture this morning. You are welcome to join the class virtually, and you will receive exactly the same tuition as my Cambridge students.


The philosophy of the course is to lead students to learn the fundamentals of security by “studying the classics” and gaining practical hands-on security experience by recreating and replicating actual attacks. (Of course the full benefits of the course are only reaped by those who do the exercises, as opposed to just watching the videos.)


This is my small contribution to raising a new generation of cyber-defenders, alongside the parallel thread of letting bright young minds realise that security is challenging and exciting by organising CTFs (Capture-The-Flag competitions) for them to take part in, which I have been doing since 2015 and continue to do. On that note, any students (undergraduate, master’s or PhD) currently studying at a university in the UK, Israel, the USA, Japan, Australia or France still have a couple more days to sign up for our 2022 Country to Country CTF, a follow-up to the Cambridge to Cambridge CTF that I co-founded with Howie Shrobe and Lori Glover at MIT in 2015. The teams will mix people at different levels, so no prior experience is required. Go for it!