Category Archives: Academic papers

Cambridge Cybercrime Conference 2025 – Liveblog

The Cambridge Cybercrime Centre’s eighth one-day conference on cybercrime was held on Monday, 23rd June 2025, marking 10 years of the Centre.

Similar to previous “liveblog” coverage of conferences and workshops on Light Blue Touchpaper, here is a “liveblog”-style overview of the talks at this year’s conference.

Sunoo Park — Legal Risks of Security Research

Sunoo discussed researchers facing restrictive TOS clauses and the risk of adversarial scrutiny, noting that security research can be difficult to distinguish from malicious hacking, and that researchers need to understand the risks. Sunoo highlighted particular US laws that create risk for researchers, sharing a guide they wrote to highlight these risks. The project grew from colleagues, as well as clients, receiving legal threats; the aim is to enable informed decisions on when to seek advice, and also to nudge public discussion on law reform.

The CFAA was passed a long time ago, around the time of the WarGames film, and computer crime has changed a lot since then. The law defines “computer” to cover pretty much any computer, and criminalises access that is unauthorized or exceeds authorized access. One early case was United States v. McDanel, concerning a researcher who found a bug and reported it to the affected customers; the prosecution’s theory was that informing customers of the flaw caused damage, due to the cost of fixing it, but the government later requested the conviction be overturned. More recently, a case of a police database being accessed in exchange for a bribe was also brought under the CFAA.

Another law is the DMCA, which states that “no person shall circumvent a technological measure that effectively controls access to a work”, and this may apply to captchas, anti-bot measures, etc.

Sunoo is starting a new study looking at researchers’ lived experiences of legal risk under US/UK law. It can be hard for researchers to talk openly about these experiences, which leaves little evidence to counter such laws, and much of the available information is anecdotal. Sunoo would like to hear from US/UK researchers about their experiences with these laws.

Alice Hutchings — Ten years of the Cambridge Cybercrime Centre

The Centre was established in 2015 to collect and share cybercrime data internationally. They collect lots of data at scale: forums, chat channels, extremist platforms, DDoS attacks, modded apps, defacements, spam, and more. They share datasets with academics, not for commercial purposes, through agreements that set out ethical and legal constraints. The aim was to help researchers collect data at scale and overcome the challenges of working with large datasets. They don’t just collect data; they do their own research too, on crime types, offenders, places, and responses.

Session 1: Trust, Identity, and Communication in Cybercriminal Ecosystems

Roy Ricaldi — From trust to trade: Uncovering the trust-building mechanisms supporting cybercrime markets on Telegram

Roy is researching trust and cybercrime, and how trust is built on Telegram. Cybercrime markets rely on trust to function, and there is existing literature on this topic for forums. Forums have structured systems, such as reputation and escrow, whereas Telegram is more ephemeral but still used for trading. Roy asks how trust is established in this volatile, high-risk environment; economic theory states that without trust, markets can fail.

Roy starts by exploring the market segments found, looking at trust signals and how frequently users are exposed to these trust systems. Roy notes chat channels can have a significant history, and while trust signals exist, users may not find older trust signals easily. They built a snowballing and classification pipeline to collect over 1 million messages from 167 Telegram communities, and later developed a framework for measuring and simulating trust signals. Findings showed that market segments, and their trust signals, were highly thematic within communities. They used DeepSeek-V3 for classification, which detected trust signals and market segments with the highest accuracy. They found an uneven distribution of trust signals across market segments; for example, piracy content is free, so trust signals were low.

They find messages asking for the use of escrow, or asking others to “vouch” for sellers. Some of these communities have moderators who set rules around the types of messages allowed. After looking at the distribution, they ran a simulation to see how many signals users were exposed to, setting up profiles of market segments, communities visited, and messages read. They found 70% of users see 5 or fewer trust signals in their simulation, and all users see at least 1. Over time, these signals evolve, with digital infrastructure forming a larger peak. They note the importance of understanding how trust works on Telegram, to help find the markets that matter and can cause harm.
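To give a feel for what such an exposure simulation might look like, here is a minimal sketch; it is not the authors’ actual pipeline, and the per-segment signal rates and reading volumes are assumed values for illustration only.

```python
# Minimal sketch of a trust-signal exposure simulation (illustrative only; not
# the authors' pipeline). Assume each market segment has some fraction of
# messages carrying a trust signal (escrow requests, "vouch" posts, etc.) and
# count how many such signals a simulated user encounters.
import random

# Hypothetical per-segment rates of trust-signal messages (assumed values).
SIGNAL_RATE = {"carding": 0.08, "accounts": 0.05, "piracy": 0.005}

def simulate_user(segments, messages_read, rng):
    """Return how many trust signals one simulated user is exposed to."""
    exposed = 0
    for _ in range(messages_read):
        segment = rng.choice(segments)            # segment the message belongs to
        if rng.random() < SIGNAL_RATE[segment]:   # does it carry a trust signal?
            exposed += 1
    return exposed

rng = random.Random(0)
exposures = [simulate_user(["carding", "piracy"], messages_read=100, rng=rng)
             for _ in range(10_000)]
print("share of users seeing <= 5 signals:",
      sum(e <= 5 for e in exposures) / len(exposures))
```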

John McAlaney — Power, identity and group dynamics in hacking forums

John discussed work in progress on power structures and group dynamics in the CrimeBB dataset. He attended Defcon as a social psychologist, observing the interaction dynamics and how people see themselves within such a large conference.

Previous work on identity asked whether hacking forum members considered themselves to be a “hacker”, which led to discussions around the term and labelling. Other previous work looked at themes of what was spoken about in forums, such as legality, honesty, skill acquisition, knowledge, and risk. Through interviews, they found people had contradictory ideas around trust. They note existing hierarchies of power within forums, and evidence of social psychological phenomena.

Within the existing research literature, John found a gap where theories had not necessarily been explored in the online forum setting. They ask whether groups form on hacking forums in the same way as on other online forums, how the structure of these groups differs, and whether the group dynamics are different.

He was initially working with a deductive approach to thematic analysis. “Themes do not emerge from thematic analysis”; rather, they are exploring what is currently discussed. He is not looking to generalise from the thematic analysis, but is looking into BERT next to see whether any themes are being missed in the dataset.

He suggests the main impact will be to contribute back to the sociological literature, and also to try to improve threat detection.

Haitao Shi — Evaluating the impact of anonymity on emotional expression in drug-related discussions: a comparative study of the dark web and mainstream social media

Haitao looked at self-disclosure, emotional disclosure, and environmental influence on cybercrime forums. They ask how the different models of anonymity across chat channels and forums vary, and which different communication styles emerge. They identified drug-related channels and discussions for their analysis, and took steps to clean and check dataset quality. The project used BERTopic to embed messages for clustering, then plotted these to visually identify similar topics. To explore the topics further, Haitao used an emotion classifier to detect intent, finding high levels of disgust, anger, and anticipation in the dataset.
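As a rough illustration of this kind of pipeline, the sketch below clusters messages with BERTopic and then applies an off-the-shelf emotion classifier. The specific model name and parameters are assumptions for the sketch, not necessarily those used in the talk, and a real run needs a corpus of at least a few hundred messages.

```python
# Rough sketch: embed and cluster messages with BERTopic, then label each
# message with an emotion classifier. Model names here are illustrative
# assumptions, not the ones from the talk.
from bertopic import BERTopic
from transformers import pipeline

messages = [
    "anyone know a reliable vendor on here?",
    "stay away from that guy, total scammer",
    # ... cleaned drug-related messages (hundreds+ needed for clustering)
]

# Topic discovery: BERTopic embeds the messages and clusters them into topics.
topic_model = BERTopic(min_topic_size=10)
topics, _ = topic_model.fit_transform(messages)

# Emotion detection over the same messages (anger, disgust, anticipation, ...).
emotion = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base")
labels = [emotion(m)[0]["label"] for m in messages]

for m, t, lab in zip(messages, topics, labels):
    print(t, lab, m[:40])
```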

Session 2: Technical Threats and Exploitation Tactics

Taro Tsuchiya — Blockchain address poisoning

Taro introduces a scenario of sending rent, where the victim appears to make an error selecting a cryptocurrency address; the address turns out to have been a poisoned lookalike. Taro aims to identify address poisoning, see how prevalent it is, and measure the payoff. They identify attack attempts with an algorithm that matches transfers with similar addresses within a given time range.
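A minimal sketch of how such a matching step might look is below. The prefix/suffix lengths and time window are assumed illustrative thresholds, not those from the paper.

```python
# Illustrative sketch of flagging lookalike-address (poisoning) attempts:
# flag a transfer whose counterparty matches the prefix and suffix of an
# address the victim transacted with recently, but is not actually that
# address. Thresholds and window are assumptions, not the paper's values.
from collections import defaultdict

PREFIX, SUFFIX = 4, 4          # hex chars compared at each end (assumption)
WINDOW = 6 * 3600              # look-back window in seconds (assumption)

def looks_alike(a, b):
    a, b = a.lower(), b.lower()
    return a != b and a[:PREFIX] == b[:PREFIX] and a[-SUFFIX:] == b[-SUFFIX:]

def find_poisoning_attempts(transfers):
    """transfers: list of (timestamp, sender, recipient), sorted by timestamp."""
    recent = defaultdict(list)             # sender -> [(ts, counterparty), ...]
    attempts = []
    for ts, sender, recipient in transfers:
        # Does this transfer involve an address mimicking a recent counterparty?
        for prev_ts, counterparty in recent[sender]:
            if ts - prev_ts <= WINDOW and looks_alike(recipient, counterparty):
                attempts.append((ts, sender, counterparty, recipient))
        recent[sender].append((ts, recipient))
    return attempts
```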

They detect 270M attack transfers targeting 17M victims, estimating a loss of $84M USD. They found losses were much higher on Ethereum, and this lookalike attack is easily generalisable and scalable.

They bundled attacks into groups, considering two to belong to the same group if they are launched in the same transaction, use the same address to pay the transaction fees, or use the same lookalike address. Clustering found “copying bots”, which copy other transactions for front-running. The attack groups identified are large but heterogeneous, and the attack itself is profitable for large groups; larger groups tend to win over smaller groups. Finally, they model lookalike address generation, finding that one large group is using GPUs to generate these addresses.

They give suggestions for mitigating these attacks: adding latency to address generation, disallowing zero-value transfers, and increasing address lengths. They also want to alert users to the risk of this attack.

Marre Slikker — The human attack surface: understanding hacker techniques in exploiting human elements

Marre is looking at human factors in security, as these are commonly the weakest link. Marre asks: what do hackers on underground forums discuss regarding the exploitation of human factors in cybercrime? They look at CrimeBB data to analyse the topics discussed, identify the lexicon used, and give a literature review of how these factors are conceptualised.

They create a bridge between academic human-factors language (“demographics”) and hacker language (“target dumb boomers”), and use topic modelling to identify the distribution of words used in forum messages.

What were their results? A literature review found many inconsistencies in human factors research terminology. Following this, they asked cybersecurity experts about human factors and created a list of 328 keywords to help filter the dataset. Topic modelling was then applied; however, the results were quite superficial, with lots of noise and general chatter.
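A minimal sketch of this kind of keyword-filtering step is shown below, with a tiny made-up keyword list standing in for the 328-term expert list.

```python
# Minimal sketch of keyword filtering before topic modelling: keep only posts
# that mention one of the expert-derived human-factor keywords. The keyword
# list here is a small illustrative subset, not the real 328-term list.
import re

KEYWORDS = ["social engineering", "phishing", "pretext", "trust", "boomer"]
pattern = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)

def filter_posts(posts):
    """Return only the posts mentioning at least one human-factor keyword."""
    return [p for p in posts if pattern.search(p)]

posts = ["selling fresh dumps", "easy to phish boomers with a fake invoice"]
print(filter_posts(posts))   # -> keeps the second post only
```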

Kieron Ivy Turk — Technical Tactics Targeting Tech-Abuse

Ivy discussed a project on personal item tracking devices, which have been misused for stalking, domestic abuse, and theft. Companies have developed anti-stalking features to try to mitigate these issues. They ran a study with the Assassins Guild, providing students with trackers to test the efficacy of these features. Their study found nobody used the anti-stalking features, despite everyone in the study knowing there was a possibility they were being stalked. At the time of the study, the scanning apps tended to detect only a subset of tracker brands. Apple and Google have since created an RFC to try to standardise trackers and anti-stalking measures.

Ivy has also been working on IoT security to understand the associated risks. They present the HARMS model to help analyse IoT device security failings. Ivy ran a study to identify harms with IoT devices, asking participants to misuse them, and asking how attackers discover abusive features. They found participants used and explored the UI to find the features available to them. They suggest the idea of a “UI-bounded” adversary is limiting; rather, attackers are “functionality-enabled”.

Ivy asks how we can create technical improvements for IoT in the future.

Session 3: Disruption and Resilience in Illicit Online Activities

Anh V. Vu — Assessing the aftermath: the effects of a global takedown against DDoS-for-hire services

Anh has been following DDoS takedowns by law enforcement. DDoS-for-hire services provide a platform for taking control of botnets to flood servers with fake traffic; little technical skill is needed, and the services are cheap. These services publicly advertise statistics of the daily attacks they contribute to.

Law enforcement continues to take down DDoS infrastructure, focusing on domain takedowns. Visitor statistics collected following the takedowns recorded 20M visitors, and 34k messages were collected from DDoS support Telegram channels. They also have DDoS UDP amplification data, and collected self-reported DDoS attack data.

Domain takedowns showed that domains returned quickly: 52% returned after the first takedown, and all returned after the second. Domain takedown now appears to have limited effect. Visitor statistics showed large booters operate a franchise business, offering API access to resellers.

Following the first takedown, activity and chat channel messages declined, but this had less impact in the second wave. Operators gave away free extensions to plans, and a few seemed to leave the market.

Their main takeaway is that the overall intervention impact is short-lived, and suppressing the supply side alone is not enough, as demand persists in the long run. He asks what can be done better for future interventions.

Dalya Manatova — Modeling organizational resilience: a network-based simulation for analyzing recovery and disruption of ransomware operations

Dalya studies the organisational dynamics and resilience of cybercrime, tracking the evolution and rebranding of ransomware operators. To carry out ransomware attacks, operators need infrastructure: selecting targets, executing attacks, negotiating ransoms, processing payments, supporting victims, and creating leak websites. They break this down further into a complex model showing the steps of ransomware attacks, and use it to model the task durations involved, estimating how long it takes to complete a ransomware attack while learning. They then introduce infrastructure disruption and observe how the process changes. They also model the disruption of members: what happens if tasks are reassigned to others, or a new person is hired?
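To give a flavour of this kind of process model, here is a toy sketch that draws task durations and compares a baseline operation against one with a disrupted task. All task names, durations, and the disruption penalty are assumptions for illustration, not the authors’ model.

```python
# Toy sketch: a ransomware operation as a sequence of tasks with uncertain
# durations, plus a simple "disruption" that slows one task (e.g. seized
# infrastructure or a replaced member). All values are assumed.
import random

TASKS = {                           # task -> (min_days, max_days), assumed
    "target selection":     (1, 5),
    "initial access":       (2, 10),
    "execution":            (1, 3),
    "negotiation":          (3, 14),
    "payment & laundering": (2, 7),
}

def operation_length(rng, disrupted_task=None, penalty=2.0):
    total = 0.0
    for task, (lo, hi) in TASKS.items():
        d = rng.uniform(lo, hi)
        if task == disrupted_task:   # disrupted task must be redone / handed over
            d *= penalty
        total += d
    return total

rng = random.Random(1)
baseline  = sum(operation_length(rng) for _ in range(10_000)) / 10_000
disrupted = sum(operation_length(rng, "initial access") for _ in range(10_000)) / 10_000
print(f"mean days, baseline: {baseline:.1f}, with disruption: {disrupted:.1f}")
```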

Marco Wähner — The prevalence and use of conspiracy theories in anonymity networks

Marco first asks: what is a conspiracy theory? These often appear alongside right-wing extremism, antisemitism, and misinformation. There are many challenges in researching conspiracy theories, as the language is often indirect and coded; however, this is not a new phenomenon.

What is the influence of environmental and structural factors on conspiracy theories in anonymised networks? Marco notes conspiracy theories can strengthen social ties and foster a sense of belonging, and may also be used alongside ideological or social incentives.

Marco asks how we can identify these theories circulating in anonymised networks, and whether they are used to promote illicit activities or drive sales; this could then be used to formulate intervention strategies. They took a data-driven approach, looking at CrimeBB and ExtremeBB data to find conspiracies, using dictionary keyword searches and topic modelling. Preliminary research found the prevalence of conspiracies was very low; it is a bit higher on ExtremeBB, but still rare.

They provide explanations for the low prevalence. Keywords are indirect and can be out of context when searching; also, conspiratorial communication is not always needed to sell products. They aim to strengthen the study design by coding a subsample to check for false positives and by using classical ML models. They find a dictionary approach may not be a good starting point, and conspiracies are not always used to sell products.
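As an illustration of the dictionary-based search and the false-positive checking it requires, here is a small sketch that pulls the context around each keyword hit so a coder can review it. The keyword list is a made-up example, not the study’s dictionary.

```python
# Sketch of dictionary keyword search with context extraction for manual
# coding: find conspiracy-related keywords and return a window of surrounding
# text so false positives can be checked. Keywords are illustrative only.
import re

KEYWORDS = ["new world order", "deep state", "chemtrails"]
pattern = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)

def keyword_hits(posts, context=60):
    """Yield (post_id, keyword, surrounding text) for each match."""
    for post_id, text in posts:
        for m in pattern.finditer(text):
            start, end = max(0, m.start() - context), m.end() + context
            yield post_id, m.group(0).lower(), text[start:end]

posts = [(1, "selling vpn accounts, no deep state can track you")]
for hit in keyword_hits(posts):
    print(hit)   # a coder would then judge whether this is a real conspiracy claim
```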

Human HARMS: Threat modelling social harms against technical systems

by Kieron Ivy Turk, Anna Talas, and Alice Hutchings

When talking about the importance of cybersecurity, we often imagine hackers breaking into high-security systems to steal data, money or launch large-scale attacks. However, technology can also be used for harm in everyday situations. Traditional cybersecurity models tend to focus on protecting systems from highly skilled external threats. While these models are effective in cybersecurity, they do not adequately address interpersonal threats that often do not require technical skills, such as those found in cases of domestic abuse.

The HARMS model (Harassment, Access and infiltration, Restrictions, Manipulation and Tampering, and Surveillance) is a new threat modelling framework. It is designed to identify non-technical and human factors harms that are often missed by popular frameworks such as STRIDE. We focused on how everyday technology, such as IoT devices, can be exploited to distress, control or intimidate others. 

The five elements of this model are Harassment, Access and infiltration, Restrictions, Manipulation and tampering, and Surveillance. Definitions and examples of these terms are provided in Table 1.

The threat model can be used to consider how a device or application can be used maliciously to identify ways it can be re-designed to make it more difficult to commit these harms. Imagine, for example, a smart speaker in a shared home. This could be used maliciously by an abusive individual to send distressing messages to be read aloud or set alarms to go off in the middle of the night. Equally, if the smart speaker is connected to calendars, scheduled events could be changed or removed, so users miss meetings and appointments. Furthermore, connected devices can be controlled remotely or automatically through routines, causing changes that the user does not understand and making them doubt their memory or even their sanity. They could also monitor conversations through built-in microphones or keep track of the commands others have used on the device through logs.

Importantly, any one type of harm is not constrained to one of these categories – in fact, many possible attacks will span multiple components of HARMS. For example, a common yet severe online harm is doxxing, wherein a malicious user obtains sensitive information about a user and shares it online. This encompasses many aspects of the HARMS model, as information may be obtained through surveillance but released with the intention of harassing other users. Any threat analysis utilising HARMS must therefore consider possible overlaps between elements to identify a broader set of attacks.

The human HARMS model approaches threat modelling from a unique angle compared to widespread methodologies such as STRIDE. There exist various overlaps between methods, which can be used to obtain a greater perspective of possible attack types. The Surveillance component of HARMS concerns privacy, as does Information disclosure in STRIDE. However, surveillance covers malicious observation and monitoring of people, whilst information disclosure focuses on data storage and leaks. Other risks can only be identified through one model, such as Harassment (HARMS) and Repudiation (STRIDE). We recommend using multiple threat modelling methodologies to encourage improved analysis of security, privacy, and possible misuse of novel systems.

As smart home technology, connected devices, and online platforms continue to evolve, we must think beyond just technical security. Our HARMS model highlights how technology, even when working as intended, can be used to control and harm individuals. By incorporating human-centered threat modelling into software development, alongside traditional threat modelling methods, we can build safer systems and help prevent them being used for abuse.

Paper: Turk, K. I., Talas, A., & Hutchings, A. (2025). Threat Me Right: A Human HARMS Threat Model for Technical Systems. arXiv preprint arXiv:2502.07116.

It is time to standardize principles and practices for software memory safety

In an article in the February 2025 issue of Communications of the ACM, I join 20 coauthors from across academia and industry in writing about the remarkable opportunity for universal strong memory safety in low-level Trusted Computing Bases (TCBs), enabled by recent advances in type- and memory-safe systems programming languages (e.g., the Rust language), hardware memory protection (e.g., our work on CHERI), formal methods, and software compartmentalisation. These technologies are seeing increasing early deployment in critical software TCBs, but they struggle to make headway at scale, given the real costs and potential disruption of adoption combined with unclear market demand, despite widespread recognition of the criticality of this issue. As a result, billions of lines of memory-unsafe C/C++ systems code continue to make up essential TCBs across the industry – including Windows, Linux, Android, iOS, Chromium, OpenJDK, FreeRTOS, vxWorks, and others. We argue that a set of economic factors, such as high opportunity costs, negative security impact as an externality, and two-sided incomplete information regarding memory safety, lead to limited and slow adoption despite the huge potential security benefit: it is widely believed that these techniques would have deterministically eliminated an estimated 70% of critical security vulnerabilities in these and other C/C++ TCBs over the last decade.

In our article, we describe how developing standards for memory-safe systems may be able to help enable remedies by making potential benefit more clear (and hence facilitating clear signalling of demand) as well as permitting interventions such as:

  • Improving actual industrial practice
  • Enabling acquisition requirements that incorporate memory-safety expectations
  • Enabling subsidies or tax incentives
  • Informing international discussions around software liability
  • Informing policy interventions for specific, critical classes of products/use cases

Owl, a new augmented password-authenticated key exchange protocol

In 2008, I wrote a blog to introduce J-PAKE, a password-authenticated key exchange (PAKE) protocol (joint work with Peter Ryan). The goal of that blog was to invite public scrutiny of J-PAKE. Sixteen years later, I am pleased to say that no attacks on J-PAKE have been found and that the protocol has been used in many real-world applications, e.g., Google Nest, ARM Mbed, Amazon Fire stick, Palemoon sync and Thread products.  

J-PAKE is a balanced PAKE, meaning that both sides must hold the same secret for mutual authentication. In the example of the J-PAKE-based IoT commissioning process (part of the Thread standard), a random password is generated to authenticate the key exchange process and is discarded afterwards. However, in some cases, it is desirable to store the password. For example, in a client-server application, the user knows a password, while the server stores a one-way transformation of the password. 
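To illustrate the augmented setting in general terms, the toy sketch below shows a client deriving a “verifier” that the server stores in place of the password. This is a generic illustration, not Owl’s construction; the group parameters are deliberately small demo values and the whole thing is insecure as written.

```python
# Conceptual illustration of the augmented (client-server) setting: the server
# stores only a one-way transformation of the password (a "verifier"), not the
# password itself. Toy discrete-log example; NOT Owl, and not secure as written.
import hashlib
import secrets

# Toy group parameters for illustration only (far too small for real use).
p = 2**127 - 1          # modulus (demo value)
g = 5                   # generator (demo value)

def register(username, password):
    """Client-side: derive a verifier the server can store."""
    salt = secrets.token_bytes(16)
    x = int.from_bytes(hashlib.sha256(salt + password.encode()).digest(), "big") % p
    verifier = pow(g, x, p)          # one-way: recovering x requires a discrete log
    return {"user": username, "salt": salt, "verifier": verifier}

record = register("alice", "correct horse battery staple")
# The server keeps the salt and verifier; a database breach exposes only the
# verifier, which still permits offline dictionary attacks but not direct reuse
# of the plaintext password.
print("stored verifier:", hex(record["verifier"]))
```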

PAKE protocols designed for the above client-server setting are called augmented (as opposed to balanced protocols in the peer-to-peer setting). So far, the only augmented PAKE protocol that has enjoyed wide use is SRP-6a, used e.g. in Apple’s iCloud, 1Password and Proton Mail. SRP-6a is the latest version of Wu’s 1998 SRP-3 scheme, after several revisions to address attacks. Limitations of SRP-6a are well known, including heuristic security, a lack of efficiency (due to the mandated use of a safe prime as the modulus) and no support for elliptic curve implementation.

In 2018, an augmented PAKE scheme called OPAQUE was proposed by Jarecki, Krawczyk and Xu. In 2020, IETF selected OPAQUE as a candidate for standardization. A theoretical advantage promoted in favour of OPAQUE is the so-called pre-computation security. When the server is compromised, an offline dictionary attack to uncover the plaintext password is possible for both OPAQUE and SRP-6a. For OPAQUE, its pre-computation security means that the attacker can’t use a pre-computed table, whilst for SRP-6a, the attacker may use a pre-computed table, but it must be a unique table for each user, which requires a large amount of computation and storage. Therefore, the practical advantage provided by pre-computation security is limited. 

Apart from pre-computation security, OPAQUE has a few open issues which leave it unclear whether it will replace SRP-6a in practice. First, the original OPAQUE protocol defined in the 2018 paper leaks password update information to passive attackers, whilst SRP-6a doesn’t have this leakage. Furthermore, OPAQUE relies on a constant-time hash-to-curve function available for all elliptic curves, but details about the instantiation of this function remain to be established. Finally, the 2018 paper didn’t give a full specification of OPAQUE. In 2020, when OPAQUE was selected by IETF, its specification remained incomplete. The task of completing the spec was left as a post-selection exercise; today, it is still not finished.  

Motivated by the recognised limitations of SRP-6a and OPAQUE, we propose a new augmented PAKE scheme called Owl (joint work with Samiran Bag, Liqun Chen and Paul van Oorschot). Owl is obtained by efficiently adapting J-PAKE to an augmented setting, providing the augmented security against server compromise with yet lower computation than J-PAKE. To the best of our knowledge, Owl is the first augmented PAKE solution that provides systematic advantages over SRP-6a in terms of security, computation, round efficiency, message sizes, and cryptographic agility. 

On 5 March 2024, I gave a presentation on Owl at Financial Cryptography and Data Security 2024 (FC’24) in Curacao. The purpose of this blog is to invite public scrutiny of Owl. See the Owl paper and the FC slides for further details. An open-source Java program that shows how Owl works in an elliptic curve setting is freely available. We hope security researchers and developers will find Owl useful, especially in password-based client-server settings where a PKI is unavailable (hence TLS doesn’t apply). Like J-PAKE, Owl is not patented and is free to use.

How hate sites evade the censor

On Tuesday we had a seminar from Liz Fong-Jones entitled “Reverse engineering hate” about how she, and a dozen colleagues, have been working to take down a hate speech forum called Kiwi Farms. We already published a measurement study of their campaign, which forced the site offline repeatedly in 2022. As a result of that paper, Liz contacted us and this week she told us the inside story.

The forum in question specialises in personal attacks, and many of their targets are transgender. Their tactics include doxxing their victims, trawling their online presence for material that is incriminating or can be misrepresented as such, putting doctored photos online, and making malicious complaints to victims’ employers and landlords. They describe this as “milking people for laughs”. After a transgender activist in Canada was swatted, about a dozen volunteers got together to try to take the site down. They did this by complaining to the site’s service providers and by civil litigation.

This case study is perhaps useful for the UK, where the recent Online Safety Bill empowers Ofcom to do just this – to use injunctions in the civil courts to take down unpleasant websites.

The Kiwi Farms operator has for many months resisted the activists by buying the services required to keep his website up, including his data centre floor space, his transit, his AS, his DNS service and his DDoS protection, through a multitude of changing shell companies. The current takedown mechanisms require a complainant to first contact the site operator; he publishes complaints, so his followers can heap abuse on them. The takedown crew then has to work up a chain of suppliers. Their processes are usually designed to stall complainants, so that getting through to a Tier 1 and getting them to block a link takes weeks rather than days. And this assumes that the takedown crew includes experienced sysadmins who can talk the language of the service providers, to whose technical people they often have direct access; without that, it would take months rather than weeks. The net effect is that it took a dozen volunteers thousands of hours over six months, from October 2022 to April 2023, to get all the Tier 1s to drop KF, plus over $100,000 in legal costs. If the bureaucrats at Ofcom are going to do this work for a living, without the skills and access of Liz and her team, it could be harder work than they think.

Liz’s seminar slides are here.

Hacktivism, in Ukraine and Gaza

People who write about cyber-conflict often talk of hacktivists and other civilian volunteers who contribute in various ways to a cause. Might the tools and techniques of cybercrime enable its practitioners to be effective auxiliaries in a real conflict? Might they fall foul of the laws of war, and become unlawful combatants?

We have now measured hacktivism in two wars – in Ukraine and Gaza – and found that its effects appear to be minor and transient in both cases.

In the case of Ukraine, hackers supporting Ukraine attacked Russian websites after the invasion, followed by Russian hackers returning the compliment. The tools they use, such as web defacement and DDoS, can be measured reasonably well using resources we have developed at the Cambridge Cybercrime Centre. The effects were largely trivial, expressing solidarity and sympathy rather than making any persistent contribution to the conflict. Their interest in the conflict dropped off rapidly.

In Gaza, we see the same pattern. After Hamas attacked Israel and Israel declared war, there was a surge of attacks that peaked after a few days, with most targets being strategically unimportant. In both cases, discussion on underground cybercrime forums tailed off after a week. The main difference is that the hacktivism against Israel is one-sided; supporters of Palestine have attacked Israeli websites, but the number of attacks on Palestinian websites has been trivial.

Extending transparency, and happy birthday to the archive

I was delighted by two essays by Anton Howes, on the replication crisis in history and on Open History. We computerists have long had an open culture: we make our publications open, as well as sharing the software we write and the data we analyse. My work on security economics and security psychology has taught me that this culture is not yet as well-developed in the social sciences. Yet we do what we can. Although we can’t have official conference proceedings for the Workshop on the Economics of Information Security – as then the economists would not be able to publish their papers in journals afterwards – we found a workable compromise by linking preprints from the website and from a liveblog. Economists and psychologists with whom we work have found their citation counts and h-indices boosted by our publicity mechanisms; they have incentives to learn.

A second benefit of transparency is reproducibility, the focus of Anton’s essay. Scholars are exposed to many temptations, which vary by subject matter, but are more tempting when it’s hard for others to check your work. Mathematical proofs should be clear and elegant but are all too often opaque or misleading; software should be open-sourced for others to play with; and we do what we can to share the data we collect for research on cybercrime and abuse.

Anton describes how more and more history books are found to have weak foundations, where historians quote things out of context, ignore contrary evidence, and elaborate myths and false facts into misleading stories that persist for decades. How can history correct itself more quickly? The answer, he argues, is Open History: making as many sources publicly available as possible, just like we computerists do.

As it happens, I scanned a number of old music manuscripts years ago to help other traditional music enthusiasts, but how can this be done at scale? One way forward comes from my college’s Archives Centre, which holds the personal papers of Sir Winston Churchill as well as other politicians and a number of eminent scientists. There the algorithm is that when someone requests a document, it’s also scanned and put online; so anything Alice looked at, Bob can look at too. This has raised some interesting technical problems around indexing and long-term archiving which I believe we have under control now, and I’m pleased to say that the Archives Centre is now celebrating its 50th anniversary.

It would also be helpful if old history books were as available online as they are in our library. Given that the purpose of copyright law is to maximise the amount of material that’s eventually available to all, I believe we should change the law to make continued copyright conditional on open access after an initial commercial period. Otherwise our historians’ output vanishes from the time that their books come off sale, to the time copyright expires maybe a century later.

My own Security Engineering book may show the way. With both the first edition in 2001 and the second edition in 2008, I put six chapters online for free at once, then released the others four years after publication. For the third edition, I negotiated an agreement with the publishers to put the chapters online for review as I wrote them. So the book came out by instalments, like Dickens’ novels, from April 2019 to September 2020. On the first of November 2020, all except seven sample chapters disappeared from this page for a period of 42 months; I’m afraid Wiley insisted on that. But after that, the whole book will be free online forever.

This also makes commercial sense. For both the 2001 and 2008 editions, paid-for sales of paper copies increased significantly after the whole book went online. People found my book online, liked what they saw, and then bought a paper copy rather than just downloading it all and printing out a thousand-odd pages. Open access after an exclusive period works for authors, for publishers and for history. It should be the norm.

How to Spread Disinformation with Unicode

There are many different ways to represent the same text in Unicode. We’ve previously exploited this encoding-visualization gap to craft imperceptible adversarial examples against text-based machine learning systems and invisible vulnerabilities in source code.

In our latest paper, we demonstrate another attack that exploits the same technique to target Google Search, Bing’s GPT-4-powered chatbot, and other text-based information retrieval systems.

Consider a snake-oil salesman trying to promote a bogus drug on social media. Sensible users would do a search on the alleged remedy before ordering it, and sites containing false information would normally be drowned out by genuine medical sources in modern search engine rankings. 

But what if our huckster uses a rare Unicode encoding to replace one character in the drug’s name on social media? If a user pastes this string into a search engine, it will throw up web pages with the same encoding. What’s more, these pages are very unlikely to appear in innocent queries.
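As a concrete illustration of the underlying trick, the snippet below builds two strings that render almost identically but differ in their code points, so that an exact-match search on one does not find the other. The drug name is made up for the example.

```python
# Small demonstration of the encoding-visualisation gap: two strings that look
# (near-)identical on screen but compare as different, so a search for one
# will not match documents containing the other.
plain   = "curaxol"                      # all Latin characters (made-up name)
spoofed = "cur\u0430xol"                 # 'а' here is CYRILLIC SMALL LETTER A

print(plain, spoofed)                    # render alike
print(plain == spoofed)                  # False: different code points
print([hex(ord(c)) for c in spoofed])    # reveals the substituted character

# A naive exact-match search seeded with the spoofed spelling will only return
# documents that use the same rare encoding.
corpus = {"promo post": spoofed + " cured my illness overnight!"}
print([k for k, v in corpus.items() if plain in v])     # -> []
print([k for k, v in corpus.items() if spoofed in v])   # -> ['promo post']
```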

The upshot is that an adversary who can manipulate a user into copying and pasting a string into a search engine can control the results seen by that user. They can hide such poisoned pages from regulators and others who are unaware of the magic encoding. These techniques can empower propagandists to convince victims that search engines validate their disinformation.