How does an illicit cybercrime market evolve: A longitudinal study

Online underground marketplaces are an essential part of the cybercrime economy. They often act as a cash-out market, enabling the trade in illicit goods and services between pseudonymous members. To understand their characteristics, previous research mostly uses vendor ratings, public feedback, sometimes private messages, friend status, and post content. However, most research lacks comprehensive (and important) data about transactions made by the forum members.

Our recent paper (original talk here) published at the Internet Measurement Conference (IMC’20) examines how an online illicit marketplace evolves over time (especially its performance as an infrastructure for trust), including a significant shift through the COVID-19 pandemic. This study draws insights from a novel, rich and powerful dataset containing hundreds of thousands of contractual transactions made by members of HackForums, the most popular online cybercrime community. The data includes a two-year historical record of the contract system, originally adopted in June 2018 as an attempt to mitigate scams and frauds occurring between untrusted parties. As well as contractual arrangements, the dataset includes thousands of associated members, threads, and posts on the forum, which provide additional context. To study the longitudinal maturation of this marketplace, we split the timespan into three eras: Set-up, Stable, and COVID-19. These eras are defined by two important external milestones: the enforcement of the forum’s new policy in March 2019, and the declaration of the global pandemic in March 2020.

We applied a range of analysis and statistical modelling approaches to outline the maturation of economic and social characteristics of the market since the day it was introduced. We find the market has centralised over time, with a small proportion of ‘power users’ involved in the majority of transactions. In terms of trading activities, currency exchange and payments account for the largest proportion of both contracts and users involved, followed by giftcards and accounts/licenses. Other popular products include automated bots, hacking tutorials, remote access tools (RATs), and eWhoring packs. Contracts are settled faster over time, with the completion time dropping from around 70 hours in the early months to less than 10 hours during the COVID-19 Era in June 2020.

We quantitatively estimate a lower bound on total trading value of over 6 million USD for public and private transactions. With regard to the payment methods preferred within the market, Bitcoin and PayPal dominate the others at all times in terms of both trading value and number of contracts involved. A subset of new members joining the market face the ‘cold start’ problem: the difficulty of establishing and building up a reputation when starting with none. We find that the majority of these build up their profile by participating in low-level currency exchanges, while some instead establish themselves by offering products and services.

To examine the behaviours of members over time, we use Latent Transition Analysis to discover hidden groups among the forum’s members, including how members move between groups and how they change across the lifetime of the market. In the Set-up Era, we see users gradually shift to the new system with a large number of ‘small scale’ users involved in one-off transactions, and few ‘power-users’. In the Stable Era, we see a shift in the composition and scale of the market when contracts become compulsory, with a growth of ‘business-to-consumer’ trades by ‘power-users’. In the COVID-19 Era, the market further concentrates around already-existing ‘power-users’, who are party to multiple transactions with others.

Overall, the marketplace provides a range of trust capabilities to facilitate trade between pseudonymous parties, with control becoming further centralised as administrators act as third-party arbitrators. The platform is clearly being used as a cash-out market, with most trades involving the exchange of currencies. In terms of the three eras, the big picture shows two significant rises in the market’s activities in response to the two major events that happened at the beginning of the Stable and COVID-19 eras. In particular, we observe a stimulus (rather than transformation) in trading activities during the pandemic: the same kinds of transactions, users, and behaviours, but at increased volumes. Looking at the context of forum posts at that time, we see a period of mass boredom and economic change, when some members are no longer at school while others have become unemployed or are unable to go to work. A need to make money, and having the time on their hands to do so, may be a factor in the increase of trading activities seen at this time.

Our dataset has some limitations. There is no ground-truth verification: we have no way to verify whether transactions actually proceeded as set out in the contractual agreements. Furthermore, the dataset contains a large number of private contracts (around 88%), for which we can observe only minimal information. The dataset is available to academic researchers through the Cambridge Cybercrime Centre’s data-sharing agreements.

Three paper Thursday: COVID-19 and cybercrime

For a slightly different Three Paper Thursday, I’m pulling together some of the work done by our Centre and others around the COVID-19 pandemic and how it, and government responses to it, are reshaping the cybercrime landscape. 

The first thing to note is that there appears to be a nascent academic consensus emerging that the pandemic, or more accurately, lockdowns and social distancing, have indeed substantially changed the topology of crime in contemporary societies, leading to an increase in cybercrime and online fraud. The second is that this large-scale increase in cybercrime appears to be the result of a growth in existing cybercrime phenomena rather than the emergence of qualitatively new exploits, scams, attacks, or crimes. This invites reconsideration not only of our understandings of cybercrime and its relation to space, time, and materiality, but also of our understandings of what to do about it.


Three Paper Thursday: Applying natural language processing to underground forums

Underground forums contain discussions and advertisements of various topics, including general chatter, hacking tutorials, and sales of items on marketplaces. While off-the-shelf natural language processing (NLP) techniques may be applied in this domain, they are often trained on standard corpora such as news articles and Wikipedia. 

It isn’t clear how well these models perform with the noisy text data found on underground forums, which contains evolving domain-specific lexicon, misspellings, slang, jargon, and acronyms. I explored this problem with colleagues from the Cambridge Cybercrime Centre and the Computer Laboratory, in developing a tool for detecting bursty trending topics using a Bayesian approach of log-odds. The approach uses a prior distribution to detect change in the vocabulary used in forums, for filtering out consistently used jargon and slang. The paper has been accepted to the 2020 Workshop on Noisy User-Generated Text (ACL) and the preprint is available online.
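To make the idea concrete, here is a minimal sketch of one common formulation of log-odds with a prior: a z-scored log-odds ratio in which background word counts act as Dirichlet pseudo-counts. It is an illustration under stated assumptions, not the tool described in the paper: the function name, the `prior_floor` parameter, and the toy word lists are all hypothetical.

```python
import math
from collections import Counter


def log_odds_z_scores(recent, earlier, prior, prior_floor=0.5):
    """Score each word by how much more likely it is in `recent` than in `earlier`.

    `prior` holds background counts used as pseudo-counts, so words that are
    always common on the forum (long-standing jargon and slang) are damped.
    `prior_floor` is an assumed pseudo-count for words missing from the prior.
    """
    vocab = set(recent) | set(earlier)
    alpha = {w: prior.get(w, 0) + prior_floor for w in vocab}
    alpha_total = sum(alpha.values())
    n_recent = sum(recent.values())
    n_earlier = sum(earlier.values())

    scores = {}
    for w in vocab:
        a = recent.get(w, 0) + alpha[w]   # smoothed count in the recent window
        b = earlier.get(w, 0) + alpha[w]  # smoothed count in the earlier window
        log_odds_recent = math.log(a / (n_recent + alpha_total - a))
        log_odds_earlier = math.log(b / (n_earlier + alpha_total - b))
        variance = 1.0 / a + 1.0 / b      # approximate variance of the difference
        scores[w] = (log_odds_recent - log_odds_earlier) / math.sqrt(variance)
    return scores


# Toy, made-up token lists standing in for two time windows of forum posts.
earlier = Counter("selling rat cheap rat crypter selling accounts".split())
recent = Counter("selling rat covid relief covid grant covid accounts".split())
prior = Counter("selling rat accounts crypter cheap selling rat".split())

scores = log_odds_z_scores(recent, earlier, prior)
top_word = max(scores, key=scores.get)
print(top_word, round(scores[top_word], 2))
```

Words that are frequent in both windows and in the prior end up with scores near or below zero, while a word that appears only in the recent window rises to the top. That is the filtering behaviour the prior is meant to provide.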

Other more commonly used approaches of identifying known and emerging trends range from simple keyword detection using a dictionary of known terms, to statistical methods of topic modelling including TF-IDF and Latent Dirichlet Allocation (LDA). In addition, the NLP landscape has been changing over the last decade [1], with a shift to deep learning using neural models, such as word2vec and BERT.

In this Three Paper Thursday, we look at how past papers have used different NLP approaches to analyse posts in underground forums, from statistical techniques to word embeddings, for identifying and defining new terms, generating relevant warnings even when the jargon is unknown, and identifying similar threads despite relevant keywords not being known.

[1] Gregory Goth. 2016. Deep or shallow, NLP is breaking out. Commun. ACM 59, 3 (March 2016), 13–16. DOI:https://doi.org/10.1145/2874915


Three Paper Thursday: Broken Hearts and Empty Wallets

This is a guest post by Cassandra Cross.

Romance fraud (also known as romance scams or sweetheart swindles) affects millions of individuals globally each year. In 2019, the Internet Crime Complaint Center (IC3) (USA) had over US$475 million reported lost to romance fraud. Similarly, in Australia, victims reported losing over $AUD80 million and British citizens reported over £50 million lost in 2018. Given the known under-reporting of fraud overall, and online fraud more specifically, these figures are likely to represent only a fraction of actual losses incurred.

Romance fraud occurs when an offender uses the guise of a legitimate relationship to gain a financial advantage from their victim. It differs from a bad relationship, in that from the outset, the offender is using lies and deception to obtain monetary rewards from their partner. Romance fraud capitalises on the fact that a potential victim is looking to establish a relationship and exhibits an express desire to connect with someone. Offenders use this to initiate a connection and start to build strong levels of trust and rapport.

As with all fraud, victims experience a wide range of impacts in the aftermath of victimisation. While many believe these to be only financial, in reality they extend to a decline in both physical and emotional wellbeing, relationship breakdown, unemployment, homelessness, and in extreme cases, suicide. In the case of romance fraud, there is the additional trauma associated with grieving both the loss of the relationship and any funds they have transferred. For many victims, the loss of the relationship can be harder to cope with than the monetary aspect, with victims experiencing large degrees of betrayal and violation at the hands of their offender.

Sadly, there is also a large amount of victim blaming that exists with both romance fraud and fraud in general. Fraud is unique in that victims actively participate in the offence, through the transfer of money, albeit under false pretences. As a result, they are seen to be culpable for what occurs and are often blamed for their own circumstances. The stereotype of fraud victims as greedy, gullible and naïve persists, and presents as a barrier to disclosure as well as inhibiting their ability to report the incident and access any support services.

Given the magnitude of losses and impacts on romance fraud victims, there is an emerging body of scholarship that seeks to better understand the ways in which offenders are able to successfully target victims, the ways in which they are able to perpetrate their offences, and the impacts of victimisation on the individuals themselves. The following three articles each explore different aspects of romance fraud, to gain a more holistic understanding of this crime type.


Hiring for the Cambridge Cybercrime Centre (Sep 2020 version)

We have yet another “post-doc” position in the Cambridge Cybercrime Centre: https://www.cambridgecybercrime.uk (for the happy reason that Ben is leaving us to become a Lecturer in Digital Methods in Edinburgh).

Hence, once again, we are looking for an enthusiastic researcher to join us to work on our datasets of cybercrime activity, collecting new types of data, maintaining existing datasets and doing innovative research using our data. The person we appoint will define their own goals and objectives and pursue them independently, or as part of a team.

We are specifically interested in determining how cybercrime has changed in response to the COVID-19 pandemic, and our funding requires us to identify new trends, to collect (and share) relevant data, and to rapidly provide an analysis of what is happening, with the aim of assisting in optimising technical and policy responses. We are also expanding our data collection into examining the online activities of extremist groups, with a specific focus on pandemic-related issues.

An ideal candidate would identify datasets that can be collected, build the collection systems and then do cutting edge research on this data – whilst encouraging other academics to take our data and make their own contributions to the field. However, we recognise that candidates may be from a technical background and hence stronger at the collecting side, or from a social science background and hence stronger on providing compelling insights into what our data reveals. Along with a CV we expect to see a covering letter which sets out what type of research might be done and the skills which will be brought to bear, along with an indication where help would need to be sought from colleagues in our interdisciplinary environment.

Please follow this link to the formal advertisement for the details about exactly who and what we’re looking for and how to apply, and please pay special attention to our request for a covering letter.

Job ad: Research Assistants/Associates in Compilers or Operating Systems for CHERI and the Arm Morello Board

We are pleased to announce two new research and/or software-development posts contributing to the CHERI project and Arm’s forthcoming Morello prototype processor, SoC, and development board. Learn more about CHERI and Morello on our project web site.

Fixed-term: The funds for this post are available for up to 2 years, with the possibility of extension as grant funds permit.

Research Assistant: £26,715 – £30,942 or Research Associate: £32,816 – £40,322

http://www.jobs.cam.ac.uk/job/26834/

We are seeking one or more Research Assistants (without PhD) or Research Associates (holding or shortly to obtain a PhD) with a strong background in compilers and/or operating systems to contribute to the CHERI Project and our joint work with Arm on their prototype Morello board, which incorporates CHERI into a high-end superscalar ARMv8-A processor. CHERI is a highly successful collaboration between the University of Cambridge, SRI International, and Arm Research to develop new architectural security primitives. The CHERI protection model extends off-the-shelf processor Instruction-Set Architectures (ISAs) and processors with new capability-based security primitives supporting fine-grained C/C++-language memory protection and scalable software compartmentalization.

A measurement of link rot: 57%

I submitted my PhD on the 31st August 2005 (9 months before Twitter started, almost two years before the first iPhone). The easiest version to find (click here) contains the minor revisions requested by my examiners and some typographical changes to fit it into the Computer Lab’s Technical Report series.

Since it seemed like a good idea at the time, my thesis has an annotated bibliography (so you can read a brief precis of what I referenced, which could assist you in deciding whether to follow it up). I also went to some effort to identify online versions of everything I cited, because it is always helpful to just click on a link and immediately see the paper, news article or other material.

The thesis has 153 references. In two cases I provided two URLs, and in three cases I could not provide any URL, though I did note that the three ITU standards documents I cited were available from the ITU bookshop and that it was possible to download a small number of standards without charge. That is, the bibliography contained 152 URLs (153 - 3 + 2).

Of testing centres, snipe, and wild geese: COVID briefing paper #8

Does the road wind up-hill all the way?
   Yes, to the very end.
Will the day's journey take the whole long day?
   From morn to night, my friend.

Christina Rossetti, 1861: Up-Hill. 

This week’s COVID briefing paper takes a personal perspective as I recount my many adventures in complying with a call for testing from my local council.

So as to immerse the reader in the experience, this post is long. If you don’t have time for that, you can go directly to the briefing.

The council calls for everyone in my street to be tested

On Thursday 13 August my household received a hand-delivered letter from the chief executive of my local council. There had been an increase in cases in my area, and as a result, they were asking everyone on my street to get tested.

Dramatis personae:

  • ME, a knowledge worker who has structured her life so as to minimize interaction with the outside world until the number of daily cases drops a lot lower than it is now;
  • OTHER HOUSEHOLD MEMBERS, including people with health conditions, who would be shielding if shielding hadn’t ended on August 1.

Fortunately, everyone else in my household is also in a position to enjoy the mixed blessing of a lifestyle without social interaction. So, none of us reacted to the news of an outbreak amongst our neighbours with fear for our own health, considering our habits over the last six months. Rather, we were, and are, reassured that the local government was taking a lead.

My neighbour, however, was having a different experience. Like most people on our street, he does not have the same privileges I do: he works in a supermarket, he does not have a car, and his only Internet access is through his dumbphone. Days before, he had texted me at the end of his tether, because customers were not wearing masks or observing social distancing. He felt (because he is) unprotected, and said it was only a matter of time before he became infected. Receiving the council’s letter only reinforced his alarm.

Booking the tests


Cambridge Cybercrime Centre: COVID briefing papers

The current coronavirus pandemic has significantly disrupted all our societies and, we believe, it has also significantly disrupted cybercrime.

In the Cambridge Cybercrime Centre we collect crime-related datasets of many types and we expect, in due course, to be able to identify, measure and document this disruption. We will not be alone in doing so — a key aim of our centre is to make datasets available to other academic researchers so that they too can identify, measure and document. What’s more, we make this data available in a timely manner — sometimes before we have even looked at it ourselves!

When we have looked at the data and identified what might be changing (or where the criminals are exploiting new opportunities) then we shall of course be taking the traditional academic path of preparing papers, getting them peer reviewed, and then presenting them at conferences or publishing them in academic journals. However, that process is extremely slow — so we have decided to provide a faster route for getting out the message about what we find to be going on.

Our new “COVID Briefing Papers” are an ongoing series of short-form, open access reports for academics, policymakers, and practitioners, providing an accessible summary of our research into the effects which the coronavirus pandemic (and government responses) are having on cybercrime. We’re hoping, at least for a while, to produce a new briefing paper each week … and you can now read the very first, where Ben Collier explains what has happened to illegal online drug markets… just click here!

Reinforcement Learning and Adversarial thinking

We all know that learning a new craft is hard. We spend a large part of our lives learning how to operate in everyday physics.  A large part of this learning comes from observing others, and when others can’t help we learn through trial and error. 

In machine learning the process of learning how to deal with the environment is called Reinforcement Learning (RL). By continuous interaction with its environment, an agent learns a policy that enables it to perform better. Observational learning in RL is referred to as Imitation Learning. Both trial and error and imitation learning are hard: environments are not trivial, often you can’t tell the ramifications of an action until far in the future, environments are full of non-determinism and there is no such thing as a correct policy.

So, unlike in supervised and unsupervised learning, it is hard to tell if your decisions are correct. Episodes usually consist of thousands of decisions, and you will only know if you performed well after exploring other options. But experimenting is itself a hard decision: do you exploit the skill you already have, or try something new and explore the unknown?
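The explore-or-exploit dilemma can be illustrated with a deliberately simplified setting: a multi-armed bandit, where there is a single state and each action pays a noisy reward. The sketch below uses the epsilon-greedy rule, one standard way of balancing the two. The function name, reward means and parameter values are hypothetical choices for illustration, and full RL problems are considerably harder than this.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a bandit with Gaussian rewards.

    With probability `epsilon` the agent explores (picks a random arm);
    otherwise it exploits the arm with the highest estimated reward so far.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm has been pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                          # explore
        else:
            arm = max(range(n_arms), key=estimates.__getitem__)  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy feedback from the environment
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


# Three hypothetical actions; the agent does not know these means in advance.
estimates, counts, total_reward = epsilon_greedy([0.2, 0.5, 0.9])
print("pulls per arm:", counts)
print("estimated means:", [round(e, 2) for e in estimates])
```

With `epsilon=0` the agent can lock on to whichever arm happened to look good early, and with `epsilon=1` it never uses what it has learned. The interesting behaviour lies in between.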

Despite all these complexities, RL has managed to achieve incredible performance in a wide variety of tasks from robotics through recommender systems to trading. More impressively, RL agents have achieved superhuman performance in Go and other games, tasks previously believed to be impossible for computers. 
