Category Archives: Academic papers

Non-cooperation in the fight against phishing

Tyler Moore and I are presenting another one of our academic phishing papers today at the Anti-Phishing Working Group’s Third eCrime Researchers Summit here in Atlanta, Georgia. The paper “The consequence of non-cooperation in the fight against phishing” (pre-proceedings version here) goes some way to explaining anomalies we found in our previous analysis of phishing website lifetimes. The “take-down” companies reckon to get phishing websites removed within a few hours, whereas our measurements show that the average lifetimes are a few days.

These “take-down” companies are generally specialist offshoots of more general “brand protection” companies, and are hired by banks to handle removal of fake phishing websites.

When we examined our data more carefully we found that we were receiving “feeds” of phishing website URLs from several different sources — and the “take-down” companies that were passing the data to us were not passing the data to each other.

So it often occurs that take-down company A knows about a phishing website targeting a particular bank, but take-down company B is ignorant of its existence. If it is company B that has the contract for removing sites for that bank then, since they don’t know the website exists, they take no action and the site stays up.

Since we were receiving data feeds from both company A and company B, we knew the site existed and we measured its lifetime — which is much extended. In fact, it is something of a mystery why it is removed at all! Our best guess is that reports made directly to ISPs trigger removal.

The paper contains all the details, and gives all the figures to show that website lifetimes are extended by about 5 days when the take-down company is completely unaware of the site. On other occasions the company learns about the site some time after it is first detected by someone else; and this extends the lifetimes by an average of 2 days.
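To make the mechanism concrete, here is a minimal sketch (with invented feed contents, not our actual data) of how merging the two companies' URL feeds exposes the gap: each set difference is a collection of live phishing sites that one take-down company cannot act on, simply because it has never heard of them.

```python
# Hypothetical illustration only: the URLs below are invented.
# Each feed is the set of phishing URLs known to one take-down company.
feed_a = {"phish1.example.com", "phish2.example.com", "phish3.example.com"}
feed_b = {"phish2.example.com", "phish4.example.com"}

only_a = feed_a - feed_b   # sites company B cannot take down (it is unaware of them)
only_b = feed_b - feed_a   # sites company A cannot take down
shared = feed_a & feed_b   # sites both companies know about

print(sorted(only_a))  # ['phish1.example.com', 'phish3.example.com']
print(sorted(only_b))  # ['phish4.example.com']
```

If the companies shared feeds, both differences would be empty and every site would be actionable by whichever company holds the relevant bank's contract.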

Since extended lifetimes equate to more unsuspecting visitors handing over their credentials and having their bank accounts cleaned out, these delays can also be expressed in monetary terms. Using the rough-and-ready model we developed last year, we estimate that an extra $326 million per annum is currently being put at risk by the lack of data sharing. This figure comes from our analysis of just two companies’ feeds, and there are several more such companies in this business.

Not surprisingly, our paper suggests that the take-down companies should be sharing their data, so that when they learn about websites attacking banks they don’t have contracts with, they pass the details on to another company that can start to get the site removed.

We analyse the incentives to make this change (and the incentives the companies have not to do so) and contrast the current arrangements with the anti-virus/malware industry — where sample suspect code has been shared since the early 1990s.

In particular, we note that it is the banks who would benefit most from data sharing — and since they are paying the bills, we think that they may well be in a position to force through changes in policy. To best protect the public, we must hope that this happens soon.

Privacy Enhancing Technologies Symposium (PETS 2009)

I am on the program committee for the 9th Privacy Enhancing Technologies Symposium (PETS 2009), to be held in Seattle, WA, USA, 5–7 August 2009. PETS is the leading venue for research on privacy and anonymity, offering an enjoyable environment and stimulating discussion. If you are working in this field, I can strongly recommend submitting a paper.

This year, we are particularly looking for submissions on topics other than anonymous communications, so if work from your field may be applied, or is otherwise related, to the topic of privacy, I’d encourage you to consider PETS as a potential venue.

The submission deadline for the main session is 2 March 2009. As with last year, we will also have a “HotPETS” event, for new and exciting work in the field which is still in a formative state. Submissions for HotPETS should be received by 8 May 2009.

Further information can be found in the call for papers.

An A to Z of confusion

A few days ago I blogged about my paper on email spam volumes — comparing “aardvarks” (email local parts [left of the @] beginning with “A”) with “zebras” (those starting with a “Z”).

I observed that provided one considered “real” aardvarks and zebras — addresses that received good email amongst the spam — then aardvarks got 35% spam and zebras a mere 20%.

This has been widely picked up, first in the Guardian, and later in many other papers as well (even in Danish). However, many of these articles have got hold of the wrong end of the stick. So besides mentioning A and Z, it looks as if I should have published this figure from the paper as well…

Figure 3 from the academic paper

… the point being that the effect I am describing has little to do with Z being at the end of the alphabet, and A at the front, but seems to be connected to the relative rarity of zebras.

As you can see from the figure, marmosets and pelicans get around 42% spam (M and P being popular letters for people’s names) and quaggas 21% (there are very few Quentins, just as there are very few Zacks).

There are some outliers in the figure: for example “3” relates to spammers failing to parse HTML properly and ending up with “3c” (a < character) at the start of names. However, it isn’t immediately apparent why “unicorns” get quite so much spam; it may just be a quirk of the way that I have assessed “realness”. Doubtless some future research will be able to explain this more fully.

Zebras and Aardvarks

We all know that different people get different amounts of email “spam”. Some of these differences result from how careful people have been in hiding their address from the spammers — putting it en clair on a webpage will definitely improve your chances of receiving unsolicited email.

However, it turns out there are other effects as well. In a paper I presented last week to the Fifth Conference on Email and Anti-Spam (CEAS 2008), I showed that the first letter of the local part of the email address also plays a part.

Incoming email to Demon Internet where the email address local part (the bit left of the @) begins with “A” (think of these as aardvarks) is almost exactly 50% spam and 50% non-spam. However, where the local part begins with “Z” (zebras) then it is about 75% spam.

However, if one only considers “real” aardvarks and zebras, viz. those addresses legitimate enough to receive some non-spam email, then the picture changes. If one treats an email address as “real” when it receives, on average, at least one non-spam email every second day, then real aardvarks receive 35% spam, but real zebras receive only 20% spam.

The most likely reason for these results is the prevalence of “dictionary” or “Rumpelstiltskin” attacks (where spammers guess addresses). If there are not many other zebras, then spammers are less likely to guess zebra names.
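The measurement itself is easy to reproduce in outline. The sketch below (using invented toy records, not the Demon Internet logs) groups incoming email by the first letter of the local part and computes the spam fraction per letter:

```python
from collections import defaultdict

# Toy (address, is_spam) records, invented for illustration.
log = [
    ("alice@example.com", True), ("adam@example.com", False),
    ("anna@example.com", True), ("zack@example.com", False),
    ("zoe@example.com", False), ("zara@example.com", True),
]

counts = defaultdict(lambda: [0, 0])  # first letter -> [spam count, total count]
for address, is_spam in log:
    letter = address.split("@")[0][0].lower()  # first letter of the local part
    counts[letter][0] += is_spam
    counts[letter][1] += 1

for letter in sorted(counts):
    spam, total = counts[letter]
    print(f"{letter}: {100 * spam / total:.0f}% spam")
# a: 67% spam
# z: 33% spam
```

The “real address” filter would then simply discard any address whose non-spam count falls below the chosen threshold before computing the ratios.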

Aardvarks should consider changing species — or asking their favourite email filter designer to think about how this unexpected empirical result can be leveraged into blocking more of their unwanted email.

[[[ ** Note that these percentages are way down from general spam rates because Demon rejects out of hand email from sites listed in the PBL (which are not expected to send email) and greylists email from sites in the ZEN list. This reduces overall volumes considerably — so YMMV! ]]]

PET Award 2008

At last year’s Privacy Enhancing Technologies Symposium (PETS), I presented the paper “Sampled Traffic Analysis by Internet-Exchange-Level Adversaries”, co-authored with Piotr Zieliński. In it, we discussed the risk of traffic-analysis at Internet exchanges (IXes). We then showed that given even a small fraction of the data passing through an IX it was still possible to track a substantial proportion of anonymous communications. Our results are summarized in a previous blog post and full details are in the paper.

Our paper has now been announced as a runner-up for the Privacy Enhancing Technologies Award. The prize is presented annually, for research which makes an outstanding contribution to the field. Microsoft, the sponsor of the award, have further details and summaries of the papers in their press release.

Congratulations to the winners, Arvind Narayanan and Vitaly Shmatikov, for “Robust De-Anonymization of Large Sparse Datasets”; and the other runners-up, Mira Belenkiy, Melissa Chase, C. Chris Erway, John Jannotti, Alptekin Küpçü, Anna Lysyanskaya and Erich Rachlin, for “Making P2P Accountable without Losing Privacy”.

Metrics for security and performance in low-latency anonymity systems

In Tor, and in other similar anonymity systems, clients choose a random sequence of computers (nodes) to route their connections through. The intention is that, unless someone is watching the whole network at the same time, the tracks of each user’s communication will become hidden amongst that of others. Exactly how a client chooses nodes varies from system to system, and is important for security.

If someone is simultaneously watching a user’s traffic as it enters and leaves the network, it is possible to de-anonymise the communication. As anyone can contribute nodes, this could occur if the first and last nodes for a connection are controlled by the same person. Tor takes some steps to avoid this possibility, e.g. no two computers on the same /16 network may be chosen for each connection. However, someone with access to several networks could circumvent this measure.

Not only is route selection critical for security, but it’s also a significant performance factor. Tor nodes vary dramatically in their capacity, mainly due to their network connections. If all nodes were chosen with equal likelihood, the slower ones would cripple the network. This is why Tor weights the selection probability for a node proportional to its contribution to the network bandwidth.
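A rough sketch of such a selection algorithm, assuming an invented node list and ignoring the many other rules real Tor applies (guard flags, exit policies, and so on): each node is picked with probability proportional to its bandwidth, subject to the /16-diversity rule mentioned above.

```python
import random

# Invented example nodes; "bw" is each node's contribution to network bandwidth.
nodes = [
    {"name": "n1", "ip": "198.51.100.7", "bw": 100},
    {"name": "n2", "ip": "198.51.100.9", "bw": 500},
    {"name": "n3", "ip": "203.0.113.20", "bw": 50},
    {"name": "n4", "ip": "192.0.2.14",   "bw": 300},
]

def slash16(ip):
    """The /16 network prefix of a dotted-quad IPv4 address."""
    return ".".join(ip.split(".")[:2])

def pick_route(nodes, length=3):
    route, used = [], set()
    while len(route) < length:
        # Exclude nodes whose /16 network already appears in the route.
        pool = [n for n in nodes if slash16(n["ip"]) not in used]
        if not pool:
            raise ValueError("not enough /16-distinct nodes")
        # Bandwidth-weighted selection.
        node = random.choices(pool, weights=[n["bw"] for n in pool])[0]
        route.append(node)
        used.add(slash16(node["ip"]))
    return route

route = pick_route(nodes)
print([n["name"] for n in route])
```

Note how n2, with half the total bandwidth, ends up in most routes — which is exactly the property that keeps the network fast, and exactly what a high-bandwidth attacker exploits.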

Because of the dual importance of route selection, there are a number of proposals which offer an alternative to Tor’s bandwidth-weighted algorithm. Later this week at PETS I’ll be presenting my paper, co-authored with Robert N.M. Watson, “Metrics for security and performance in low-latency anonymity systems”. In this paper, we examine several route selection algorithms and evaluate their security and performance.

Intuitively, a route selection algorithm which weights all nodes equally appears the most secure, because an attacker can’t make their node count any more than the others. This has been formalized by two measures: the Gini coefficient and entropy. In fact the reality is more complex — uniform node selection resists attackers with lots of bandwidth, whereas bandwidth-weighting is better against attackers with lots of nodes (e.g. botnets).

Our paper explores the probability of path compromise of different route selection algorithms, when under attack by a range of different adversaries. We find that none of the proposals are optimal against all adversaries, and so summarizing effective security in terms of a single figure is not feasible. We also model the performance of the schemes and show that bandwidth-weighting offers both low latency and high resistance to attack by bandwidth-constrained adversaries.
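A back-of-envelope calculation (all numbers invented) illustrates why no single algorithm wins. Take the probability that a given hop is the adversary's to be its share of nodes (uniform selection) or of bandwidth (bandwidth-weighted selection), and square it for the entry-and-exit requirement, ignoring refinements such as the /16 rule and sampling without replacement:

```python
# Invented network: 1000 nodes contributing 50,000 units of bandwidth.
total_nodes = 1000
total_bw = 50_000

# Adversary 1: a botnet -- many nodes, little bandwidth.
bot_nodes, bot_bw = 100, 1_000
# Adversary 2: a few well-provisioned servers -- few nodes, much bandwidth.
srv_nodes, srv_bw = 5, 10_000

def compromise_uniform(adv_nodes):
    p = adv_nodes / total_nodes   # chance a given hop is adversarial
    return p * p                  # adversary needs both entry and exit

def compromise_bw_weighted(adv_bw):
    p = adv_bw / total_bw
    return p * p

print(round(compromise_uniform(bot_nodes), 6))    # 0.01    -- uniform loses to botnets
print(round(compromise_bw_weighted(bot_bw), 6))   # 0.0004
print(round(compromise_uniform(srv_nodes), 6))    # 2.5e-05
print(round(compromise_bw_weighted(srv_bw), 6))   # 0.04    -- bw-weighting loses to fast servers
```

Each adversary does badly under one scheme and well under the other, which is why a single-figure security summary is unobtainable.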

Update (2008-07-25):
The slides (PDF 2.1M) for my presentation are now online.

An improved clock-skew measurement technique for revealing hidden services

In 2006 I published a paper on remotely estimating a computer’s temperature, based on clock skew. I showed that by inducing load on a Tor hidden service, an attacker could cause measurable changes in clock skew and so allow the computer hosting the service to be re-identified. However, it takes a very long time (hours to days) to obtain a sufficiently accurate clock-skew estimate, even taking a sample every few seconds. If measurements are less granular than the 1 kHz TCP timestamp clock source I used, then it would take longer still.

This limits the attack since in many cases TCP timestamps may be unavailable. In particular, Tor hidden services operate at the TCP layer, stripping all TCP and IP headers. If an attacker wants to estimate clock skew over the hidden service channel, the only directly available clock source may be the 1 Hz HTTP timestamp. The quantization noise in this case is three orders of magnitude above the TCP timestamp case, making the approach I used in the paper effectively infeasible.

While visiting Cambridge in summer 2007, Sebastian Zander developed an improved clock-skew measurement technique which would dramatically reduce the noise of clock-skew measurements from low-frequency clocks. The basic idea, shown below, is to only request timestamps very close to a clock transition, where the quantization noise is lowest. This requires the attacker first to lock on to the phase of the clock, then keep tracking it even when measurements are distorted by network jitter.

Synchronized vs random sampling
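The effect of sampling near transitions can be seen in a toy simulation (all parameters invented; the real attack must additionally estimate the clock phase through network jitter, which this sketch sidesteps by assuming the phase is already known):

```python
import random

SKEW = 1e-5  # remote clock runs 10 ppm fast (invented value)

def remote_timestamp(t):
    """A 1 Hz remote clock: the observer only ever sees whole seconds."""
    return int(t * (1 + SKEW))

def mean_quantization_error(sample_times):
    errors = [t * (1 + SKEW) - remote_timestamp(t) for t in sample_times]
    return sum(errors) / len(errors)

# Random-phase sampling: the error is uniform in [0, 1) second.
random_samples = [random.uniform(0, 1000) for _ in range(1000)]
# Synchronized sampling: probe just after each predicted clock transition.
sync_samples = [(n + 0.01) / (1 + SKEW) for n in range(1, 1000)]

print(mean_quantization_error(random_samples))  # ~0.5 s
print(mean_quantization_error(sync_samples))    # ~0.01 s
```

The roughly fifty-fold reduction in this toy setting is of the same character as the two-orders-of-magnitude improvement reported in the paper, though the exact factor there depends on how precisely the phase lock can be maintained.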

Sebastian and I wrote a paper — An Improved Clock-skew Measurement Technique for Revealing Hidden Services — describing this technique, and showing results from testing it on a Tor hidden service installed on PlanetLab. The measurements show a large improvement over the original paper, with two orders of magnitude lower noise for low-frequency clocks (like the HTTP case). This approach will allow previous attacks to be executed faster, and make previously infeasible attacks possible.

The paper will be presented at the USENIX Security Symposium, San Jose, CA, US, 28 July – 1 August 2008.

Slow removal of child sexual abuse image websites

On Friday last week The Guardian ran a story on an upcoming research paper by Tyler Moore and myself which will be presented at the WEIS conference later this month. We had determined that child sexual abuse image websites were removed from the Internet far slower than any other category of content we looked at, excepting illegal pharmacies hosted on fast-flux networks; and we’re unsure if anyone is seriously trying to remove them at all!

"Covert channel vulnerabilities in anonymity systems" wins best thesis award

My PhD thesis “Covert channel vulnerabilities in anonymity systems” has been awarded this year’s best thesis prize by the ERCIM security and trust management working group. The announcement can be found on the working group homepage and I’ve been invited to give a talk at their upcoming workshop, STM 08, Trondheim, Norway, 16–17 June 2008.

Update 2008-07-07: ERCIM have also published a press release.

J-PAKE: From Dining Cryptographers to Jugglers

Password Authenticated Key Exchange (PAKE) is one of the central topics in cryptography. It aims to address a practical security problem: how to establish secure communication between two parties solely based on their shared password without requiring a Public Key Infrastructure (PKI).

The solution to the above problem is very useful in practice — in fact, so useful that it has spawned a lot of “fights” over patents. Many techniques were patented, including the well-known Encrypted Key Exchange (EKE) and Simple Password Exponential Key Exchange (SPEKE). A secondary problem is technical; both the EKE and SPEKE protocols have subtle but worrying technical limitations (see the paper for details).

At the 16th Workshop on Security Protocols held in April 2008, Cambridge, UK, I presented a new solution (joint work with Peter Ryan) called Password Authenticated Key Exchange by Juggling (or J-PAKE). The essence of the protocol design derives from earlier work on solving the Dining Cryptographers problem; we adapted the same juggling technique to the two-party case to solve the PAKE problem. To the best of our knowledge, this design is significantly different from all past PAKE solutions.

Intuitively, the J-PAKE protocol works like a juggling game between two people — if we regard a public key as a “ball”. In round one, each person throws two ephemeral public keys (“balls”) to the other. In round two, each person combines the available public keys and the password to form a new public key, and throws the new “ball” to the other.

After round two, the two parties can securely compute a common session key, if they supplied the same password. Otherwise, the protocol leaks nothing more than: “the supplied passwords at the two sides are not the same”. In other words, one can prove knowledge of the password without revealing it. A Java implementation of the protocol on a MacBook Pro laptop shows that the total computation time at each side is merely 75 ms.
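For the curious, here is a toy walk-through of the two rounds in Python. It is emphatically not a secure implementation: the group parameters are tiny invented demonstration values rather than the published ones, and the Schnorr zero-knowledge proofs that must accompany every public key in the real protocol are omitted entirely.

```python
import random
from hashlib import sha256

# Tiny demo Schnorr group (NOT secure): p = 2q + 1, g generates the order-q subgroup.
p, q, g = 2039, 1019, 4

def to_secret(password):
    # Map the shared password to a secret exponent s in [1, q-1].
    return int.from_bytes(sha256(password.encode()).digest(), "big") % (q - 1) + 1

def round1():
    # Each party picks two ephemeral secrets and "throws" g^x1, g^x2 (two balls).
    x1, x2 = random.randrange(q), random.randrange(1, q)
    return (x1, x2), (pow(g, x1, p), pow(g, x2, p))

def round2(priv, mine, theirs, s):
    # Combine the available public keys and the password into a new "ball":
    # A = (g^(x1+x3+x4))^(x2*s), using own g^x1 and the other party's g^x3, g^x4.
    (_, x2), (gx1, _) = priv, mine
    gx3, gx4 = theirs
    base = gx1 * gx3 % p * gx4 % p
    return pow(base, x2 * s % q, p)

def session_key(priv, theirs, their_ball, s):
    # K = (B / g^(x2*x4*s))^x2 = g^((x1+x3)*x2*x4*s), identical at both sides.
    _, x2 = priv
    _, gx4 = theirs
    inv = pow(pow(gx4, x2 * s % q, p), p - 2, p)  # modular inverse via Fermat
    return pow(their_ball * inv % p, x2, p)

s = to_secret("correct horse")
alice_priv, alice_pub = round1()
bob_priv, bob_pub = round1()
A = round2(alice_priv, alice_pub, bob_pub, s)
B = round2(bob_priv, bob_pub, alice_pub, s)
assert session_key(alice_priv, bob_pub, B, s) == session_key(bob_priv, alice_pub, A, s)
```

The real protocol additionally verifies a Schnorr proof of knowledge for every exponent sent, which is what prevents an active attacker from deviating from these equations.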

We hope this protocol is of use to security engineers. For example, compared with SSL/TLS, J-PAKE is potentially much more resistant against phishing attacks, not to mention that it is PKI-free. Since this protocol is the result of an academic research project, we didn’t — and have no intention to — patent it. As explained in the paper, J-PAKE even has technical advantages over the patented EKE and SPEKE in terms of security, with comparable efficiency. It has been submitted for consideration as a possible future extension to IEEE P1363.2.

We believe the PAKE research is important and has strong practical relevance. This post is to facilitate discussions on this subject. The paper can be viewed here. Any comments or questions are welcome.

Update

  • 2008-06-28: a crude J-PAKE demo source code (.java) by me. (link broken)
  • 2008-11-04: a more refined J-PAKE in C and OpenSSL by Ben Laurie.
  • 2008-11-11: possible applications of J-PAKE in VPN and browser by James.
  • 2009-02-08: public group parameters for 112-bit and 128-bit security can be found in the comments.
  • 2009-03-15: fixed the broken link to the old Java file. Here is the link to the Java demo code.
  • 2010-04-17: a journal version of the paper available on IACR. No technical change to the protocol.
  • 2010-10-25: the journal version of the paper has been accepted for publication in the TCS journal – Springer Transactions on Computational Science, special issue on “Security in Computing”, 2011.
  • 2010-11-25: Sebastien Martini reported an implementation issue of J-PAKE in OpenSSL and OpenSSH. The issue does not apply to the Java demo code that I wrote. As stated in the last paragraph of p. 11 in the paper, one must check that the element lies within the specified group. Stefan Arentz implemented a fix in OpenSSL. Official OpenSSL and OpenSSH patches can be found here and here.
  • 2011-01-11: Mozilla built J-PAKE into the base product of Firefox 4 (beta 8 and later). More details here.
  • 2012-01-18: Today, Mohsen Toorani uploaded a paper on IACR eprint claiming several attacks on J-PAKE. My response can be found here.
  • 2012-07-21: Phil Clay contributed a Java implementation of J-PAKE to bouncycastle.
  • 2013-02-24: J-PAKE included into bouncycastle 1.48.
  • 2013-03-28: a code example to show how to use J-PAKE in bouncycastle.
  • 2013-05-21: Submitted two Internet Drafts to IETF (one on J-PAKE and the other on Schnorr NIZK Proof).
  • 2013-12-30: a code example to show how to implement J-PAKE using Elliptic Curve (or ECDSA-like group setting).
  • 2014-04-17: J-PAKE included into VMware NSS Cryptographic Module
  • 2014-10-27: J-PAKE adopted by the ISO/IEC standard (11770-4) following the ISO/IEC SC27 meeting held in Mexico City, October 20-24, 2014
  • 2014-12-26: My response to Mohsen Toorani’s IEEE ISCC’14 paper “Security Analysis of J-PAKE”.
  • 2015-04-29: J-PAKE included into Python
  • 2015-05-08: Standardization of J-PAKE in ISO/IEC 11770-4 in process. The first working draft (WD1) passed reviews by ISO/IEC SC27 at Kuching Malaysia, May 5-8, 2015.
  • 2015-05-19: Here is an independent formal analysis of J-PAKE by other researchers published at Oakland’2015. Their results are consistent with the original J-PAKE paper.
  • 2015-05-30: J-PAKE included in BarcoSilex BA414E Public Key crypto engine
  • 2015-05-31: Firefox is upgrading Sync 1.1 (using J-PAKE to transfer a full-entropy AES key between sync devices) to the new Sync 1.5 (using user-defined passwords as encryption keys). But Pale Moon has decided to stay with Sync 1.1.
  • 2015-07-28: the Thread Technical white paper is public. It describes a technique that securely enrols a new device to the network based on J-PAKE. The technique is used in Google Nest thermostat products.
  • 2017-10-06: J-PAKE published in ISO/IEC 11770-4 as an international standard and also published in RFC 8236. See here.