All posts by Joseph Bonneau

The science of password guessing

I’ve written quite a few posts about passwords, mainly focusing on poor implementations, bugs and leaks from large websites. I’ve also written on the difficulty of guessing PINs, multi-word phrases and personal knowledge questions. How hard are passwords to guess? How does guessing difficulty compare between different groups of users? How does it compare to potential replacement technologies? I’ve been working on the answers to these questions for much of the past two years, culminating in my PhD dissertation on the subject and a new paper at this year’s IEEE Symposium on Security and Privacy (Oakland) which I presented yesterday. My approach is simple: don’t assume any semantic model for the distribution of passwords (Markov models and probabilistic context-free-grammars have been proposed, amongst others), but instead learn the distribution of passwords with lots of data and use this to estimate the efficiency of an hypothetical guesser with perfect knowledge. It’s been a long effort requiring new mathematical techniques and the largest corpus of passwords ever collected for research. My results provide some new insight on the nature of password selection and a good framework for future research on authentication using human-chosen distributions of secrets. Continue reading The science of password guessing

Three Paper Thursday: Financial Crypto 2012

I spent last week attending Financial Cryptography on Bonaire (a small Dutch island in the Caribbean), along with its attached workshops on Ethics in Computer Security Research and Usable Security.  As usual, the conference attracted a broad spectrum of papers mixing applied cryptography and miscellaneous financial security problems (including our own group’s work on PIN guessing statistics and Facebook’s photo-based backup authentication). All of the papers are now online. I’ll point to three papers which thought-provoking for me. I’m not going to claim these are the best or most important papers-the conference featured some very strong work on applying cryptography to practical problems like smart metering and oblivous printing, while perhaps the most newsworthy research was Wustrow et al.’s hacking of the Washington DC Internet voting prototype. I’ll just highlight why these papers were memorable for me. Continue reading Three Paper Thursday: Financial Crypto 2012

Some evidence on multi-word passphrases

Using a multi-word “passphrase” instead of a password has been suggested for decades as a way to thwart guessing attacks. The idea is now making a comeback, for example with the Fastwords proposal which identifies that mobile phones are optimised for entering dictionary words and not random character strings. Google’s recent password advice suggests condensing a sentence to form a password, while Komanduri et al.’s recent lab study suggests simply requiring longer passwords may be the best security policy. Even xkcd espouses multi-word passwords (albeit with randomly-chosen words). I’ve been advocating through my research though that authentication schemes can only be evaluated by studying large user-chosens distribution in the wild and not the theoretical space of choices. There’s no public data on how people choose passphrases, though Kuo et al.’s 2006 study for mnemonic-phrase passwords found many weak choices. In my recent paper (written with Ekaterina Shutova) presented at USEC last Friday (a workshop co-located with Financial Crypto), we study the problem using data crawled from the now-defunct Amazon PayPhrase system, introduced last year for US users only. Our goal wasn’t to evaluate the security of the scheme as deployed by Amazon, but learn more how people choose passphrases in general. While this is a relatively limited data source, our results suggest some caution on this approach. Continue reading Some evidence on multi-word passphrases

How hard are PINs to guess?

Note: this research was also blogged today at the NY Times’ Bits technology blog.

I’ve personally been researching password statistics for a few years now (as well as personal knowledge questions) and our research group has a long history of research on banking security. In an upcoming paper at next weel’s Financial Cryptography conference written with Sören Preibusch and Ross Anderson, we’ve brought the two research threads together with the first-ever quantitative analysis of the difficulty of guessing 4-digit banking PINs. Somewhat amazingly given the importance of PINs and their entrenchment in infrastructure around the world, there’s never been an academic study of how people actually choose them. After modeling banking PIN selection using a combination of leaked data from non-banking sources and a massive online survey, we found that people are significantly more careful choosing PINs then online passwords, with a majority using an effectively random sequence of digits. Still, the persistence of a few weak choices and birthdates in particular suggests that guessing attacks may be worthwhile for an opportunistic thief. Continue reading How hard are PINs to guess?

Blood donation and privacy

The UK’s National Blood Service screens all donors for a variety of health and lifestyle risks prior donation. Many are highly sensitive, particularly sexual history and drug use. So I found it disappointing that, after consulting with a nurse who took detailed notes about specific behaviours and when they occurred, I was expected to consent to this information being stored indefinitely. When I pressed as to why this data is retained, I was told it was necessary so that I can be contacted as soon as I’m eligible again to donate blood, and to prevent me from donating before that.

The first reason seems weak, as contacting donors on an annual or semi-annual basis wouldn’t greatly decrease the level of donation (most risk-factor restrictions last at least 12 months or are indefinite). The second reason is a security fantasy, as it would only detect donors who lie at a second visit after being honest initially. I doubt donor dishonesty is a major problem and all blood is tested anyway. The purpose of lifestyle restrictions is to reduce the base rate of unsafe blood because all tests have false negatives. Storing detailed donor history doesn’t even have much time-saving benefit: history needs to be re-taken before each donation, since lifestyle risks can change.

I certainly don’t think the NBS is trying to stockpile data for nefarious reasons. I expect instead that the increasingly low technical costs of storing data speciously justify its very minor secondary uses if one ignores the risk of a massive compromise (NBS gets about 2 M donors per year). I wonder whether the inherent hazard of data collection was considered in the NBS’ cost/benefit analysis when this privacy policy was adopted . Security engineers and privacy advocates would do well to advocate non-collection of sensitive data before fancier privacy-enhancing technology. The NHS provides a vital service but they can’t do it without their donors, who are always in short supply. It would be a shame to discourage anybody from donating and being honest about their health history by demanding to store their data forever.

Want to create a really strong password? Don’t ask Google

Google recently launched a major advertising campaign around its “Good to Know” guides to online safety and privacy. Google’s password advice has appeared on billboards in the London underground and a full-page ad in The Economist. Their example of a “very strong password” is ‘2bon2btitq’, taken from the famous Hamlet quote “To be or not to be, that is the question”.
Empirically though, this is not a strong password-it’s almost exactly average! Continue reading Want to create a really strong password? Don’t ask Google

Randomly-generated passwords at myBART

Last week, in retaliation against the heavy-handed response to planned protests against the BART metro system in California, the hacktivist group Anonymous hacked into several BART servers. They leaked part of a database of users from myBART, a website which provides frequent BART riders with email updates about activities near BART stations. An interesting aspect of the leak is that 1,346 of the 2,002 accounts seem to have randomly-generated passwords-a rare opportunity to study this approach to password security. Continue reading Randomly-generated passwords at myBART

The Sony hack: passwords vs. financial details

Sometime last week, Sony discovered that up to 77 M accounts on its PlayStation Network were compromised. Sony’s network was down for a week before they finally disclosed details yesterday. Unusually, there haven’t yet been any credible claims of responsibility for the hack, so we can only go on Sony’s official statements. The breach included names and addresses, passwords, and answers to personal knowledge questions, and possibly payment details. The risks of leaking payment card numbers are well-known, including fraudulent payment transactions and identity theft. Sony has responded by offering to provide free credit checks for affected customers and notifying major credit ratings bureaus with a list of affected customers. This hasn’t been enough for many critics, including a US Senator.

Still, this is far more than Sony has done regarding the leaked passwords. The risks here are very real—hackers can attempt to re-use the compromised passwords (possibly after inverting hashes using brute-force) at many other websites, including financial ones. There are no disclosure laws here though, and Sony has done nothing, not even disclosing the key technical details of how passwords were stored. The implications are very different if the passwords were stored in cleartext, hashed in a constant manner, or properly hashed and salted. Sony customers ought to know what really happened. Instead, towards the bottom of Sony’s FAQ they trail off mid sentence when discussing the leaked passwords:

Additionally, if you use the same user name or password for your PlayStation Network or Qriocity service account for other [no further text]

As we explored last summer, this is a serious market failure. Sony’s security breach has potentially compromised passwords at hundreds of other sites where its users re-use the same password and email address as credentials. This is a significant externality, but Sony bears no legal responsibility, and it shows. The options are never great once a breach has occurred, but Sony should at a minimum have promptly provided full details about their password storage, gave clear instructions to users to change their password at other sites, and notified at least the email providers of each account holder to instruct a forced password reset. The legal framework surrounding password breaches must catch up to that for financial breaches.

Measuring password re-use empirically

In the aftermath of Anonymous’ revenge hacking of HBGary over the weekend, some enterprising hackers used one of the stolen credentials and some social engineering to gain root access at, which has been down for a few days since. There isn’t much novel about the hack but the dump of’s SQL databases provides another password dataset for research, though an order of magnitude smaller than the Gawker dataset with just 81,000 hashed passwords.

More interestingly, due to the close proximity of the hacks, we can compare the passwords associated with email addresses registered at both Gawker and This gives an interesting data point on the widely known problem of password re-use. This new data seems to indicate a significantly higher re-use rate than the few previously published estimates. Continue reading Measuring password re-use empirically

Another Gawker bug: handling non-ASCII characters in passwords

A few weeks ago I detailed how Gawker lost a million of their users’ passwords. Soon after this I found an interesting vulnerability in Gawker’s password deployment involving the handling of non-ASCII characters. Specifically, they didn’t handle them at all until two weeks ago, instead they were mapping all non-ASCII characters to the ASCII ‘?’ prior to hashing them. This not only greatly limited the theoretical space of passwords, but meant that passwords consisting of any n non-ASCII characters were equivalent to ‘?’^n. Native Telugu or Korean speakers with passwords like ‘రహస్య సంకేత పదం’ or ‘비밀번호’ were vulnerable to an attacker simply guessing a string of question marks. An attacker may in fact know in advance that some users are from non-Latin countries (for example by looking at their email addresses) potentially making this more easily exploitable.

Continue reading Another Gawker bug: handling non-ASCII characters in passwords