Some evidence on multi-word passphrases

Using a multi-word “passphrase” instead of a password has been suggested for decades as a way to thwart guessing attacks. The idea is now making a comeback, for example with the Fastwords proposal which identifies that mobile phones are optimised for entering dictionary words and not random character strings. Google’s recent password advice suggests condensing a sentence to form a password, while Komanduri et al.’s recent lab study suggests simply requiring longer passwords may be the best security policy. Even xkcd espouses multi-word passwords (albeit with randomly-chosen words). I’ve been advocating through my research though that authentication schemes can only be evaluated by studying large user-chosens distribution in the wild and not the theoretical space of choices. There’s no public data on how people choose passphrases, though Kuo et al.’s 2006 study for mnemonic-phrase passwords found many weak choices. In my recent paper (written with Ekaterina Shutova) presented at USEC last Friday (a workshop co-located with Financial Crypto), we study the problem using data crawled from the now-defunct Amazon PayPhrase system, introduced last year for US users only. Our goal wasn’t to evaluate the security of the scheme as deployed by Amazon, but learn more how people choose passphrases in general. While this is a relatively limited data source, our results suggest some caution on this approach.

Amazon’s system requires a multi-word (minimum 2) passphrase which is globally unique. This provided an oracle for our experiment: in the original version of the site, error messages would clearly indicate if a phrase was already chosen (as opposed to being blacklisted or invalid), letting us test large lists of phrases to see what was taken. Our first experiment was a dictionary attack using lists of movie titles, sports team names, and dozens of other types of proper nouns crawled from Wikipedia, along with idiomatic phrases crawled from soruces like Urban Dictionary. We found about 8,000 phrases using a 20,000 phrase dictionary. Using a very rough estimate for the total number of phrases and some probability calculations, this produced an estimate that passphrase distribution provides only about 20 bits of security against an attacker trying to compromise 1% of available accounts. This is far better than passwords, which are usually under 10 bits by this same metric, but not high enough to make online guessing impractical without proper rate-limiting. Curiously, it’s close to estimates made using Kuo et al.’s published numbers on mnemonic phrases. It also shows that significant numbers of people will blatantly ignore security advice about choosing nonsense phrases and choose things like “Manchester United” or “Harry Potter.”

After this experiment, we did a few experiments to test the linguistic properties of phrases by generating potential phrases according to their distribution in large linguistic corpora (we used the British National Corpus and Google n-gram corpus). Some clear trends emerged—people strongly prefer phrases which are either a single modified noun (“operation room”) or a single modified verb (“send immediately”). These phrases are perhaps easier to remember than phrases which include a verb and a noun and are therefore closer to a complete sentence. Within these categories, users don’t stray too far from choosing two-word phrases the way they’re actually produced in natural language. That is, phrases like “young man” which come up often in speech are proportionately more likely to be chosen than rare phrases like “young table.”

This led us to ask, if in the worst case users chose multi-word passphrases with a distribution identical to English speech, how secure would this be? Using the large Google n-gram corpus we can answer this question for phrases of up to 5 words. The results are discouraging: by our metrics, even 5-word phrases would be highly insecure against offline attacks, with fewer than 30 bits of work compromising over half of users. The returns appear to rapidly diminish as more words are required. This has potentially serious implications for applications like PGP private keys, which are often encrypted using a passphrase. Users are clearly more random in “passphrase English” than in actual English, but unless it’s dramatically more random the underlying natural language simply isn’t random enough. Exploring this gap is an interesting avenue for future collaboration between computer security researchers and linguists. For now we can only be comfortable that randomly-generated passphrases (using tools like Diceware) will resist offline brute force.