February 20th, 2012 at 15:54 UTC by Joseph Bonneau
Note: this research was also blogged today at the NY Times’ Bits technology blog.
I’ve personally been researching password statistics for a few years now (as well as personal knowledge questions) and our research group has a long history of research on banking security. In an upcoming paper at next week’s Financial Cryptography conference written with Sören Preibusch and Ross Anderson, we’ve brought the two research threads together with the first-ever quantitative analysis of the difficulty of guessing 4-digit banking PINs. Somewhat amazingly given the importance of PINs and their entrenchment in infrastructure around the world, there’s never been an academic study of how people actually choose them. After modeling banking PIN selection using a combination of leaked data from non-banking sources and a massive online survey, we found that people are significantly more careful choosing PINs than online passwords, with a majority using an effectively random sequence of digits. Still, the persistence of a few weak choices, and birthdates in particular, suggests that guessing attacks may be worthwhile for an opportunistic thief.
Our starting point was the distribution of 4-digit sequences embedded within passwords in the RockYou dataset (1.7M of them) and a set of 200k iPhone unlock codes gathered by developer Daniel Amitay (who graciously shared his data with us). Several interesting patterns emerge in the plots of both datasets: regions of PINs representing dates, years, repeated digits, even PINs ending in 69. Building on these observations, we built a linear regression model which estimates the popularity of each individual PIN using 25 factors like whether it represents a date in DDMM format, whether it’s an ascending sequence, and so on. These broad factors explain 79% and 93% of the variance in PIN popularity in the two datasets (details are in our full paper).
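To make the idea of a "factor" concrete, here is a minimal Python sketch of a few binary indicators of the kind such a regression could use. Only the DDMM and ascending-sequence factors are named in this post; the other indicators, and all function names, are illustrative assumptions rather than the 25 factors actually used in the paper.

```python
from datetime import date


def is_ddmm(pin: str) -> bool:
    """True if the PIN reads as a valid day-month date (DDMM)."""
    day, month = int(pin[:2]), int(pin[2:])
    try:
        date(2000, month, day)  # 2000 is a leap year, so 2902 counts as a date
        return True
    except ValueError:
        return False


def is_ascending(pin: str) -> bool:
    """True for consecutive ascending digits such as 1234 or 4567."""
    return all(int(b) - int(a) == 1 for a, b in zip(pin, pin[1:]))


def is_repeated_digit(pin: str) -> bool:
    """True when all four digits are the same, e.g. 0000 (assumed factor)."""
    return len(set(pin)) == 1


def is_repeated_pair(pin: str) -> bool:
    """True for patterns like 4545 (assumed factor)."""
    return pin[:2] == pin[2:]


def is_year_19xx(pin: str) -> bool:
    """True when the PIN looks like a 20th-century year (assumed factor)."""
    return pin.startswith("19")


def features(pin: str) -> dict:
    """Binary feature vector for one PIN; a regression would fit one weight per key."""
    return {
        "ddmm": is_ddmm(pin),
        "ascending": is_ascending(pin),
        "repeated_digit": is_repeated_digit(pin),
        "repeated_pair": is_repeated_pair(pin),
        "year_19xx": is_year_19xx(pin),
    }
```

Each PIN's observed popularity would then be regressed on its feature vector, so that a PIN's estimated popularity is a weighted sum of the categories it falls into.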
People chose 4-digit numbers based on a few simple factors in these two datasets. If people chose their banking PINs this way, 8–9% of PINs would be guessable in just three tries! Surely people do better when choosing real banking PINs. In the absence of any large-scale leaks of banking data, we surveyed over 1300 users online, using Amazon’s Mechanical Turk, to measure how different real PINs might be. The beauty of our linear model is that we didn’t have to ask anybody for their actual PIN, only whether it fell into one of the general categories we identified (our survey is now viewable online).
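The "guessable in three tries" figure is the combined share of the most popular PINs: an attacker who tries the k most common PINs succeeds with probability equal to the sum of their frequencies. A minimal sketch of that calculation, where the function name and the toy counts are made up for illustration and are not taken from either dataset:

```python
def guess_success_rate(pin_counts: dict, guesses: int) -> float:
    """Fraction of users whose PIN is among the `guesses` most popular PINs."""
    total = sum(pin_counts.values())
    # An optimal attacker tries PINs in decreasing order of popularity.
    top = sorted(pin_counts.values(), reverse=True)[:guesses]
    return sum(top) / total


# Toy counts, invented purely to exercise the function.
toy_counts = {"1234": 50, "1111": 20, "0000": 10, "2580": 5, "7392": 15}
rate = guess_success_rate(toy_counts, guesses=3)
```

Running the same function over a real distribution of 4-digit sequences is what would yield a number comparable to the 8–9% quoted above.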
Indeed, people are considerably more careful when choosing banking PINs. About a quarter stick with their bank-assigned random PIN and over a third choose their PIN using an old phone number, student ID, or other sequence of numbers which is, at least to a guessing attack, statistically random. In total, 63.7% use a pseudorandom PIN, much more than the 23–27% we estimated for our base datasets. Another 5% use a numeric pattern (like 4545) and 9% use a pattern on the entry keypad, both lower than in the other two datasets. Altogether, this gives an attacker with 6 guesses (3 at an ATM and 3 with a CAP reader) less than a 2% chance of success. Unfortunately, the final group of 23% of users chose a PIN representing a date, and nearly a third of these used their own birthday. This is a game-changer because over 99% of customers reported that their birth date is listed somewhere in the wallet or purse where they keep their cards. If an attacker knows the cardholder’s date of birth and guesses optimally, the chances of successfully guessing jump to around 9%.
Blacklisting the top 100 PINs can drive the guessing rate down to around 0.2% in the general case (we provide a suggested blacklist in the paper). This is significant, and all banks which don’t already do so should: in both the US and UK we found banks which allowed us to change to 1234. But blacklisting doesn’t work nearly as well if the birthday is known, dropping the guessing rate only to around 5%. Too many PINs can be interpreted as dates to blacklist them all, and customer-specific blacklisting using knowledge of the customer’s birthday seems impractical. In the absence of these measures, we conclude that human-chosen 4-digit PINs aren’t quite strong enough to rule out opportunistic guessing.
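The effect of a blacklist can be sketched in a few lines. This version assumes that users whose first choice is blacklisted re-choose in proportion to the remaining distribution, which is one simple modeling assumption and not necessarily the one used in the paper; the function names and toy counts are likewise invented for illustration.

```python
def guess_success_rate(pin_counts: dict, guesses: int) -> float:
    """Fraction of users whose PIN is among the `guesses` most popular PINs."""
    total = sum(pin_counts.values())
    top = sorted(pin_counts.values(), reverse=True)[:guesses]
    return sum(top) / total


def apply_blacklist(pin_counts: dict, size: int) -> dict:
    """Drop the `size` most popular PINs.

    Assumption: affected users re-choose proportionally among the remaining
    PINs, so the surviving relative frequencies are left unchanged.
    """
    banned = set(sorted(pin_counts, key=pin_counts.get, reverse=True)[:size])
    return {pin: n for pin, n in pin_counts.items() if pin not in banned}


# Toy distribution: three very popular PINs plus a flat tail of 100 others.
toy_counts = {"1234": 100, "1111": 50, "0000": 30}
toy_counts.update({f"{n:04d}": 10 for n in range(5000, 5100)})

before = guess_success_rate(toy_counts, guesses=3)
after = guess_success_rate(apply_blacklist(toy_counts, size=3), guesses=3)
```

In this toy case the attacker's three-guess success rate falls sharply once the popular head of the distribution is removed, which is the mechanism behind the drop described above. A known birthday sidesteps it because the attacker's best guesses are then specific to the customer rather than drawn from the global top of the list.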