How hard are PINs to guess?

Note: this research was also blogged today at the NY Times’ Bits technology blog.

I’ve personally been researching password statistics for a few years now (as well as personal knowledge questions) and our research group has a long history of research on banking security. In an upcoming paper at next week’s Financial Cryptography conference written with Sören Preibusch and Ross Anderson, we’ve brought the two research threads together with the first-ever quantitative analysis of the difficulty of guessing 4-digit banking PINs. Somewhat amazingly given the importance of PINs and their entrenchment in infrastructure around the world, there’s never been an academic study of how people actually choose them. After modeling banking PIN selection using a combination of leaked data from non-banking sources and a massive online survey, we found that people are significantly more careful choosing PINs than online passwords, with a majority using an effectively random sequence of digits. Still, the persistence of a few weak choices and birthdates in particular suggests that guessing attacks may be worthwhile for an opportunistic thief.

Our starting point was the distribution of 4-digit sequences embedded within passwords in the RockYou dataset (1.7M of them) and a set of 200k iPhone unlock codes gathered by developer Daniel Amitay (who graciously shared his data with us). Several interesting patterns emerge in the plots of both datasets: regions of PINs representing dates, years, repeated digits, even PINs ending in 69. Building on these observations, we built a linear regression model which estimates the popularity of each individual PIN using 25 factors like whether it represents a date in DDMM format, whether it’s an ascending sequence, and so on. These broad factors explain 79% and 93% of the variance in PIN popularity in the two datasets (details are in our full paper).
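To make the idea of "factors" concrete, here is a minimal sketch of how a PIN might be turned into binary indicators that a regression could use. The five factors and all function names below are illustrative assumptions, not the 25 factors from the paper.

```python
from datetime import date


def is_ddmm(pin: str) -> bool:
    """True if the PIN reads as a valid day-month pair (DDMM)."""
    day, month = int(pin[:2]), int(pin[2:])
    try:
        date(2000, month, day)  # 2000 is a leap year, so 2902 is accepted
        return True
    except ValueError:
        return False


def is_ascending(pin: str) -> bool:
    """True if each digit is exactly one more than the previous (e.g. 1234)."""
    return all(int(b) - int(a) == 1 for a, b in zip(pin, pin[1:]))


def is_repeated_digit(pin: str) -> bool:
    """True if all four digits are the same (e.g. 0000)."""
    return len(set(pin)) == 1


def is_repeated_pair(pin: str) -> bool:
    """True if the first two digits equal the last two (e.g. 4545)."""
    return pin[:2] == pin[2:]


def ends_in_69(pin: str) -> bool:
    return pin.endswith("69")


def pin_features(pin: str) -> dict[str, int]:
    """Map a 4-digit PIN to 0/1 indicators (hypothetical feature set)."""
    if len(pin) != 4 or not pin.isdigit():
        raise ValueError("expected a 4-digit string")
    return {
        "ddmm": int(is_ddmm(pin)),
        "ascending": int(is_ascending(pin)),
        "repeated_digit": int(is_repeated_digit(pin)),
        "repeated_pair": int(is_repeated_pair(pin)),
        "ends_in_69": int(ends_in_69(pin)),
    }


features = pin_features("1234")
```

A regression would then fit one weight per indicator against the observed frequency of each PIN; that fitting step is omitted here.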

People chose 4-digit numbers based on a few simple factors in these two datasets. If people chose their banking PINs this way, 8–9% of PINs would be guessable in just three tries! Surely people do better when choosing real banking PINs. In the absence of any large-scale leaks of banking data, we surveyed over 1300 users online, using Amazon’s Mechanical Turk, to measure how different real banking PINs might be. The beauty of our linear model is that we didn’t have to ask anybody for their actual PIN, only whether it fell into one of the general categories we identified (our survey is now viewable online).
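The "guessable in three tries" figure is just the combined share of the most popular PINs. A minimal sketch of that calculation, using a made-up toy sample (the function name and the data are assumptions for illustration only):

```python
from collections import Counter


def guess_success_rate(pins: list[str], guesses: int = 3) -> float:
    """Fraction of observed PINs covered by guessing the `guesses` most common ones."""
    if not pins:
        raise ValueError("need at least one observed PIN")
    counts = Counter(pins)
    covered = sum(count for _, count in counts.most_common(guesses))
    return covered / len(pins)


# Toy sample, NOT real data: ten observed PINs dominated by a few weak choices.
toy_sample = ["1234"] * 5 + ["0000"] * 2 + ["1111"] + ["8472", "3905"]
toy_rate = guess_success_rate(toy_sample, guesses=3)
```

An optimal attacker always tries PINs in decreasing order of popularity, which is why the top of the distribution matters far more than its long tail.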

Indeed, people are considerably more careful when choosing banking PINs. About a quarter stick with their bank-assigned random PIN and over a third choose their PIN using an old phone number, student ID, or other sequence of numbers which is, at least to a guessing attack, statistically random. In total, 63.7% use a pseudorandom PIN, much more than the 23–27% we estimated for our base datasets. Another 5% use a numeric pattern (like 4545) and 9% use a pattern on the entry keypad, also lower than the other two datasets. Altogether, this gives an attacker with 6 guesses (3 at an ATM and 3 with a CAP reader) less than a 2% chance of success. Unfortunately, the final group of 23% of users chose a PIN representing a date, and nearly a third of these used their own birthday. This is a game-changer because over 99% of customers reported that their birth date is listed somewhere in the wallet or purse where they keep their cards. If an attacker knows the cardholder’s date of birth and guesses optimally, the chances of successfully guessing jump to around 9%.
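Since the birth date is usually sitting in the same wallet, one simple self-check is whether a PIN can be read off that date. The sketch below assumes four common layouts (DDMM, MMDD, YYYY, MMYY); the choice of layouts and the function names are illustrative assumptions, not the list used in the paper.

```python
from datetime import date


def birthday_pins(born: date) -> set[str]:
    """4-digit strings derivable from a birth date under a few assumed layouts."""
    dd = f"{born.day:02d}"
    mm = f"{born.month:02d}"
    yyyy = f"{born.year:04d}"
    yy = yyyy[2:]
    return {dd + mm, mm + dd, yyyy, mm + yy}


def pin_matches_birthday(pin: str, born: date) -> bool:
    """True if the PIN is one of the date-derived strings above."""
    return pin in birthday_pins(born)


# Hypothetical cardholder born 7 March 1985.
example_birth = date(1985, 3, 7)
weak = pin_matches_birthday("0703", example_birth)
unrelated = pin_matches_birthday("4821", example_birth)
```

Each known birth date yields only a handful of candidates, which is why an attacker holding the wallet needs so few guesses when the PIN is a birthday.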

Blacklisting the top 100 PINs can drive the guessing rate down to around 0.2% in the general case (we provide a suggested blacklist in the paper). This is significant, and all banks that don’t already do so should: in both the US and UK we found banks which allowed us to change to 1234. But blacklisting doesn’t work nearly as well if the birthday is known, dropping the guessing rate only to around 5%. Too many PINs can be interpreted as dates to blacklist them all, and customer-specific blacklisting using knowledge of the customer’s birthday seems impractical. In the absence of these measures, we conclude that human-chosen 4-digit PINs aren’t quite strong enough to rule out opportunistic guessing.
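To get a feel for why date-like PINs can't simply all be blacklisted, one can count how many of the 10,000 possible PINs parse as a calendar day. This is a rough sketch that assumes only two layouts, DDMM and MMDD; the helper names are hypothetical.

```python
from datetime import date


def is_valid_day_month(day: int, month: int) -> bool:
    """True if day/month is a real calendar day (29 February allowed)."""
    try:
        date(2000, month, day)  # leap year, so 29 February is accepted
        return True
    except ValueError:
        return False


def looks_like_day_month(pin: str) -> bool:
    """True if the PIN is a valid date read as either DDMM or MMDD."""
    first, second = int(pin[:2]), int(pin[2:])
    return is_valid_day_month(first, second) or is_valid_day_month(second, first)


all_pins = [f"{n:04d}" for n in range(10_000)]
date_like_count = sum(looks_like_day_month(pin) for pin in all_pins)  # 588
```

Even with just these two layouts, nearly 6% of the PIN space would be off limits, and year-based layouts such as YYYY or MMYY would add many more.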

9 thoughts on “How hard are PINs to guess?”

  1. I always found it silly that random web forums allow for extremely long, complex passwords, but our money is protected by a mere 4-digit number with no expiration or re-use rules. (For that matter, most banks’ websites also have crappy password restrictions.) Fortunately, some banks allow longer PINs…

    I hope to see more advancement of technology for bank accounts though. Why am I not using my phone as an RSA key generator to access my account? How about immediate SMS notifications of every withdrawal and deposit? Some European and Asian countries have these things, but us North Americans are way behind the times.

  2. Interesting read. I would have thought that guessing might give you some hints on the banking pin. But good to have some statistical information now.
    On the other side – reading the first comment – I’m happy to be in Switzerland, where strong authentication is standard for doing e-banking.

  3. One thing pisses me off about banking websites. They should have a lower security level for viewing, compared to making payments.

    Making payments needs very secure checks.

    However, the banks need to open up to services like Yodlee or Mint, and allow secure, read-only, anonymous access to those services for reading transactions.

  4. There is nothing new in this. The issue is customer selected PINs. They should be banned. Computer generated PINs, whether random or IBM3624, do not exhibit this behaviour.

    The issue had been reported by Sailclose Publications (DWB) in 2003.

    In view of this, I suggest that the idea of customer selected PINs is a disaster area, all banks should abandon it and PCI should recommend against it. Banning customer selected PINs is more important than several other PCI PIN related requirements.

  5. Unfortunately I’m now at an age where dislike of PINs is interpreted as old grandad not being quite up to speed with the modern world; but I find it particularly galling to be told that “I can change my PIN so it’s the same as the other ones”. This at the Co-op and the Post Office: there’s no point even trying to tell these people that this is lousy advice.

  6. If I understood correctly, some 8% chose their own date, the rest chose something else. Why is that so bad? Do we know that this exposure actually leads to fraud? Sure, we can come up with 9-digit passwords, but is it worth it?
    If 1234 is no more common than 8520 or 4583, is it a bad password? In a world of “strong” passwords there are always easier passwords. These would be: Zxasqw12, Q1w2e3, !qaz@wsx and so on (for a minimum of 8 digits and complexity). If we banned all the easy passwords people would still pick the easier of the hard ones, so it’s back to square one. We need the ability to choose easy passwords to have a larger key space and we need to help people choose safely.

  7. PIN with the PAN is a recipe for fraud; it is exactly why Chip & PIN was introduced throughout the world, except for a few places, mainly the USA. Suffering from “not invented here” mentality has meant Card Present fraud remains high in the States (although now being forced by Visa) whereas in Europe it has all but disappeared. So we are left with CNP (card not present) transactions. Visa came up with VbV or 3DS that verifies your transactions via a link to your issuer; Mastercard licenses this from Visa. It’s not foolproof but certainly better than nothing and again cuts down fraud, but not 100%, and it is difficult to prove if your details have been harvested from a bogus site; liability goes to customers, not the issuer. You have to understand that banks, Visa, Mastercard, Amex etc. make a fortune on fraud, fines and all the industry around it. If they really wanted to sort things out then they would release the patents on the smart cards that use revolving passwords similar to an RSA key. Without going into detail, this is an 8-digit number that is generated every 1 minute according to the internal algorithm, which is ok if it’s broken because you have to know the first (seed) value when the card and internal database (with your issuer) are synchronised. Virtually foolproof. Why don’t they do it? Cost is about $8 for each card replaced on a yearly basis so not much. But they are not going to be that stupid! Visa, Mastercard and Amex made over $123 million in fines for the Heartland breach, $103 million for the TJX breach. Liability then shifts from the customer to the bank that issues the cards … So you see?

  8. Perhaps I should leave a note of my birthday in my wallet … it has nothing to do with any of my PINs and invites any attacker to waste one of their three guesses!

  9. More press coverage here – where banks once advised customers to use the same PIN for multiple accounts, they now have small print denying refunds to those who followed the earlier advice.
