Passwords in the wild, part I: the gap between theory and implementation
July 27th, 2010 at 15:16 UTC by Joseph Bonneau
Sören Preibusch and I have finalised our in-depth report on password practices in the wild, The password thicket: technical and market failures in human authentication on the web, presented in Boston last month for WEIS 2010. The motivation for our report was a lack of technical research into real password deployments. Passwords have been studied as an authentication mechanism quite intensively for the last 30 years, but we believe ours was the first large study into how Internet sites actually implement them. We studied 150 sites, including the most visited overall sites plus a random sample of mid-level sites. We signed up for free accounts with each site, and using a mixture of scripting and patience, captured all visible aspects of password deployment, from enrolment and login to reset and attacks.
Our data (which is now publicly available) gives us an interesting picture of the current state of password deployment. Because the dataset is huge and the paper is quite lengthy, we’ll be discussing our findings and their implications from a series of different perspectives. Today, we’ll focus on the preventable mistakes. In academic literature, it’s assumed that passwords will be encrypted during transmission and hashed before storage, and that attempts to guess usernames or passwords will be throttled. None of these is widely true in practice.
Storing passwords without hashing (and salting) is the first cardinal sin of password implementation; late last year it enabled the theft of 32 million RockYou users’ passwords. Yet the practice remains common: at least 29% of the sites we studied sent cleartext passwords via email, meaning that they must be stored without proper hashing (they may be stored encrypted, but this makes little difference: if the server gets rooted, the encryption keys are likely to be available). It’s impossible to know how the rest store passwords without auditing their databases, but another telling indicator is that at least 40% of sites imposed a short maximum password length or restrictions on special characters. Both are warning signs that passwords are being written directly into a database table, since hashing would reduce arbitrarily long input passwords to a constant storage size and make such limits unnecessary. Only 1 site of 150 successfully hashed passwords in the browser using JavaScript, giving us confidence that its passwords aren’t stored in a recoverable format (two other sites tried but botched the details).
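To make the storage point concrete, here is a minimal sketch of salted hashing in Python. It is an illustration, not what any studied site runs: the function names, the choice of PBKDF2-HMAC-SHA256, the 16-byte salt, and the iteration count are all assumptions. The thing to notice is that the stored digest is 32 bytes however long the password is, so a hashing site has no storage reason to cap password length.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 100_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the cleartext is never stored."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, digest)
```

A site storing only the salt and digest cannot email a user their original password, which is why receiving one by email is such a strong signal.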
Using TLS to authenticate the server and encrypt passwords in transmission is another essential aspect of password security. 41% of sites failed to use any TLS, though what’s more surprising is the number that implemented TLS inconsistently. A full password implementation will contain up to four separate forms for password entry: enrolment, login, update, and password reset. Of sites implementing TLS, 28% forgot to protect at least one of these forms (usually the update form or the enrolment form), including some big names like Facebook, MySpace, Twitter, and WordPress. Not implementing any TLS may be a (questionable) security trade-off, but implementing it inconsistently is certainly an oversight. Only 39% of sites had a complete, working TLS deployment.
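Inconsistent TLS is the kind of oversight a simple check can catch. The sketch below assumes a site keeps a mapping from each of the four form names to its submission URL; the function name and the mapping are hypothetical. It only inspects the URL scheme, so it is a partial check: the page that serves the form needs to be delivered over HTTPS as well.

```python
from urllib.parse import urlparse


def unprotected_forms(form_urls):
    """Return the names of password forms whose submission URL is not HTTPS.

    form_urls is an assumed mapping such as
    {"enrolment": "...", "login": "...", "update": "...", "reset": "..."}.
    """
    return sorted(
        name
        for name, url in form_urls.items()
        if urlparse(url).scheme.lower() != "https"
    )
```

Run against a site that protects login and reset but not enrolment or update, it would return those two names, which is the most common pattern we observed.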
For authentication by an online server, two other important security practices are preventing user probing attacks and password guessing attacks. User probing allows attackers to build up a large list of users registered with the site. Attackers can use such a list in a trawling attack, where a few popular passwords are guessed for a large number of accounts in the hope of compromising a small number of them. 19% of sites give an “email address not registered” error message upon log-in with a bogus email address, making user probing trivial. A much more common mistake, however, made by 80% of sites, is to give an “email address not registered” error message when requesting a password reset email for a bogus email address. This is an easy detail to overlook, but it means it is trivial to collect a large membership list at most password-collecting sites.
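Closing the password-reset leak mostly means giving the same answer whether or not the address is registered. A minimal sketch, in which the user store, the message text, and the `send_email` callback are all hypothetical stand-ins:

```python
# Hypothetical user store; a real site would query its database.
REGISTERED = {"alice@example.com"}

# One reply for every request, so the response reveals nothing about membership.
GENERIC_REPLY = "If that address is registered, a reset link has been sent."


def request_password_reset(email, send_email):
    """Send a reset email only to registered users, but always reply identically."""
    if email.lower() in REGISTERED:
        send_email(email)
    return GENERIC_REPLY
```

This only removes the explicit error message. Differences in response time, or an enrolment form that rejects already-registered addresses, may still allow probing.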
The risk of a trawling attack, however, is predicated on the assumption that repeated password guessing for an individual account will be limited. In fact, the large majority of sites (84%) did not appear to rate-limit password guessing at all, allowing our automated script to guess 100 incorrect passwords at the rate of one per second before successfully logging in to a test account. There is some research arguing against low guessing cutoffs, but we saw no cutoff values greater than 20, and it seems safe to assume that very few of the sites which didn’t block us after 100 attempts would do so at a later point.
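For illustration, per-account throttling can be as small as the sketch below. The class name, the one-hour window, and the in-memory dictionary are assumptions; the default cutoff of 20 is simply the largest value we saw in the study, not a recommendation.

```python
import time


class LoginThrottle:
    """Block further guesses on an account after too many recent failures."""

    def __init__(self, max_failures=20, window_seconds=3600.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self.failures = {}  # account name -> list of failure timestamps

    def _recent(self, account):
        """Drop failures older than the window and return the remaining list."""
        cutoff = self.clock() - self.window_seconds
        recent = [t for t in self.failures.get(account, []) if t > cutoff]
        self.failures[account] = recent
        return recent

    def is_blocked(self, account):
        return len(self._recent(account)) >= self.max_failures

    def record_failure(self, account):
        self._recent(account).append(self.clock())
```

A login handler would call `is_blocked` before checking the password and `record_failure` after each wrong guess. Counting per account does nothing against a trawling attack spread across many accounts, so a real deployment would probably also count failures per source address.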
In the security research community, we generally assume web-based password authentication includes basic security measures: encryption during transmission, hashing for storage, and prevention of brute force attacks or user probing. There are still many ways that passwords can fail, notably key-loggers and phishing, for which we have no consensus solution. Yet implementing the basics is surprisingly rare: just 3 of 150 studied sites managed to do so successfully (Google, Microsoft Live, and ShopBop). The sites we studied were, for the most part, professionally-done sites representing multi-million dollar businesses.
In a few cases, sites may be making a defensible security/usability trade-off. Amazon, for example, didn’t block our brute force attempts, but there’s ample reason to believe they detect account takeover by other means. On the whole though, the level of security implemented is dramatically lower than security researchers might expect. There’s an interesting parallel here. At first the insecurity of passwords was blamed on users not behaving the way security engineers wanted them to: choosing weak passwords, forgetting them, writing them down, sharing them, and typing them into the wrong domains. It’s now generally accepted that we should design password security around users, and that users may even be wise to ignore security advice.
Web developers are people too, as was recently argued at SOUPS, and they don’t behave the way security engineers would like either when it comes to passwords. Right or wrong, we should update our thinking to take this into account.
Entry filed under: Academic papers, Protocols, Security economics, Security engineering
7 comments
1. Fred | July 27th, 2010 at 18:48 UTC
“at least 29% of the sites we studied sent cleartext passwords via email, meaning that they must be stored without proper hashing”
This is a non-sequitur, at least if you are referring to the sort of email that gives you a new password when you’ve forgotten yours. It’s perfectly possible to generate the new password, store its hash, and email the cleartext version. Whether this actually happens is of course another question.
2. Joseph Bonneau | July 27th, 2010 at 18:53 UTC
To clarify Fred’s question, the 29% mentioned all sent cleartext passwords which we had previously registered, not new randomly-generated passwords, clearly indicating that they had been stored unhashed.
3. Marc Ruef | July 27th, 2010 at 20:54 UTC
Hello,
Very nice work, indeed!
I have published two posts recently which might be interesting for you (although it is more about password structure again):
* http://www.scip.ch/?labs.20091120
* http://www.scip.ch/?labs.20100709
Regards,
Marc
4. Elizabeth M Smith | July 27th, 2010 at 21:43 UTC
Actually, I still question the 29% sending cleartext passwords meaning those passwords are not stored hashed – is this on requesting a lost password?
Because generating a hash, storing the hash, and sending off an email with the plaintext password in it (which is never stored beyond the POST in the registration form) is also a common thing for registration, so the new user has a copy.
5. Steve Parker | July 28th, 2010 at 00:37 UTC
@Elizabeth – even if the plaintext password is not stored in the database, but is emailed in plaintext from the site to the customer, it has been stored in plaintext in many open places – not just the web user’s “sent” folder and the user’s “inbox” folder, but it has also been transmitted in plain text across ISPs and national borders.
When I have just created an account on a website, I do not need the website to remind me what password I just provided. I may have forgotten by tomorrow exactly what algorithm I used to create that password (did I tell Amazon “ama#$sgpzon!” or “zon!sgp#$ama”?), but the time of creation is when a reminder is of the least value.
I run a tiny website, which authenticates blog posts and a wishlist, but it matches all of these criteria other than TLS.
6. Aryeh Gregor | August 3rd, 2010 at 16:12 UTC
As a MediaWiki developer, I’d like to comment on a few of your conclusions on Wikipedia.
First: the password requirements for ordinary users are deliberately very lax, because ordinary users have no special privileges. In most web software, even unprivileged users can take destructive or privileged actions, such as deleting their own posts or messages, or viewing secret information like private messages sent to them, their friend’s wall, etc.
In MediaWiki, basically the only interesting thing you’d be able to see or do if you hacked an ordinary account is get the user’s e-mail address. Users cannot send or receive private messages of any type, and pages cannot generally be configured to be editable only by some users and not others. Nothing can be permanently deleted or irreversibly changed in any way. The worst an attacker could do to you is get you temporarily blocked for vandalism, until you contact the system administrators to demonstrate your identity, so that your password can be reset.
Given all this, it wouldn’t be reasonable to impose strong password requirements on users. The usual measures for improving password security impose extra burdens on users, and the modus operandi of Wikipedia is to keep barriers to entry as low as possible, so anyone can contribute.
Note that at least at some point, privileged users (sysops) did have restrictions placed on their passwords, including length, checking against dictionaries, etc. I don’t know if those are still in effect for any or all wikis. However, even privileged users can’t take irreversible actions in MediaWiki, and almost no privileged users can even view much private information, so again, it’s not as big a security problem as it would be on other types of sites.
I looked over the rest of your data for Wikipedia, and noticed that you have “email required” true. Wikipedia does not require an e-mail address to be provided for signup. If you don’t provide one, or it doesn’t get verified, all e-mail-based features will be disabled, but you can otherwise use the site normally.
You also said that Wikipedia provides no TLS support. I don’t know if you meant none required or none available at all, but you can sign in and browse the site via secure.wikimedia.org if you know about it. This sends images insecurely, but cookies and passwords are still encrypted. (Forcing all login to be over TLS would be nice, but it’s not a high priority for the ops team.)
You have the maximum password length listed as “?”. The maximum password length is dictated by the maximum POST size, so on the order of megabytes. I’ve actually tested a multi-megabyte password on MediaWiki, and it worked fine, since of course the passwords are only stored as salted hashes. I guess “?” is standing in here for “really high”.
Overall, though, the study was interesting. In addition to correcting a few minor inaccuracies, I just wanted to point out that Wikipedia deliberately favors usability over security when it comes to account takeovers, because a compromised account usually hurts no one but the legitimate account-holder, and that not even much (since little information is disclosed and all illicit actions are easily reversible). It would be good if we required HTTPS for login and allowed OpenID, though.
7. Sören Preibusch | August 5th, 2010 at 12:21 UTC
@Aryeh: Thank you very much for sharing the security/usability considerations behind Wikipedia’s password practices. (In the data, the question mark indicates that no upper limit on password length was reported by the site itself.) I had been commenting on Wikipedia earlier this week, in the Wikipedia Signpost, concluding that Wikipedia’s threat model is indeed quite special.