Passwords in the wild, part I: the gap between theory and implementation

Sören Preibusch and I have finalised our in-depth report on password practices in the wild, The password thicket: technical and market failures in human authentication on the web, presented last month at WEIS 2010 in Boston. The motivation for our report was the lack of technical research into real password deployments. Passwords have been studied intensively as an authentication mechanism for the last 30 years, but we believe ours is the first large study of how Internet sites actually implement them. We studied 150 sites, including the most-visited sites overall plus a random sample of mid-level sites. We signed up for free accounts with each site and, using a mixture of scripting and patience, captured all visible aspects of password deployment, from enrolment and login to reset and attacks.

Our data (which is now publicly available) gives an interesting picture of the current state of password deployment. Because the dataset is huge and the paper is quite lengthy, we’ll be discussing our findings and their implications in a series of posts, each from a different perspective. Today, we’ll focus on the preventable mistakes. The academic literature assumes that passwords will be encrypted during transmission, hashed before storage, and that attempts to guess usernames or passwords will be throttled. None of these is widely true in practice.

Storing passwords without hashing (and salting) is the first cardinal sin of password implementation; late last year it enabled the theft of 32 million RockYou users’ passwords. Yet the practice remains common: at least 29% of the sites we studied sent cleartext passwords via email, meaning that they must be stored without proper hashing. (They may be stored encrypted, but this makes little difference: if the server gets rooted, the encryption keys are likely to be available.) It’s impossible to know how the rest store passwords without auditing their databases, but another telling indicator is that at least 40% of sites imposed a short maximum password length or restrictions on special characters. Both are warning signs that passwords are being stored directly in a database table; hashing would reduce arbitrarily long input passwords to a constant storage size, making such restrictions unnecessary. Only 1 site of 150 successfully hashed passwords in the browser using JavaScript, giving us confidence that passwords aren’t stored in a recoverable format (two other sites tried but botched the details).
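
To make the storage requirement concrete, here is a minimal sketch of salted hashing using only Python’s standard library (the function names and iteration count are illustrative; a tuned password hash such as bcrypt or scrypt would be preferable in practice):

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor, not a recommendation

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    salt = secrets.token_bytes(16)  # unique per-user salt defeats rainbow tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

Note that the digest is a fixed 32 bytes regardless of the password’s length, which is why length caps and character restrictions are a tell-tale sign that no hashing is taking place.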

Using TLS to authenticate the server and encrypt passwords in transmission is another essential aspect of password security. 41% of sites failed to use any TLS; what’s more surprising is the number that implemented TLS inconsistently. A full password implementation contains up to four separate forms for password entry: enrolment, login, update, and reset. Of the sites implementing TLS, 28% forgot to protect at least one of these forms (usually the update or enrolment form), including some big names like Facebook, MySpace, Twitter, and WordPress. Not implementing any TLS may be a (questionable) security trade-off, but implementing it inconsistently is certainly an oversight. Only 39% of sites had a complete, working TLS deployment.
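
One way to avoid the inconsistency problem is to enforce TLS globally rather than form-by-form. A minimal sketch using Flask (the framework choice and handler name are my own, not something observed in the study):

```python
from flask import Flask, redirect, request

app = Flask(__name__)

@app.before_request
def force_tls():
    # Redirect every plain-HTTP request to its HTTPS equivalent before any
    # handler runs, so enrolment, login, update, and reset forms are all
    # covered uniformly instead of each being protected ad hoc.
    if not request.is_secure:
        return redirect(request.url.replace("http://", "https://", 1), code=301)
```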

For authentication by an online server, two other important security practices are preventing user probing attacks and password guessing attacks. User probing allows attackers to build up a large list of users registered with a site. Attackers can use such a list in a trawling attack, in which a few popular passwords are guessed against a large number of accounts in the hope of compromising a small fraction of them. 19% of sites give an “email address not registered” error message upon login with a bogus email address, making user probing trivial. A much more common mistake, however, made by 80% of sites, is to give an “email address not registered” error message when a password reset email is requested for a bogus address. This is an easy detail to overlook, but it means it is trivial to collect a large membership list at most password-collecting sites.
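
The fix is cheap: return the same response whether or not the address is registered. A minimal sketch, with a stand-in user store and mailer:

```python
REGISTERED = {"alice@example.com"}  # stand-in for the real user table

def send_reset_email(address: str) -> None:
    pass  # stand-in for the real mailer

def request_password_reset(address: str) -> str:
    if address in REGISTERED:
        send_reset_email(address)
    # Identical response either way, so the endpoint reveals nothing
    # about which addresses hold accounts.
    return "If that address is registered, a reset link has been sent."
```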

The risk of a trawling attack, however, is predicated on the assumption that repeated password guessing against an individual account will be limited. In fact, the large majority of sites (84%) did not appear to rate-limit password guessing at all, allowing our automated script to guess 100 incorrect passwords at a rate of one per second before successfully logging in to a test account. There is some research arguing against low guessing cutoffs, but we saw no cutoff values greater than 20, and it seems safe to assume very few of the sites which didn’t block us after 100 attempts would do so at a later point.
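
A per-account counter with exponential backoff is enough to stop our one-guess-per-second script. A minimal in-memory sketch (thresholds are illustrative; a real deployment would persist this state and probably track IP addresses too):

```python
import time

_failures: dict[str, tuple[int, float]] = {}  # account -> (count, last attempt time)

def may_attempt(account: str) -> bool:
    count, last = _failures.get(account, (0, 0.0))
    if count < 3:
        return True  # a few free guesses to allow for honest typos
    delay = min(2 ** (count - 3), 3600)  # then exponential backoff, capped at an hour
    return time.time() - last >= delay

def record_failure(account: str) -> None:
    count, _ = _failures.get(account, (0, 0.0))
    _failures[account] = (count + 1, time.time())

def record_success(account: str) -> None:
    _failures.pop(account, None)  # clear the counter on a correct password
```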

In the security research community, we generally assume web-based password authentication includes basic security measures: encryption during transmission, hashing for storage, and prevention of brute-force attacks and user probing. There are still many ways that passwords can fail, notably key-loggers and phishing, for which we have no consensus solution. Yet implementing the basics is surprisingly rare: just 3 of the 150 studied sites managed to do so successfully (Google, Microsoft Live, and ShopBop). The sites we studied were, for the most part, professionally developed sites representing multi-million dollar businesses.

In a few cases, sites may be making a defensible security/usability trade-off. Amazon, for example, didn’t block our brute-force attempts, but there’s ample reason to believe they detect account takeover by other means. On the whole, though, the level of security implemented is dramatically lower than security researchers might expect. There’s an interesting parallel here. At first the insecurity of passwords was blamed on users not behaving the way security engineers wanted them to: choosing weak passwords, forgetting them, writing them down, sharing them, and typing them into the wrong domains. It’s now generally accepted that we should design password security around users, and that users may even be wise to ignore security advice.

Web developers are people too, as was recently argued at SOUPS, and they don’t behave the way security engineers would like either when it comes to passwords. Right or wrong, we should update our thinking to take this into account.

10 thoughts on “Passwords in the wild, part I: the gap between theory and implementation”

  1. “at least 29% of the sites we studied sent cleartext passwords via email, meaning that they must be stored without proper hashing”

    This is a non-sequitur, at least if you are referring to the sort of email that gives you a new password when you’ve forgotten yours. It’s perfectly possible to generate the new password, store its hash, and email the cleartext version. Whether this actually happens is of course another question.
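
    (For concreteness, the flow Fred describes might look like the following sketch, with stand-in names; only the hash ever reaches the database.)

    ```python
    import hashlib
    import secrets

    def reset_password(user_record: dict) -> None:
        new_password = secrets.token_urlsafe(12)  # fresh random password
        salt = secrets.token_bytes(16)
        user_record["salt"] = salt
        user_record["digest"] = hashlib.pbkdf2_hmac(
            "sha256", new_password.encode(), salt, 200_000)
        email_plaintext(user_record["email"], new_password)

    def email_plaintext(address: str, password: str) -> None:
        pass  # stand-in mailer; the plaintext is never stored
    ```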

  2. To clarify Fred’s question, the 29% mentioned all sent cleartext passwords which we had previously registered, not new randomly-generated passwords, clearly indicating that they had been stored unhashed.

  3. Actually, I still question the 29% sending cleartext passwords meaning those passwords are not stored hashed – is this on requesting a lost password?

    Generating a hash, storing the hash, and sending off an email with the plaintext password in it (which is never stored beyond the POST from the registration form) is also a common pattern at registration, so the new user has a copy.

  4. @Elizabeth – even if the plaintext password is not stored in the database, but is emailed in plaintext from the site to the customer, it has been stored in plaintext in many open places: not just the web user’s “sent” folder and the user’s “inbox” folder, but it has also been transmitted in plaintext across ISPs and national borders.

    When I have just created an account on a website, I do not need the website to remind me what password I just provided. I may have forgotten by tomorrow exactly what algorithm I used to create that password (did I tell Amazon “ama#$sgpzon!” or “zon!sgp#$ama”?), but the time of creation is when a reminder is of the least value.

    I run a tiny website, which authenticates blog posts and a wishlist, but it meets all of these criteria other than TLS.

  5. As a MediaWiki developer, I’d like to comment on a few of your conclusions on Wikipedia.

    First: the password requirements for ordinary users are deliberately very lax, because ordinary users have no special privileges. In most web software, even unprivileged users can take destructive or privileged actions, such as deleting their own posts or messages, or can view secret information like private messages sent to them, their friends’ walls, etc.

    In MediaWiki, basically the only interesting thing you’d be able to see or do if you hacked an ordinary account is get the user’s e-mail address. Users cannot send or receive private messages of any type, and pages cannot generally be configured to be editable only by some users and not others. Nothing can be permanently deleted or irreversibly changed in any way. The worst an attacker could do to you is get you temporarily blocked for vandalism, until you contact the system administrators to demonstrate your identity, so that your password can be reset.

    Given all this, it wouldn’t be reasonable to impose strong password requirements on users. The usual measures for improving password security impose extra burdens on users, and the modus operandi of Wikipedia is to keep barriers to entry as low as possible, so anyone can contribute.

    Note that at least at some point, privileged users (sysops) did have restrictions placed on their passwords, including length, checking against dictionaries, etc. I don’t know if those are still in effect for any or all wikis. However, even privileged users can’t take irreversible actions in MediaWiki, and almost no privileged users can even view much private information, so again, it’s not as big a security problem as it would be on other types of sites.

    I looked over the rest of your data for Wikipedia, and noticed that you have “email required” true. Wikipedia does not require an e-mail address to be provided for signup. If you don’t provide one, or it doesn’t get verified, all e-mail-based features will be disabled, but you can otherwise use the site normally.

    You also said that Wikipedia provides no TLS support. I don’t know if you meant none required or none available at all, but you can sign in and browse the site via secure.wikimedia.org if you know about it. This sends images insecurely, but cookies and passwords are still encrypted. (Forcing all login to be over TLS would be nice, but it’s not a high priority for the ops team.)

    You have the maximum password length listed as “?”. The maximum password length is dictated by the maximum POST size, so on the order of megabytes. I’ve actually tested a multi-megabyte password on MediaWiki, and it worked fine, since of course the passwords are only stored as salted hashes. I guess “?” is standing in here for “really high”.

    Overall, though, the study was interesting. In addition to correcting a few minor inaccuracies, I just wanted to point out that Wikipedia deliberately favors usability over security when it comes to account takeovers, because a compromised account usually hurts no one but the legitimate account-holder, and that not even much (since little information is disclosed and all illicit actions are easily reversible). It would be good if we required HTTPS for login and allowed OpenID, though.

  6. @Aryeh: Thank you very much for sharing the security/usability considerations behind Wikipedia’s password practices. (In the data, the question mark indicates that no upper limit on password length was reported by the site itself.) I had been commenting on Wikipedia earlier this week, in the Wikipedia Signpost, concluding that Wikipedia’s threat model is indeed quite special.

  7. @Elizabeth – I thought the same thing; the whole sentence “…sent cleartext passwords via email, meaning that they must be stored without proper hashing (they may be stored encrypted, but this makes little difference: if the server gets rooted, the encryption keys are likely to be available)” doesn’t make sense to me.

    Did you find the answer? Is the explanation in the in-depth report?

  8. One of the most annoying things: “Congratulations for registering with XYZ.com – your new username is ‘FredBlogs999’ and your new password is ‘W1BBL3’.” While it does not necessarily indicate that the password is stored in plain text, it does render the nice starred-out password entry fields on the web page pointless.

    Another problem is sites that have a backup mechanism. If you forget your really strong password you can have it reset by entering the name of your pet dog. The reset question becomes a far weaker secret than the password it protects. You are also encouraged to enter clues to it. I should really have a second strong password for these. “My dog is called ‘sd876Dfs!$d’, honest!” I think more sites will now require you to have access to your registered email account to use this feature, which gives it some protection.

    Something that did ring alarm bells in the article: “Only 1 site of 150 successfully hashed passwords in the browser using JavaScript, giving us confidence that passwords aren’t stored in a recoverable format (two other sites tried but botched the details).”

    Did this site check the hash sent by the JavaScript against the value in the database? Do they use a nonce or something to ensure that it is not vulnerable to a replay attack? Otherwise the hashed value generated by the JavaScript becomes the new password, and if there is no nonce and no further processing then there is no effective salting in the database. The hashing just becomes a way of turning “PA55W0RD” into something that looks a bit stronger.
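
    (A sketch of the kind of nonce-based check being asked about, with illustrative names: the server issues a fresh single-use nonce, the browser replies with a hash bound to it, and the server recomputes the expected value from what it has stored.)

    ```python
    import hashlib
    import secrets

    def issue_nonce(session: dict) -> str:
        session["nonce"] = secrets.token_hex(16)  # fresh single-use challenge
        return session["nonce"]

    def verify_login(session: dict, client_response: str, stored_hash: bytes) -> bool:
        nonce = session.pop("nonce", None)  # consume the nonce, defeating replay
        if nonce is None:
            return False
        # The browser is expected to send sha256(nonce || stored password hash);
        # without the nonce, a captured response could simply be replayed.
        expected = hashlib.sha256(nonce.encode() + stored_hash).hexdigest()
        return secrets.compare_digest(expected, client_response)
    ```

    Note this still leaves the stored hash password-equivalent for login purposes, which is part of the “no further processing” concern above.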

  9. Hello Joseph,
    sorry to “necropost”, but I just found this article after reading your Gawker articles, and would like to make two comments.

    1. That Brostoff / Sasse article about penalty lockouts … is it for real!?! I was almost able to fisk it: scarcely a paragraph without something wrong.

    2. Phrases like “multi-million dollar businesses” are one of my bugbears. It is quite a long time since a multi-million dollar business meant a large business. If you mean it cost several million to buy or set up, then you are talking about something like a corner store with 4 or 5 employees. If you mean several million per annum in revenue, then that’s a slightly bigger business, but still probably no more than a couple of dozen employees.

    I have worked at a start-up like that. A “multi-million dollar business” (revenue, that is, not profit!) whose techie side of the house included 4 coders, ~4 DBAs, and half a dozen sysadmin / helpdesk / gophers. I was one of the latter. None of the coders was what I would call a “software engineer”, but they certainly weren’t terrible.

    However, they knew sweet FA about security engineering. As an example, I was the only person there who understood the difference between ECB and CBC cipher modes. I got a reputation as a gun coder when I sped up a certain module by more than an order of magnitude, simply by realising that they were re-doing the Blowfish key schedule for every block.
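
    (A sketch of the mistake described, assuming the pycryptodome third-party library: Blowfish’s expensive key schedule runs inside Blowfish.new(), so constructing the cipher per block pays that cost thousands of times over. ECB mode is used only to keep the example short.)

    ```python
    from Crypto.Cipher import Blowfish  # assumed dependency: pycryptodome

    KEY = b"sixteen byte key"

    def encrypt_slow(blocks: list[bytes]) -> list[bytes]:
        # Anti-pattern: a fresh cipher, and thus a fresh key schedule,
        # for every 8-byte block.
        return [Blowfish.new(KEY, Blowfish.MODE_ECB).encrypt(b) for b in blocks]

    def encrypt_fast(blocks: list[bytes]) -> list[bytes]:
        # Fix: run the key schedule once and reuse the cipher object.
        cipher = Blowfish.new(KEY, Blowfish.MODE_ECB)
        return [cipher.encrypt(b) for b in blocks]
    ```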

    Fact is, few sites short of a multi-BILLION dollar business have *any* security engineers. Unless they get a complete login / authentication module from a competent project, they will try to roll their own and unwittingly repeat all the mistakes of the past. Many systems like Rails and PHP make it very easy to roll your own, whereas complete, bulletproof (??) systems are usually only available in expensive “Enterprise Platforms”, and even then they have issues.

    Consider JGuard, for example. It is a complex authentication and authorisation management package. But what does it say about hashing and salting? Buried deep in the docs there is one paragraph about hashing: how to query your system about what is already available, and how to add to it. No hint about why it is so important. Salting also gets one paragraph, which recommends you read about it in Wikipedia. You have to enable salting in a config file, and you get just ONE fixed system-wide salt.

    I’m not picking on JGuard here; it’s a nice project, and the rest are no better. If you are not expert enough to roll your own, this is as good as it gets. (Spring / Acegi, for example, is slightly worse in this respect. But even more complicated and Enterprisey!)
