Passwords in the wild, part I: the gap between theory and implementation

Sören Preibusch and I have finalised our in-depth report on password practices in the wild, The password thicket: technical and market failures in human authentication on the web, which was presented in Boston last month at WEIS 2010. The motivation for our report was a lack of technical research into real password deployments. Passwords have been studied quite intensively as an authentication mechanism for the last 30 years, but we believe ours was the first large study of how Internet sites actually implement them. We studied 150 sites, comprising the most-visited sites overall plus a random sample of mid-level sites. We signed up for free accounts with each site and, using a mixture of scripting and patience, captured all visible aspects of password deployment, from enrolment and login to password reset and resistance to attacks.

Our data (which is now publicly available) gives an interesting picture of the current state of password deployment. Because the dataset is large and the paper is quite lengthy, we'll discuss our findings and their implications from several different perspectives. Today, we'll focus on the preventable mistakes. The academic literature assumes that passwords will be encrypted during transmission and hashed before storage, and that attempts to guess usernames or passwords will be throttled. None of these is widely true in practice.

Storing passwords without hashing (and salting) is the first cardinal sin of password implementation; late last year it enabled the theft of 32 million RockYou users' passwords. Yet the practice remains common: at least 29% of the sites we studied sent cleartext passwords via email, meaning that they must be stored without proper hashing (they may be stored encrypted, but this makes little difference: if the server is rooted, the encryption keys are likely to be available too). It's impossible to know how the remaining sites store passwords without auditing their databases, but another telling indicator is that at least 40% of sites imposed a short maximum password length or restrictions on special characters. Both are warning signs that passwords are being stored directly in a database table, since hashing would reduce an arbitrarily long input password to a fixed-size value and make such restrictions unnecessary. Only 1 site of 150 successfully hashed passwords in the browser using JavaScript, giving us confidence that its passwords aren't stored in a recoverable format (two other sites tried but botched the details).
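For readers less familiar with what "proper hashing" involves, here is a minimal sketch of salted, iterated hashing with PBKDF2 from Python's standard library. The function names and the iteration count are illustrative assumptions, not anything observed at the sites studied, and a dedicated scheme such as bcrypt or scrypt would serve the same purpose.

```python
import hashlib
import hmac
import os

# Illustrative work factor (an assumption); tune it to your own hardware.
ITERATIONS = 100_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the plaintext itself is never stored."""
    salt = os.urandom(16)  # a fresh random salt per user defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest from the candidate password and the stored salt."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

Whatever the password's length or character set, the stored SHA-256-based digest is always 32 bytes, so storage gives no reason to cap password length or ban special characters.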

Using TLS to authenticate the server and encrypt passwords in transmission is another essential aspect of password security. 41% of sites failed to use any TLS, though what's more surprising is the number that implemented TLS inconsistently. A full password implementation can contain up to four separate forms for password entry: enrolment, login, update, and password reset. Of the sites that did implement TLS, 28% forgot to protect at least one of these forms (usually the update form or the enrolment form), including some big names like Facebook, MySpace, Twitter, and WordPress. Not implementing any TLS may be a (questionable) security trade-off, but implementing it inconsistently is certainly an oversight. Only 39% of sites had a complete, working TLS deployment.
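The consistency check described above can be expressed as a small audit function. This is a hypothetical sketch, not the script used in the study: it assumes each form is described by the URL of the page that serves it and the URL it submits to, and it treats a form as protected only if both use https, since an http page can be rewritten in transit to post the password elsewhere.

```python
from urllib.parse import urljoin, urlparse

# The four password-entry forms a full implementation may contain.
PASSWORD_FORMS = ("enrolment", "login", "update", "reset")


def unprotected_forms(forms: dict[str, tuple[str, str]]) -> list[str]:
    """Return the names of password forms not fully served over https.

    `forms` maps a form name to (page_url, action_url); forms a site does
    not offer are simply absent from the mapping.
    """
    unprotected = []
    for name in PASSWORD_FORMS:
        if name not in forms:
            continue  # not every site implements all four forms
        page_url, action_url = forms[name]
        # Resolve relative actions such as "/session" against the page URL.
        target = urljoin(page_url, action_url)
        # Assumption: both the page and the submission target must use https.
        if any(urlparse(url).scheme != "https" for url in (page_url, target)):
            unprotected.append(name)
    return unprotected


# Hypothetical site that protects login but forgets its update form.
site = {
    "enrolment": ("https://example.com/signup", "https://example.com/signup"),
    "login": ("https://example.com/login", "/session"),
    "update": ("http://example.com/settings", "https://example.com/settings"),
    "reset": ("https://example.com/forgot", "https://example.com/forgot"),
}
print(unprotected_forms(site))  # ['update']
```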

For authentication by an online server, two other important security practices are preventing user probing attacks and password guessing attacks. User probing allows attackers to build up a large list of users registered with the site. Attackers can use such a list in a trawling attack, where a few popular passwords are guessed for a large number of accounts in the hope of compromising a small number of them. 19% of sites gave an "email address not registered" error message upon log-in with a bogus email address, making user probing trivial. A much more common mistake, made by 80% of sites, was to give an "email address not registered" error message when a password reset email was requested for a bogus email address. This is an easy detail to overlook, but it makes it trivial to collect a large membership list at most sites that collect passwords.
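Closing both probing channels mostly comes down to giving the same response whether or not an address is registered. A minimal sketch, in which the message strings, the `REGISTERED` set, and the `verify` callback are all hypothetical stand-ins for a real user database:

```python
from collections.abc import Callable

# Hypothetical stand-in for the site's user database.
REGISTERED = {"alice@example.com"}

# One reply per form, used whether or not the address is registered.
LOGIN_FAILED = "Incorrect email address or password."
RESET_REPLY = "If an account exists for that address, a reset link has been sent."


def login(email: str, password: str, verify: Callable[[str, str], bool]) -> str:
    """verify(email, password) is a hypothetical credential check that returns
    False both for unknown addresses and for wrong passwords."""
    if verify(email, password):
        return "Logged in."
    return LOGIN_FAILED  # never reveal which of the two was wrong


def request_reset(email: str, outbox: list[str]) -> str:
    """Queue a reset email only for registered addresses, but reply identically."""
    if email.strip().lower() in REGISTERED:
        outbox.append(email)  # stand-in for actually sending the reset email
    return RESET_REPLY
```

Identical wording is necessary but not sufficient: if the registered branch takes noticeably longer (for example, because it sends the email synchronously), response timing can still leak membership.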

The risk of a trawling attack, however, is predicated on the assumption that repeated password guessing against an individual account will be limited. In fact, the large majority of sites (84%) did not appear to rate-limit password guessing at all, allowing our automated script to guess 100 incorrect passwords at a rate of one per second before successfully logging in to a test account. Some research argues against low guessing cutoffs, but we saw no cutoff values greater than 20, and it seems safe to assume that very few of the sites that didn't block us after 100 attempts would do so later.
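Throttling need not be elaborate. The sketch below counts failed guesses per account over a sliding time window. The class name is hypothetical, and the defaults (20 failures per hour) are assumptions chosen only to match the largest cutoff mentioned above, not a recommendation; the clock is injectable so the behaviour can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from collections.abc import Callable


class GuessLimiter:
    """Refuse login attempts for an account once it has max_failures failed
    guesses within the last `window` seconds (hypothetical defaults)."""

    def __init__(self, max_failures: int = 20, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.window = window
        self.clock = clock
        self._failures: defaultdict[str, deque[float]] = defaultdict(deque)

    def _recent(self, account: str) -> deque[float]:
        """Drop failures older than the window and return those that remain."""
        q = self._failures[account]
        cutoff = self.clock() - self.window
        while q and q[0] <= cutoff:
            q.popleft()
        return q

    def allowed(self, account: str) -> bool:
        """Check this before even looking at the submitted password."""
        return len(self._recent(account)) < self.max_failures

    def record_failure(self, account: str) -> None:
        self._recent(account).append(self.clock())

    def record_success(self, account: str) -> None:
        self._failures.pop(account, None)
```

Under these defaults, the probe described above (100 wrong guesses at one per second) would be refused from the 21st attempt onwards. A hard cutoff like this also lets an attacker deliberately lock a legitimate user out, which is one reason some argue against low cutoffs; growing delays between attempts are a common alternative.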

In the security research community, we generally assume web-based password authentication includes basic security measures: encryption during transmission, hashing for storage, and prevention of brute-force attacks and user probing. There are still many ways that passwords can fail, notably key-loggers and phishing, for which we have no consensus solution. Yet implementing the basics is surprisingly rare: just 3 of 150 studied sites managed to do so successfully (Google, Microsoft Live, and ShopBop). The sites we studied were, for the most part, professionally built sites representing multi-million dollar businesses.

In a few cases, sites may be making a defensible security/usability trade-off. Amazon, for example, didn't block our brute-force attempts, but there's ample reason to believe they detect account takeover by other means. On the whole, though, the level of security implemented is dramatically lower than security researchers might expect. There's an interesting parallel here. At first, the insecurity of passwords was blamed on users not behaving the way security engineers wanted them to: choosing weak passwords, forgetting them, writing them down, sharing them, and typing them into the wrong domains. It's now generally accepted that we should design password security around users, and that users may even be wise to ignore security advice.

Web developers are people too, as was recently argued at SOUPS, and they don’t behave the way security engineers would like either when it comes to passwords. Right or wrong, we should update our thinking to take this into account.