On the (alleged) LinkedIn password leak

UPDATE 2012-06-07: LinkedIn has confirmed the leak is real, that they “recently” switched to salted passwords (so the data is presumably an out-of-date backup) and that they’re resetting passwords of users involved in the leak. There is still no credible information about whether the hackers involved have the account names or the rest of the site’s passwords. If so, this incident could still have serious security consequences for LinkedIn users. If not, it’s still a major black eye for LinkedIn, though they deserve credit for acting quickly to minimise the damage.

LinkedIn appears to be the latest website to suffer a large-scale password leak. Perhaps due to LinkedIn’s relatively high profile, it’s made major news very quickly even though LinkedIn has neither confirmed nor denied the reports. Unfortunately the news coverage has badly muddled the facts. All I’ve seen is a list of 6,458,020 unsalted SHA-1 hashes floating around. There are no account names associated with the hashes. Most importantly, the leaked file has no repeated hashes. All of the coverage appears to miss this fact. Most likely, the leaker intentionally ran it through ‘uniq’ in addition to removing account info to limit the damage. Also interestingly, 3,521,180 (about 55%) of the hashes have the first 20 bits (five hex digits) overwritten with 0. Among these, 670,785 are otherwise equal to another hash, meaning that they are actually repeats of the same password stored in a slightly different format (LinkedIn probably just switched formats at some point in the past). So there are really 5,787,235 unique hashes leaked.
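
As a rough illustration of the de-duplication described above, here is a minimal Python sketch. It assumes the file holds 40-character hex SHA-1 strings, that any entry beginning with five zero hex digits is a masked entry, and that a masked entry repeats an unmasked one when the remaining 35 digits match. The function names and the toy data are hypothetical; nothing here comes from the leaked file itself.

```python
import hashlib

MASK = "00000"  # 20 zeroed bits = 5 hex digits (assumed file format)


def sha1_hex(password):
    """Unsalted SHA-1 of a password as a lowercase hex string."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def count_unique(hashes):
    """Return (unique, masked, repeats) for an iterable of hex SHA-1 strings.

    'masked' counts entries whose first 20 bits are zero; 'repeats' counts
    masked entries whose remaining digits match some unmasked entry.
    """
    hashes = {h.strip().lower() for h in hashes}
    masked = {h for h in hashes if h.startswith(MASK)}
    full = hashes - masked
    full_tails = {h[len(MASK):] for h in full}
    repeats = {h for h in masked if h[len(MASK):] in full_tails}
    return len(hashes) - len(repeats), len(masked), len(repeats)


# Toy data: "password" appears in both formats, "letmein" only masked.
toy = [
    sha1_hex("password"),
    MASK + sha1_hex("password")[len(MASK):],
    MASK + sha1_hex("letmein")[len(MASK):],
    sha1_hex("123456"),
]
unique, masked, repeats = count_unique(toy)
```

One caveat of this sketch: a genuine hash that happens to start with 20 zero bits would be misclassified as masked, which should happen for roughly one hash in a million.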

This gives us no idea how many total accounts were affected by the leak. It’s probably far fewer than the total number at the site (about 161 M) unless LinkedIn users choose implausibly terrible passwords. The RockYou leak, which included passwords for 32 M users, had over 14 M unique passwords. A random sample of about 12.5 M RockYou passwords has an expected 5.8 M unique passwords, so we might project that the LinkedIn leak represents closer to 12.5 M users if the password distributions are similar. Any news reporting indicating “6.5 M accounts are affected” has not done a basic investigation of the source data here.
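
One way to make a projection like this can be sketched in Python. Given a table of password frequencies from a reference leak, it computes the expected number of distinct passwords in a random sample of n accounts drawn without replacement, then searches for the smallest n that reaches an observed unique count. The names `freqs`, `expected_unique` and `sample_size_for` are illustrative, the toy frequencies are made up, and this is not necessarily how the 12.5 M figure above was obtained.

```python
from math import exp, lgamma


def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def expected_unique(freqs, n):
    """Expected number of distinct passwords among n accounts sampled
    without replacement; freqs maps password -> number of accounts."""
    total = sum(freqs.values())
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of accounts")
    log_denominator = log_comb(total, n)
    expected = 0.0
    for count in freqs.values():
        remaining = total - count  # accounts that do NOT use this password
        if remaining < n:
            p_absent = 0.0  # the sample is too large to miss this password
        else:
            p_absent = exp(log_comb(remaining, n) - log_denominator)
        expected += 1.0 - p_absent
    return expected


def sample_size_for(freqs, target_unique):
    """Smallest n whose expected distinct count reaches target_unique."""
    if target_unique > len(freqs):
        raise ValueError("target exceeds the number of distinct passwords")
    lo, hi = 0, sum(freqs.values())
    while lo < hi:  # binary search: expected_unique grows with n
        mid = (lo + hi) // 2
        if expected_unique(freqs, mid) >= target_unique:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Made-up toy distribution: 4 accounts, 3 distinct passwords.
toy_freqs = {"123456": 2, "password": 1, "letmein": 1}
projected_accounts = sample_size_for(toy_freqs, 2.0)
```

With real data, `freqs` would be the frequency table of the reference leak and `target_unique` the number of unique hashes observed. The estimate is only as good as the assumption that the two sites have similar password distributions.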

Here’s the more important thing though: in its current form this leak has minimal security implications for LinkedIn users. No account identifiers were leaked, so this doesn’t allow offline attack against individual accounts. The passwords were hashed, so the leak doesn’t reveal any password strong enough to be out of the range of current cracking libraries. The fact that the passwords are uniqued means little information about the most popular passwords at LinkedIn is revealed. In short, while this data might be interesting for research, there’s nothing particularly useful here for attackers, with one exception: given a list of LinkedIn usernames, an attacker might hash many variations of each username, check which of them appear in the list, and then try those at the real login page with the corresponding usernames. This is a minor risk: relatively few users use variations of their username, and those that do can already be attacked online with slightly less efficiency.

This situation could turn out to be much worse. The attacker could be sitting on the whole database and have leaked this subset accidentally or to try to find a buyer. For now though, the amount of news coverage is way out of proportion to the real impact. LinkedIn should be criticised heavily if they truly had their password database breached, but the news coverage so far has caused premature panic among LinkedIn users.

A further note on cracking so far: I’ve also seen a list of 163,237 hashes which have been inverted, leading to reports along the lines of “x passwords have already been compromised.” The list I’ve seen doesn’t seem to be a skilled job: it misses basic things like “password”, which are in the leak, and only cracks hashes which didn’t have the high-order bits zeroed out. Surely a better cracking effort will be done, but this doesn’t matter much anyway given the lack of account information in the leak.
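
To make the point about zeroed high-order bits concrete, here is a small Python sketch of a lookup that checks a candidate password against the list in both formats. It assumes passwords were hashed as UTF-8 bytes with plain SHA-1 and that masked entries have their first five hex digits set to zero; the function names and the one-entry `leaked` set are hypothetical stand-ins for the real file.

```python
import hashlib

MASK_HEX_DIGITS = 5  # 20 bits, assumed to be zeroed in masked entries


def candidate_forms(password):
    """Return the (full, masked) hex SHA-1 forms of a candidate password."""
    full = hashlib.sha1(password.encode("utf-8")).hexdigest()
    masked = "0" * MASK_HEX_DIGITS + full[MASK_HEX_DIGITS:]
    return full, masked


def in_leak(password, leaked_hashes):
    """True if the password's hash appears in either format."""
    return any(form in leaked_hashes for form in candidate_forms(password))


# Stand-in for the leaked file: "password" stored only in masked form.
leaked = {"00000" + hashlib.sha1(b"password").hexdigest()[5:]}

found = in_leak("password", leaked)      # matched via the masked form
missing = in_leak("letmein", leaked)     # not in this toy set
```

A cracker that compares only the full 40-digit hash would miss the masked entry in this toy set, which is consistent with the gaps in the cracked list described above.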