January 7th, 2011 at 20:08 UTC by Joseph Bonneau
A few weeks ago I detailed how Gawker lost a million of their users’ passwords. Soon after this I found an interesting vulnerability in Gawker’s password deployment involving the handling of non-ASCII characters. Specifically, they didn’t handle them at all until two weeks ago; instead, they mapped every non-ASCII character to the ASCII ‘?’ prior to hashing. This not only greatly limited the theoretical space of passwords, but meant that any password consisting of n non-ASCII characters was equivalent to ‘?’^n. Native Telugu or Korean speakers with passwords like ‘రహస్య సంకేత పదం’ or ‘비밀번호’ were vulnerable to an attacker simply guessing a string of question marks. An attacker may in fact know in advance that some users are from countries that use non-Latin scripts (for example by looking at their email addresses), potentially making this more easily exploitable.
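The collapse described above can be reproduced in a few lines. This is only a sketch in Python, not Gawker's code (their library was written in Java); the function name `lossy_ascii` is made up for illustration, and it assumes the substitution happens once per character, which is what Python's `errors="replace"` does.

```python
def lossy_ascii(password: str) -> bytes:
    """Hypothetical stand-in for the buggy step: encode to ASCII,
    replacing every character that has no ASCII form with b'?'."""
    return password.encode("ascii", errors="replace")


korean = "비밀번호"
guess = "?" * len(korean)  # an attacker's guess: one '?' per character

# Both strings produce identical bytes, so any hash computed over
# them afterwards is identical too.
print(lossy_ascii(korean) == lossy_ascii(guess))

# Pure-ASCII passwords pass through unchanged.
print(lossy_ascii("hunter2"))
```

Whatever hash function runs after this step cannot undo the loss, which is why a strong hash made no difference here.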
I came across this issue because I was curious how crypt() would handle non-ASCII characters. Because DES uses 56-bit keys, crypt() must take its 8 characters of input (all passwords longer than this are truncated) and coerce them into 7 bits each. For traditional ASCII characters, which range only from 0 to 127, this is achieved by simply dropping the high-order bit from each byte in the string. The same thing is done for non-ASCII characters, although because they may be represented by more than one byte in encodings like UTF-8, crypt() will use even fewer than 8 characters. The effect will depend on the encoded size of each individual character, but only the first four characters of ‘Пароль’ (two bytes each in UTF-8) will be used by crypt(), and only the first three characters of ‘パスワード’ (three bytes each, so the third is only partly covered).
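To make the byte arithmetic concrete, here is a small Python sketch of the input preparation only. It does not call crypt() or run DES; it just truncates the UTF-8 bytes to 8 and clears the high bit of each, as described above. The function names are hypothetical, and UTF-8 is an assumption about how the password reaches crypt().

```python
def des_crypt_key_material(password: str, encoding: str = "utf-8") -> bytes:
    """Return the bytes traditional DES crypt() would key on:
    the first 8 encoded bytes, each reduced to its low 7 bits."""
    raw = password.encode(encoding)[:8]
    return bytes(b & 0x7F for b in raw)


def chars_reaching_crypt(password: str, encoding: str = "utf-8") -> int:
    """Count the characters that contribute at least one byte
    to the 8-byte window (the last one may be only partly covered)."""
    offset = 0
    count = 0
    for ch in password:
        if offset >= 8:
            break
        count += 1
        offset += len(ch.encode(encoding))
    return count


for pw in ("password123", "Пароль", "パスワード"):
    print(pw, chars_reaching_crypt(pw), des_crypt_key_material(pw).hex())
```

For an ASCII password the window holds 8 characters; for the Cyrillic and Japanese examples it holds four and three respectively, matching the counts given above.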
I was quite surprised to find that Gawker managed to drop all non-ASCII characters. In fact, the problem had nothing to do with crypt(). In an interesting twist that nobody seemed to realise after the Gawker hack, they’d actually already switched to the much-stronger, Blowfish-based bcrypt() months ago. All passwords updated since the switch have been hashed with bcrypt(), but Gnosis ignored this column of the SQL database (which also explains the large discrepancy between the database size and the number of crackable passwords). Unfortunately, bcrypt() is not quite as widely supported, and Gawker was using a relatively little-known Java library with the known bug of converting all non-ASCII characters to ‘?’ prior to hashing.
Gawker was very professional in its response. I received thanks and a clear timetable for a fix within 24 hours, and the fix itself took less than 72 hours. They also ran an item in Gizmodo explaining the problem, which was both courteous and very honest. However, Gawker did not force potentially vulnerable users to change their passwords, just as they didn’t with the passwords leaked in December. It seems far-fetched that most users will find the Gizmodo article, understand it, and take action; they should have been forced to change.
This is a very minor problem for the userbase of Gawker’s blogs (which are only available in English). I determined through brute force that about 1 in 50,000 Gawker users use a password which is entirely non-Latin (about the same number as use ‘aolsucks’ as their password), or only a few dozen registered users in total. It remains to be seen what other vulnerabilities exist in the wild involving password hashing and character encoding (particularly issues surrounding form encoding), a topic which has often been poorly understood by programmers and has been a major source of bugs. Traditionally these issues probably didn’t matter much: when non-ASCII characters were poorly supported by most software, there were few users who couldn’t produce an ASCII password. Increasingly, however, operating systems and browsers are internationalised well enough that more users may use non-ASCII characters in their passwords, particularly at sites which have larger numbers of users outside the West, meaning UTF-8 should be cleanly handled in any password implementation.
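As a rough sketch of what handling UTF-8 cleanly might look like, the Python below encodes the password strictly before hashing, so an unencodable input raises an error instead of being silently replaced. PBKDF2 from the standard library stands in for bcrypt purely to keep the example self-contained; the function name, salt size and iteration count are illustrative assumptions, not recommendations from this post.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Hash the full UTF-8 encoding of the password.

    errors="strict" raises UnicodeEncodeError rather than substituting
    a placeholder byte. PBKDF2-HMAC-SHA256 is a stand-in for bcrypt here.
    """
    data = password.encode("utf-8", errors="strict")
    return hashlib.pbkdf2_hmac("sha256", data, salt, iterations)


salt = os.urandom(16)  # per-user random salt (size chosen for illustration)
real = hash_password("비밀번호", salt)
guess = hash_password("????", salt)

# With the bytes preserved, a string of question marks no longer matches.
print(real != guess)
```

The point is only that the encoding step must be lossless; which slow hash follows it is a separate choice.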
Thanks to Rubin Xu, Dongting Yu, Andrew Lewis, and Richard Clayton for help researching this vulnerability.