All posts by Joseph Bonneau

The Gawker hack: how a million passwords were lost

Almost a year to the date after the landmark RockYou password hack, we have seen another large password breach, this time of Gawker Media. While an order of magnitude smaller, it’s still probably the second largest public compromise of a website’s password file, and in many ways it’s a more interesting case than RockYou. The story quickly made it to the mainstream press, but the reported details are vague and often wrong. I’ve obtained a copy of the data (which remains generally available, though Gawker is attempting to block listing of the torrent files) so I’ll try to clarify the details of the leak and Gawker’s password implementation (gleaned mostly from the readme file provided with the leaked data and from reverse engineering MySQL dumps). I’ll discuss the actual password dataset in a future post. Continue reading The Gawker hack: how a million passwords were lost

Passwords in the wild, part IV: the future

This is the fourth and final part in a series on password implementations at real websites, based on my paper at WEIS 2010 with Sören Preibusch.

Given the problems associated with passwords on the web outlined in the past few days, for years academics have searched for new technology to replace passwords. This thinking can at times be counter-productive, as no silver bullets have yet materialised and this has distracted attention away from fixing the most pressing problems associated with passwords. Currently, the trendiest proposed solution is to use federated identity protocols to greatly reduce the number of websites which must collect passwords (as we’ve argued would be a very positive step). Much focus has been given to OpenID, yet it is still struggling to gain widespread adoption. OpenID was deployed at less than 3% of websites we observed, with only Mixx and LiveJournal giving it much prominence.

Nevertheless, we optimistically feel that real changes will happen in the next few years, as password authentication on the web seems to be becoming increasingly unsustainable due to the increasing scale and interconnectivity of websites collecting passwords. We actually think we are already in the early stages of a password revolution, just not of the type predicted by academia.

Continue reading Passwords in the wild, part IV: the future

Passwords in the wild, part II: failures in the market

This is the second part in a series on password implementations at real websites, based on my paper at WEIS 2010 with Sören Preibusch.

As we discussed yesterday, dubious practices abound within real sites’ password implementations. Password insecurity isn’t only due to random implementation mistakes, though. When we scored sites’ passwords implementations on a 10-point aggregate scale it became clear that a wide spectrum of implementation quality exists. Many web authentication giants (Amazon, eBay, Facebook, Google, LiveJournal, Microsoft, MySpace, Yahoo!) scored near the top, joined by a few unlikely standouts (IKEA, CNBC). At the opposite end were a slew of lesser-known merchants and news websites. Exploring the factors which lead to better security confirms the basic tenets of security economics: sites with more at stake tend to do better. However, doing better isn’t enough. Given users’ well-documented tendency to re-use passwords, the varying levels of security may represent a serious market failure which is undermining the security of password-based authentication.

Continue reading Passwords in the wild, part II: failures in the market

Passwords in the wild, part I: the gap between theory and implementation

Sören Preibusch and I have finalised our in-depth report on password practices in the wild, The password thicket: technical and market failures in human authentication on the web, presented in Boston last month for WEIS 2010. The motivation for our report was a lack of technical research into real password deployments. Passwords have been studied as an authentication mechanism quite intensively for the last 30 years, but we believe ours was the first large study into how Internet sites actually implement them. We studied 150 sites, including the most visited overall sites plus a random sample of mid-level sites. We signed up for free accounts with each site, and using a mixture of scripting and patience, captured all visible aspects of password deployment, from enrolment and login to reset and attacks.

Our data (which is now publicly available) gives us an interesting picture into the current state of password deployment. Because the dataset is huge and the paper is quite lengthy, we’ll be discussing our findings and their implications from a series of different perspectives. Today, we’ll focus on the preventable mistakes. In academic literature, it’s assumed that passwords will be encrypted during transmission, hashed before storage, and attempts to guess usernames or passwords will be throttled. None of these is widely true in practice.

Continue reading Passwords in the wild, part I: the gap between theory and implementation

Evaluating statistical attacks on personal knowledge questions

What is your mother’s maiden name? How about your pet’s name? Questions like these were a dark corner of security systems for quite some time. Most security researchers instinctively think they aren’t very secure. But they still have gained widespread deployment as a backup to password-based authentication when email-based identification isn’t available. Free webmail providers, for example, may have no other choice. Unfortunately, because most websites rely on email when passwords fail, and email providers rely on personal knowledge questions, most web authentication is no more secure than personal knowledge questions. This risk has gotten more attention recently, with high profile compromises of Paris Hilton’s phone, Sarah Palin’s email, and Twitter’s corporate Google Documents occurring due to guessed personal knowledge questions.

There’s finally been a surge of academic research into the area in the last five years. It’s been shown, for example, that these questions are easy to look up online, often found in public records, and easy for friends and acquaintances to guess. In a joint work with Mike Just and Greg Matthews from the University of Edinburgh published this week in the proceedings of Financial Cryptography 2010, we’ve examined the more basic question of how secure the underlying answer distributions are to statistical guessing. Put another way, if an attacker wants to do no target-specific work, but just guess common answers for a large number of accounts using population-wide statistics, how well can she do?

Continue reading Evaluating statistical attacks on personal knowledge questions

What's the Buzz about? Studying user reactions

Google Buzz has been rolled out to 150M Gmail users around the world. In their own words, it’s a service to start conversations and share things with friends. Cynics have said it’s a megalomaniacal attempt to leverage the existing user base to compete with Facebook/Twitter as a social hub. Privacy advocates have rallied sharply around a particular flaw: the path of least-resistance to signing up for Buzz includes automatically following people based on Buzz’s recommendations from email and chat frequency, and this “follower” list is completely public unless you find the well-hidden privacy setting. As a business decision, this makes sense, the only chance for Buzz to make it is if users can get started very quickly. But this is a privacy misstep that a mandatory internal review would have certainly objected to. Email is still a private, personal medium. People email their mistresses, workers email about job opportunities, reporters email anonymous sources all with the same emails they use for everything else. Besides the few embarrassing incidents this will surely cause, it’s fundamentally playing with people’s perceptions of public and private online spaces and actively changing social norms, as my colleague Arvind Narayanan spelled out nicely.

Perhaps more interesting than the pundit’s responses though is the ability to view thousands of user’s reactions to Buzz as they happen. Google’s design philosophy of “give minimal instructions and just let users type things into text boxes and see what happens” preserved a virtual Pompeii of confused users trying to figure out what the new thing was and accidentally broadcasting their thoughts to the entire Internet. If you search Buzz for words like “stupid,” “sucks,” and “hate” the majority of the conversation so far is about Buzz itself. Thoughts are all over the board: confusion, stress, excitement, malaise, anger, pleading. Thousands of users are badly confused by Google’s “follow” and “profile” metaphors. Others are wondering how this service compares to the competition. Many just want the whole thing to go away (leading a few how-to guides) or are blasting Google or blasting others for complaining.

It’s a major data mining and natural language processing challenge to analyze the entire body of reactions to the new service, but the general reaction is widespread disorientation and confusion. In the emerging field of security psychology, the first 48 hours of Buzz posts could provide be a wealth of data about about how people react when their privacy expectations are suddenly shifted by the machinations of Silicon Valley.

The need for privacy ombudsmen

Facebook is rolling out two new features with privacy implications, an app dashboard and a gaming dashboard. Take a 30 second look at the beta versions which are already live (with real user data) and see if you spot any likely problems. For the non-Facebook users, the new interfaces essentially provide a list of applications that your friends are using, including “Recent Activity” which lists when applications were used. What could possibly go wrong?

Well, some users may use applications they don’t want their friend to know about, like dating or job-search. And they certainly may not want others to know the time they used an application, if this makes it clear that they were playing a game on company time. This isn’t a catastrophic privacy breach, but it will definitely lead to a few embarrassing situations. As I’ve argued before, users should have a basic privacy expectation that if they continue to use a service in a consistent way, data won’t be shared in a new, unexpected manner of which they have no warning or control, and this new feature violates that expectation. The interesting thing is how Facebook is continually caught by surprise when their spiffy new features upset users. They seem equally clueless with their response: allowing developers to opt an application out of appearing on the dashboard. Developers have no incentive to do this, as they want maximum exposure for their apps. A minimally acceptable solution must allow users to opt themselves out.

It’s inexcusable that Facebook doesn’t appear to have a formal privacy testing process to review new features and recommend fixes before they go live. The site is quite complicated, but a small team should be able to identify the issues with something like the new dashboard in a day’s work. It could be effective with with 1% of the manpower of the company’s nudity cops. Notably, Facebook is trying to resolve a class-action lawsuit over their Beacon fiasco by creating an independent privacy foundation, which privacy advocates and users have both objected to. As a better way forward, I’d call for creating an in-house “privacy ombudsmen” team, which has the authority to review new features and publish analysis of them, as a much more direct step to preventing future privacy failures.

Facebook tosses graph privacy into the bin

Facebook has been rolling out new privacy settings in the past 24 hours along with a “privacy transition” tool that is supposed to help users update their settings.  Ostensibly, Facebook’s changes are the result of pressure from the Canadian privacy commissioner, and in Facebook’s own words the changes are meant to be “new tools to control your experience.” The changes have been harshly criticized in a number of high-profile places:  the New York Times, Wired, CnetTechCrunch, Valleywag, ReadWriteWeb, and by the the EFF and the ACLU. The ACLU has the most detailed technical summary of changes, essentially there are more granular controls but many more things will default to “open to everyone.” It’s most telling to check the blogs used by Facebook developers and marketers with a business interest in the matter. Their take is simple: a lot more information is about to be shared and developers need to find out how to use it.

The most discussed issue is the automatic change to more open-settings, which will lead to privacy breaches of the socially-awkward variety, as users will accidentally post something that the wrong person can read. This will assuredly happen more frequently as a direct result of these changes, even though Facebook is trying to force users to read about the new settings, it’s a safe bet that users won’t read any of it. Many people learn how Facebook works by experience, they expect it to keep working that way and it’s a bad precedent to change that when it’s not necessary. The fact that Facebook’s “transition wizard” includes one column of radio buttons for “keep my old settings” and a pre-selected column for “switch to the new settings Facebook wants me to have” shows that either they don’t get it or they really don’t respect their users. Most of this isn’t surprising though: I wrote in June that Facebook would be automatically changing user settings to be more open, TechCrunch also saw this coming in July.

There’s a much more surprising bit which has been mostly overlooked-it’s now impossible for any user to hide their friend list from being globally viewable to the Internet at large. Facebook has a few shameful cop-out statements about this, stating that you can remove it from your default profile view if you wish, but since (in their opinion) it’s “publicly available information”  you can’t hide it from people who really want to see it. It has never worked this way previously, as hiding one’s friend list was always an option, and there have been many research papers, including a few by me and colleagues in Cambridge, concluding that the social graph is actually the most important information to keep private. The threats here are more fundamental and dangerous-unexpected inference of sensitive information, cross-network de-anonymisation, socially targeted phishing and scams.

It’s incredibly disappointing to see Facebook ignoring a growing body of scientific evidence and putting its social graph up for grabs. It will likely be completely crawled fairly soon by professional data aggregators, and probably by enterprising researchers soon after. The social graph is powerful view into who we are—Mark Zuckerberg said so himself—and  it’s a sad day to see Facebook cynically telling us we can’t decide for ourselves whether or not to share it.

UPDATE 2009-12-11: Less than 12 hours after publishing this post, Facebook backed down citing criticism and made it possible to hide one’s friend list. They’ve done this in a laughably ham-handed way, as friend-list visibility is now all-or-nothing while you can set complex ACLs on most other profile items. It’s still bizarre that they’ve messed with this at all, for years the default was in fact to only show your friend list to other friends. One can only conclude that they really want all users sharing their friend list, while trying to appear privacy-concerned: this is precisely the “privacy communication game” which Sören Preibusch and I wrote of in June. This remains an ignoble moment for Facebook-the social graph will still become mostly public as they’ll be changing overnight the visibility of hundreds of millions of users’ friends lists who don’t find this well-hidden opt-out.

User complaints about photos in Facebook ads

I wrote about the mess caused by Facebook’s insecure application platform nearly 2 months ago. I also wrote about the long-term problems with “informed consent” for data use in social networks. In the past week, both problems came to a head as users began complaining about multiple third-party ad networks using their photos in banner ads. When I mentioned this problem in June, Facebook had just shut down the biggest ad networks for “deceptive practices,” specifically by duping users into a US$20 per month ringtone subscription. The void created by banning SocialReach and SocialHour apparently led to many new advertisers popping up in their place, with most carrying on the practice of using user photos to hawk quizzes, dating services, and the like. The ubiquitous ads annoyed enough users that Facebook was convinced to ban advertisers from using personal data. This is a welcome move, but Facebook underhandedly inserted a curious new privacy setting at “Privacy Settings->News Feed and Wall->Facebook Ads”:

Facebook does not give third party applications or ad networks the right to use your name or picture in ads. If this is allowed in the future, this setting will govern the usage of your information.

With this change, Facebook has quietly reserved the right to re-allow applications to utilise user data in ads in the future and opted everybody in to the feature. We’ve written about social networks minimising privacy salience, but this is plainly deceptive. It’s hard not to conclude this setting was purposefully hidden from sight, as ads shown by third-party applications have nothing to do with the News Feed or Wall. The choices of “No One” or “Only Friends” are also obfuscating, as only friends’ applications can access data from Facebook’s API to begin with; this is a simple “opt-out” checkbox dressed up to making being opted in seem more private. Meanwhile, Facebook has been showing users a patronising popup message on log-in:

Worried about privacy? Your photos are safe. There have been misleading rumors recently about the use of your photos in ads. Don’t believe them. These rumors were related to third-party applications, and not ads shown by Facebook. Get the whole story at the Facebook Blog, or check out the Help Center.

This message is misleading, if not outright dishonest, and shows an alarming dismissal of what was a widespread practice that offended many users. People weren’t concerned with whether their photos were sent to advertisers by Facebook itself or third-parties. They don’t want their photos or names used or stored by advertisers regardless of the technical details. The platform API remains fundamentally broken and gives users no way to prevent applications from accessing their photos. Facebook would be best served by fixing this instead of dismissing users’ concern for privacy as “misleading rumors.”

The Economics of Privacy in Social Networks

We often think of social networking to Facebook, MySpace, and the also-rans, but in reality there are there are tons of social networks out there, dozens which have membership in the millions. Around the world it’s quite a competitive market. Sören Preibusch and I decided to study the whole ecosystem to analyse how free-market competition has shaped the privacy practices which I’ve been complaining about. We carefully examined 45 sites, collecting over 250 data points about each sites’ privacy policies, privacy controls, data collection practices, and more. The results were fascinating, as we presented this week at the WEIS conference in London. Our full paper and complete dataset are now available online as well.

We collected a lot of data, and there was a little bit of something for everybody. There was encouraging news for fans of globalisation, as we found the social networking concept popular across many cultures and languages, with the most popular sites being available in over 40 languages. There was an interesting finding from a business perspective that photo-sharing may be the killer application for social networks, as this features was promoted far more often than sharing videos, blogging, or playing games. Unfortunately the news was mostly negative from a privacy standpoint. We found some predictable but still surprising problems. Too much unnecessary data is collected by most sites, 90% requiring a full-name and DOB. Security practices are dreadful: no sites employed phishing countermeasures, and 80% of sites failed to protect password entry using TLS. Privacy policies were obfuscated and confusing, and almost half failed basic accessibility tests. Privacy controls were confusing and overwhelming, and profiles were almost universally left open by default.

The most interesting story we found though was how sites consistently hid any mention of privacy, until we visited the privacy policies where they provided paid privacy seals and strong reassurances about how important privacy is. We developed a novel economic explanation for this: sites appear to craft two different messages for two different populations. Most users care about privacy about privacy but don’t think about it in day-to-day life. Sites take care to avoid mentioning privacy to them, because even mentioning privacy positively will cause them to be more cautious about sharing data. This phenomenon is known as “privacy salience” and it makes sites tread very carefully around privacy, because users must be comfortable sharing data for the site to be fun. Instead of mentioning privacy, new users are shown a huge sample of other users posting fun pictures, which encourages them to  share as well. For privacy fundamentalists who go looking for privacy by reading the privacy policy, though, it is important to drum up privacy re-assurance.

The privacy fundamentalists of the world may be positively influencing privacy on major sites through their pressure. Indeed, the bigger, older, and more popular sites we studied had better privacy practices overall. But the desire to limit privacy salience is also a major problem because it prevents sites from providing clear information about their privacy practices. Most users therefore can’t tell what they’re getting in to, resulting in the predominance of poor-practices in this “privacy jungle.”