Cloudy with a Chance of Privacy

Three Paper Thursday is an experimental new feature in which we highlight research that group members find interesting.

When new technologies become popular, we privacy people are sometimes miffed that nobody asked for our opinions during the design phase. Sometimes this leads us to make sweeping generalisations such as “only use the Cloud for things you don’t care about protecting” or “Facebook is only for people who don’t care about privacy.” We have long accused others of assuming that the real world is incompatible with privacy, but are we guilty of assuming the converse?

On this Three Paper Thursday, I’d like to highlight three short papers that challenge these zero-sum assumptions. Each is eight pages long and none requires a degree in mathematics to understand; I hope you enjoy them.


“Reclaiming space from duplicate files in a serverless distributed file system”, JR Douceur et al., International Conference on Distributed Computing Systems (ICDCS), 2002.

This paper, when stripped of implementation details, contains a simple, elegant idea: convergent encryption.

The Cloud is a great place to store data reliably. One feature is de-duplication: there is no need to back up everyone’s copy of Papers separately, nor does every conference attendee need to save their own copy of the official photo. This efficient pooling of shared resources is the kind of thing that makes the cloud so attractive. On the other hand, cloud providers can make mistakes—just ask Dropbox users. Rather than depending on the cloud’s security, it’s a good idea to protect sensitive information with cryptography, but that negates the shared benefit that comes from de-duplication.

Convergent encryption is a deterministic way of encrypting things. You derive a secret key by hashing the content of a file, encrypt the file under that key, then encrypt that key under your own key. Anyone who encrypts the same plaintext will get the same ciphertext, restoring our ability to de-duplicate storage. Of course, an attacker can decrypt the file if she knows the plaintext, but then why bother decrypting?
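To make the idea concrete, here is a minimal Python sketch of convergent encryption. It is an illustration under stated assumptions, not the construction from the paper: the function names are hypothetical, and the "cipher" is a toy SHA-256 keystream standing in for a real one such as AES, chosen only so that the example runs with the standard library.

```python
import hashlib
import hmac
from typing import Tuple


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 over key || counter). A stand-in for a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def _wrap(key_material: bytes, user_key: bytes, ciphertext: bytes) -> bytes:
    """Wrap or unwrap a 32-byte key under the user's key (assumed toy scheme).

    The pad depends on the ciphertext, so each file gets a different pad.
    """
    pad = hmac.new(user_key, hashlib.sha256(ciphertext).digest(), hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(key_material, pad))


def convergent_encrypt(plaintext: bytes, user_key: bytes) -> Tuple[bytes, bytes]:
    # The content key is derived from the file itself, so it is the same for everyone.
    content_key = hashlib.sha256(plaintext).digest()
    ciphertext = _xor_with_keystream(plaintext, content_key)
    # Only the wrapped key is specific to the user; the ciphertext is shared.
    wrapped_key = _wrap(content_key, user_key, ciphertext)
    return ciphertext, wrapped_key


def convergent_decrypt(ciphertext: bytes, wrapped_key: bytes, user_key: bytes) -> bytes:
    content_key = _wrap(wrapped_key, user_key, ciphertext)
    return _xor_with_keystream(ciphertext, content_key)
```

Two users who encrypt the same photo produce identical ciphertexts, which the storage provider can de-duplicate, while each keeps a small wrapped key that only they can open.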

Convergent encryption alone does not provide anonymity: a business (e.g. the MPAA) could ask the Cloud, “have you already seen this content?” then send lawyers to ask “who uploaded it?” If all you want is confidentiality, though, convergent encryption provides an elegant solution to a real-world problem. Confidentiality can co-exist with the benefits of the public Cloud.


“Privacy Protection for Social Networking Platforms”, A Felt and D Evans, Web 2.0 Security & Privacy (W2SP), 2008.

Privacy and performance don’t have to be enemies, even in the oft-villainised realm of online social networking.

In this paper, Felt and Evans studied the top 150 Facebook applications and found that 90% of them didn’t need any of the user data they were able to access, while the other 10% were largely using personal information for trivial things such as displaying it to the user or choosing a horoscope. Of the 14 applications with non-trivial data use, four were contravening Facebook’s Terms of Service.

The paper proposes “privacy-by-proxy”, an extension to the protocols spoken by third-party social applications. Like Facebook’s own FBML (Facebook Markup Language), the privacy-by-proxy system would allow applications to name information without reading it. For instance, an application could tell the Facebook UI to “insert the user’s name and a list of friends here” without knowing that user’s name.

Facebook provided FBML for performance reasons: inserting the <fb:name> tag could eliminate a round trip between application servers and Facebook. If such identifiers were mandatory, it would greatly improve privacy protection and would also improve performance for overly-communicative applications. Since 2008, spellings have changed (FBML is deprecated in favour of the JavaScript API, etc.), but the core ideas are still valid: proxying access to user data could improve privacy and performance.
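The proxying idea can be sketched in a few lines of Python. This is only an illustration: the tag syntax, the `USERS` store and both function names are hypothetical and are not taken from FBML or from the paper. The point is that the application emits placeholders keyed by an opaque ID, and only the platform substitutes the real data.

```python
import re

# Hypothetical user store held by the platform; the application never reads it.
USERS = {
    "u1": {"name": "Alice", "friends": ["u2", "u3"]},
    "u2": {"name": "Bob", "friends": []},
    "u3": {"name": "Carol", "friends": []},
}

# Hypothetical placeholder tags that name data without containing it.
TAG = re.compile(r'<(name|friends) uid="([^"]+)"/>')


def app_render(viewer_id: str) -> str:
    """Third-party application: sees only an opaque ID and emits placeholders."""
    return (
        f'Hello, <name uid="{viewer_id}"/>! '
        f'Your friends: <friends uid="{viewer_id}"/>'
    )


def platform_proxy(markup: str) -> str:
    """Platform side: replaces each placeholder with the data it names."""
    def substitute(match):
        kind, uid = match.group(1), match.group(2)
        user = USERS[uid]
        if kind == "name":
            return user["name"]
        return ", ".join(USERS[friend]["name"] for friend in user["friends"])

    return TAG.sub(substitute, markup)


page = platform_proxy(app_render("u1"))
print(page)  # Hello, Alice! Your friends: Bob, Carol
```

The application's output never contains a name, so there is nothing for it to log or leak, and the platform saves a round trip because it fills in the data itself.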


“Aligning Security and Usability”, KP Yee, IEEE Security & Privacy Magazine 2(5), 2004.

Good security and good usability are both about inferring the user’s intent.

It has often been assumed that security and usability are intrinsically opposed forces. Security is assumed to mean “procedures that get in the way of getting work done” or even “lots of pop-up dialogs asking for permission”, whereas usability is assumed to be about pretty pixels and forcing programmers to use the mouse more often. In reality, though, we are in the same business: inferring user intent. To use a gross simplification, usability is about helping users do what they want and security is about preventing things that users don’t want.

Yee observes that security and usability come into conflict when software developers disregard the principle of least privilege. If a word processor is able to delete any file on the computer, then pop-ups asking “do you really want to delete this file?” start to look attractive. If, on the other hand, that word processor can only access files which the user explicitly opens, there is no need to second-guess their intent every time a file is modified.
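The shape of that design can be sketched in Python. This is a rough illustration with hypothetical names (`powerbox_open`, `word_processor`, `demo`); Python itself enforces nothing here, and a real system would rely on an operating-system sandbox to stop the application from opening other paths. The sketch only shows the interface: the user's choice of file is the grant of authority, and the application receives a handle rather than a path.

```python
import os
import tempfile
from typing import IO, Callable


def powerbox_open(choose_path: Callable[[], str]) -> IO[str]:
    """Trusted component: asks the user for a file and returns a handle to it only.

    choose_path stands in for a trusted file-chooser dialog.
    """
    path = choose_path()
    return open(path, "r+", encoding="utf-8")


def word_processor(document: IO[str]) -> None:
    """Application code: works on the one handle it was given, never on a path."""
    text = document.read()
    document.seek(0)
    document.write(text.upper())
    document.truncate()


def demo() -> str:
    """Create a scratch file, let the 'user' pick it, and return the edited text."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "letter.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("dear reader")
        with powerbox_open(lambda: path) as doc:
            word_processor(doc)
        with open(path, encoding="utf-8") as f:
            return f.read()
```

Because the application can only touch what the user handed over, no confirmation dialog is needed when it modifies the file.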

This model was explored in the Polaris system for Windows XP and our very own Capsicum for FreeBSD, but Apple brought powerboxes to the mainstream with the Mac OS X App Sandbox. I hope to see more of this in the future: software that treats security and usability as complementary partners rather than conflicting priorities.

5 thoughts on “Cloudy with a Chance of Privacy”

  1. Some interesting ideas, particularly the convergent encryption one, but since that paper is 10 years old it would be even more interesting to see your thoughts as to why the idea hasn’t been implemented.

    Even the most recent article has issues, since Facebook has changed significantly since 2008. I would say it’s likely that the article’s conclusions are even more valid now than they were 4 years ago, but given the significant changes in the way Facebook operates and the rise of mobile in the last couple of years, it would be interesting to see your thoughts on that.

  2. Further to John’s comment, it seems as though convergent encryption HAS been implemented, quite recently, by a cloud services provider called ‘Bitcasa’. Know anything about this? It does have the appeal of ‘elegance’, as you say – though the potential drawbacks/limitations are interesting too. I guess its usefulness will depend on what people want to use it for (this article seems to give quite a balanced (non-technical) appraisal):
    http://www.extremetech.com/computing/96693-how-convergent-encryption-makes-bitcasas-infinite-storage-possible

  3. Here is a way to encrypt all of your data in the Cloud, in a slightly different and more secure way:

    http://irdial.com/blogdial/?p=2362

    There is no reason why this should not be the default. Storing personal data in the Cloud in plaintext is not a good idea, from the privacy point of view, but also from the service provider’s side.

    A Cloud service provider managing a system that is encrypted by default will have fewer demands for immoral and costly access from the State. This means everyone wins: bigger profits for the service provider, privacy for the user.

  4. @John:
    Something like convergent encryption was used in Freenet and its GNU clone (GNUnet). People talk about it when using Tahoe, although some folks in that community consider the file-confirmation attack to be a bigger deal than I personally do.

    I was not aware of Bitcasa, so thanks @Carolyn! I hope that they do well, because client-side encryption *is* compatible with the cloud, and this seems like a very useful business model. If the crypto is what they claim, then they must do their video transcoding on the client side too… interesting.

    @Carolyn:
    The article which you linked to contains an oversimplification:
    “If a third-party knows most of the information in a file, they may be able to derive a large part of the key from that.”

    This is the so-called learn-partial-information attack that the Tahoe community speaks of. In reality, it’s not a matter of learning information, it’s a matter of confirming that “a file from the following list exists on the system” where the list of files includes e.g. configuration files with a mostly-common format and a field for a password. I find this attack to be over-hyped: it’s really just an offline dictionary attack. The countermeasure is as old as the attack: passwords should not be stored unsalted.

  5. @Roger:
    You don’t actually need to bolt GPG on in order to get non-Google-visible sync; Chrome has custom passphrase support built in.

    If you want to encrypt e.g. Google Docs though, what’s the point in staying with Google Docs at all? On the one hand, their UI won’t work properly with it. On the other hand, they’re in a position to see who’s looking at your documents, and the social graph is usually far more important than the content of communications. If you don’t want Google to be able to see anything you do, you are better off encrypting text files or even Word files and storing them somewhere else, like in a cloudy social service that I will describe in my forthcoming PhD dissertation. 🙂

    As someone who has been “on the inside” at Google, however, I have no problem in using Google Docs or GMail. I know enough people there who care enough about privacy that I don’t believe user data is abused except in pretty exceptional circumstances. Now, if all of the privacy people leave Google at once and don’t say why, maybe it’s time to get worried. 🙂
