Category Archives: Authentication

My Yahoo! password histograms are now available (with differential privacy!)

5 years ago, I compiled a dataset of password histograms representing roughly 70 million Yahoo! users. It was the largest password dataset ever compiled for research purposes. The data was a key component of my PhD dissertation the next year and motivated new statistical methods for which I received the 2013 NSA Cybersecurity Award.

I had always hoped to share the data publicly. It consists only of password histograms, not passwords themselves, so it seemed reasonably safe to publish. But without a formal privacy model, Yahoo! didn’t agree. Given the history of deanonymization work, caution is certainly in order. Today, thanks to new differential privacy methods described in a paper published at NDSS 2016 with colleagues Jeremiah Blocki and Anupam Datta, a sanitized version of the data is publicly available.

Continue reading My Yahoo! password histograms are now available (with differential privacy!)

Can we crowdsource trust?

Your browser contains a few hundred root certificates. Many of them were put there by governments; two (Verisign and Comodo) are there because so many merchants trust them that they’ve become ‘too big to fail’. This is a bit like where people buy the platform with the most software – a pattern of behaviour that let IBM and then Microsoft dominate our industry in turn. But this is not how trust should work; it leads to many failures, some of them invisible.

What’s missing is a mechanism where trust derives from users, rather than from vendors, merchants or states. After all, the power of a religion stems from the people who believe in it, not from the government. Entities with godlike powers that are foisted on us by others and can work silently against us are not gods, but demons. What can we do to exorcise them?

Do You Believe in Tinker Bell? The Social Externalities of Trust explores how we can crowdsource trust. Tor bridges help censorship victims access the Internet freely, and there are not enough of them. We want to motivate lots of people to provide them, and the best providers are simply those who help the most victims. So trust should flow from the support of the users, and it should be hard for powerful third parties to pervert. Perhaps a useful mascot is Tinker Bell, the fairy in Peter Pan, whose power waxes and wanes with the number of children who believe in her.

Double bill: Password Hashing Competition + KeyboardPrivacy

Two interesting items from Per Thorsheim, founder of the PasswordsCon conference that we’re hosting here in Cambridge this December (you still have one month to submit papers, BTW).

First, the Password Hashing Competition “have selected Argon2 as a basis for the final PHC winner”, which will be “finalized by end of Q3 2015”. This is about selecting a new password hashing scheme to improve on the state of the art and make brute force password cracking harder. Hopefully we’ll have some good presentations about this topic at the conference.

Second, and unrelated: Per Thorsheim and Paul Moore have launched a privacy-protecting Chrome plugin called Keyboard Privacy to guard your anonymity against websites that look at keystroke dynamics to identify users. So, you might go through Tor, but the site recognizes you by your typing pattern and builds a typing profile that “can be used to identify you at other sites you’re using, were identifiable information is available about you”. Their plugin intercepts your keystrokes, batches them up and delivers them to the website at a constant pace, interfering with the site’s ability to build a profile that identifies you.

Passwords 2015 call for papers

The  9th International Conference on Passwords will be held at Cambridge, UK on 7-9 December 2015.

Launched in 2010 by Per Thorsheim,  Passwordscon is a lively and entertaining conference series dedicated solely to passwords. Passwordscon’s unique mix of refereed papers and hacker talks encourages a kind of cross-fertilization that I’m sure you’ll find both entertaining and fruitful.

Paper submissions are due by 7 September 2015. Selected papers will be included in the event proceedings, published by Springer in the Lecture Notes in Computer Science (LNCS) series.

We hope to see lots of you there!

Graeme Jenkinson, Local arrangements chair

Security Protocols 2015

I’m at the 23rd Security Protocols Workshop, whose theme this year is is information security in fiction and in fact. Engineering is often inspired by fiction, and vice versa; what might we learn from this?

I will try to liveblog the talks in followups to this post.

Financial Cryptography 2015

I will be trying to liveblog Financial Cryptography 2015.

The opening keynote was by Gavin Andresen, chief scientist of the Bitcoin Foundation, and his title was “What Satoshi didn’t know.” The main unknown six years ago when bitcoin launched was whether it would bootstrap; Satoshi thought it might be used as a spam filter or a practical hashcash. In reality it was someone buying a couple of pizzas for 10,000 bitcoins. Another unknown when Gavin got involved in 2010 was whether it was legal; if you’d asked the SEC then they might have classified it as a Ponzi scheme, but now their alerts are about bitcoin being used in Ponzi schemes. The third thing was how annoying people can be on the Internet; people will abuse your system for fun if it’s popular. An example was penny flooding, where you send coins back and forth between your sybils all day long. Gavin invented “proof of stake”; in its early form it meant prioritising payers who turn over coins less frequently. The idea was that scarcity plus utility equals value; in addition to the bitcoins themselves, another scarce resources emerges as the old, unspent transaction outputs (UTXOs). Perhaps these could be used for further DoS attack prevention or a pseudonymous identity anchor.

It’s not even clear that Satoshi is or was a cryptographer; he used only ECC / ECDSA, hashes and SSL (naively), he didn’t bother compressing public keys, and comments suggest he wasn’t up on the latest crypto research. In addition, the rules for letting transactions into the chain are simple; there’s no subtlety about transaction meaning, which is mixed up with validation and transaction fees; a programming-languages guru would have done things differently. Bitcoin now allows hashes of redemption scripts, so that the script doesn’t have to be disclosed upfront. Another recent innovation is using invertible Bloom lookup tables (IBLTs) to transmit expected differences rather than transmitting all transactions over the network twice. Also, since 2009 we have FHE, NIZLPs and SNARKs from the crypto research folks; the things on which we still need more research include pseudonymous identity, practical privacy, mining scalability, probabilistic transaction checking, and whether we can use streaming algorithms. In questions, Gavin remarked that regulators rather like the idea that there was a public record of all transactions; they might be more negative if it were completely anonymous. In the future, only recent transactions will be universally available; if you want the old stuff you’ll have to store it. Upgrading is hard though; Gavin’s big task this year is to increase the block size. Getting everyone in the world to update their software at once is not trivial. People say: “Why do you have to fix the software? Isn’t bitcoin done?”

I’ll try to blog the refereed talks in comments to this post.

Why password managers (sometimes) fail

We are asked to remember far too many passwords. This problem is most acute on the web. And thus, unsurprisingly, it is on the web that technical solutions have had most success in replacing users’ ad hoc coping strategies. One of the longest established and most widely adopted technical solutions is a password manager: software that remembers passwords and submits them on the user’s behalf. But this isn’t as straightforward as it sounds. In our recent work on bootstrapping adoption of the Pico system [1], we’ve come to appreciate just how hard life is for developers and maintainers of password managers.

In a paper we are about to present at the Passwords 2014 conference in Trondheim, we introduce our proposal for Password Manager Friendly (PMF) semantics [2]. PMF semantics are designed to give developers and maintainers of password managers a bit of a break and, more importantly, to improve the user experience.

Continue reading Why password managers (sometimes) fail

Nikka – Digital Strongbox (Crypto as Service)

Imagine, somewhere in the internet that no-one trusts, there is a piece of hardware, a small computer, that works just for you. You can trust it. You can depend on it. Things may get rough but it will stay there to get you through. That is Nikka, it is the fixed point on which you can build your security and trust. [Now as a Kickstarter project]

You may remember our proof-of-concept implementation of a password protection for servers – Hardware Scrambling (published here in March). The password scrambler was a small dongle that could be plugged to a Linux computer (we used Raspberry Pi). Its only purpose was to provide a simple API for encrypting passwords (but it could be credit cards or anything else up to 32 bytes of length). The beginning of something big?

It received some attention (Ars Technica, Slashdot, LWN, …), certainly more than we expected at the time. Following discussions have also taught us a couple of lessons about how people (mostly geeks in this contexts) view security – particularly about the default distrust expressed by those who discussed articles describing our password scrambler.

We eventually decided to build a proper hardware cryptographic platform that could be used for cloud applications. Our requirements were simple. We wanted something fast, “secure” (CC EAL5+ or even FIPS140-2 certified), scalable, easy to use (no complicated API, just one function call) and to be provided as a service so no-one has to pay upfront the price of an HSM if they just want to have a go at using proper cryptography for their new or old application. That was the beginning of Nikka.

nikka_setup

This is our concept: Nikka comprises a set of powerful servers installed in secure data centres. These servers can create clusters delivering high-availability and scalability for their clients. Secure hardware forms the backbone of each server that provides an interface for simple use. The second part of Nikka are user applications, plugins, and libraries for easy deployment and everyday “invisible” use. Operational procedures, processes, policies, and audit logs then guarantee that what we say is actually being done.

2014-07-04 08.17.35We have been building it for a few months now and the scalable cryptographic core seems to work. We have managed to run long-term tests of 150 HMAC transactions per second (HMAC & RNG for password scrambling) on a small development platform while fully utilising available secure hardware. The server is hosted at ideaSpace and we use it to run functional, configuration and load tests.

We have never before designed a system with so many independent processes – the core is completely asynchronous (starting with Netty for a TCP interface) and we have quickly started to appreciate detailed trace logging we’ve implemented from the very beginning. Each time we start digging we find something interesting. Real-time visualisation of the performance is quite nice as well.
real_time_monitoring

Nikka is basically a general purpose cryptographic engine with middleware layer for easy integration. The password HMAC is this time used only as one of test applications. Users can share or reserve processing units that have Common Criteria evaluations or even FIPS140-2 certification – with possible physical hardware separation of users.

If you like what you have read so far, you can keep reading, watching, supporting at Kickstarter. It has been great fun so far and we want to turn it into something useful in 2015. If it sounds interesting – maybe you would like to test it early next year, let us know! @DanCvrcek

Pico part IV – Somethings you have

A light-hearted look at the ideas presented a couple of weeks ago in a paper at the UPSIDE workshop in Seattle.

USS Enterprise going even more boldly than usual

One of the problems inherent in boldly going where no man has gone before is that, more often than you might imagine, you may be required to blow up the spaceship on which you’re travelling. James T Kirk was forced to do this at least once, and he and his successors came perilously close to it on several other occasions.

But who gets to make this decision? Given the likelihood that some members of the crew may be possessed by alien life-forms at the time, it seems unwise to leave such decisions to any single individual, even if he’s the captain. And you also can’t require any specific other staff-member to back him up, since they may have had to sacrifice themselves earlier for the good of the many. Auto-destruct sequences, therefore, can typically only be initiated when any three senior officers agree and give their secret passwords.

It is a wonderful sign of the times that, when the first Star Trek episode in which this happens was broadcast in 1969, the passwords required to detonate a spaceship with 400 passengers on board were noticeably weaker than those now required to log in to a typical Kickstarter project.

Secret sharing – background

Some of the most beautiful examples, I believe, of using a simple, readily-understandable idea to solve a complex problem can be found in the secret-sharing systems proposed almost simultaneously by Adi Shamir and George Blakley, way back in 1979. These are just the sort of thing you need to need to control self-destruct sequences, and are certainly better than the four-character passwords used by Kirk and Scotty.

Most readers of LBT will be familiar with this, but in case you aren’t, the underlying concept is to encode the secret you need to protect – for example, the auto-destruct code – as the coefficients of a particular quadratic equation. If you know any three points on the curve, you can deduce the underlying equation. So all you have to do is give each of Kirk, Scotty, Spock and Bones the coordinates of a point, and any three of these ’shares’ can then unlock the secret which will trigger the detonation. If you want to insist on four or more officers, you use a higher-order polynomial. This is Shamir’s algorithm; Blakley’s is similar, but whichever you use, there is, I think, a real elegance in solving a difficult technical problem using a concept that can be explained in a paragraph to anyone with high-school level maths.

A somewhat more down-to-earth example of its use can be found in the DNSSEC system used to secure the core servers of the DNS, on which so much of the internet depends. The master key needed to reset this, in the event of some global calamity, is divided up between seven keyholders based in different countries. One of them told me over a beer recently that it required any five of them to get together in the US to be able to reconstruct the system.

PICO

In the Pico project (see earlier posts), we’re exploring the same secret-sharing concepts but at a personal rather than a global or galactic level.

The idea is that your Pico device, which provides authentication on your behalf to a wide variety of systems, should only be able to do so when it is confident that you are present. There are several ways it might detect that – biometrics perhaps being the most obvious – but we wanted a system that would be completely automatic, continuous and non-intrusive: it wouldn’t require you to re-scan your retina every few minutes, for example.

So the Pico assumes that you are present only if it can detect, nearby, a sufficient number of other devices that you normally carry. These ‘Picosiblings’ may be special devices dedicated to this purpose – bluetooth-enabled cufflinks or earrings, for example – or the Picosibling functionality may be built into phones, smart watches, laptops and car key-fobs. But, together, a sufficient number of them constitute an ‘aura’ of safety in which the Pico feels comfortable about releasing your credentials when you use it to log in.

You could just program the Pico to make this decision, but a more secure approach is to use secret-sharing. Your Pico stores your credentials in encrypted form, and the Picosiblings actually enable it by each giving up a share of the secret used to do the encryption. Since this information is not cached, even the Pico cannot decide to expose the information when you are not around.

Some challenges

This basic Picosibling concept is not bad, but can we improve it? How well can we model real-world user behaviour using these ideas? Where might they fall down, and do we need to invent anything new to deal with the edge cases? And can it help us make better decisions about when to blow up a starship?

Let’s consider some scenarios.

1. My Precious

My car keys are with me about 30% of the time, and may occasionally be with someone else. My wedding ring, on the other hand, has been a 100% reliable indicator of my presence for the last 23 years. (It’s too bad my wife didn’t give me a Bluetooth-enabled one.) We all have different possessions which may be more or less reliable indicators that we are around, and we can represent this by giving them different numbers of shares. Most of my Picosiblings might have one share, but my wallet and phone might have two, and my wedding ring four. If my ring is absent, you’ll need significantly more confirmation from other sources, in the same way that you might need disproportionately more senior officers if the captain is absent.

2. Smarter Picosiblings

Is my watch a good indicator of my presence? Well, yes, when it’s with me, but not when it’s sitting on my bedside table. There may be occasions when Picosiblings are more or less confident that you are present. So we can give each sibling a certain number of shares, but it may not choose to release them to the Pico in all situations. A sibling that detects and recognises your heartbeat might give out differing numbers of its shares based on how confident it is about its recognition. Something you normally wear or carry might include an accelerometer, and give out fewer shares if it hasn’t moved in the last few minutes. The number of shares might decay over time, as the device gets further from the moment at which it was confident of your presence, so a fingerprint sensor might be sufficient to unlock your Pico on its own, if you’ve used it in the last ten seconds. Another metric might be proximity: if the Picosibling could detect the Pico was close by, it might be willing to hand over more shares than if it were on the other side of the room.

3. The trouble with Klingons

But suppose we need something more sophisticated than simply the raw number of shares to unlock the secret? If your starship has a large number of Klingon officers, who place a high value on dying honourably in battle, you might wish to insist that at least two races were involved in any major decision like this.

Fortunately we don’t need a new mechanism: we can just use our current system twice over. We split the core secret into several shares: one for humans, one for Vulcans, one for Klingons, one for Betazoids. Each of those shares can then be subdivided in the same way for the individuals concerned. Two or more Vulcans are needed to reconstruct the Vulcan share, and two or more such species actually to blow up the ship.

In the Pico world, suppose your shoes are all Picosiblings. If you have many pairs of shoes, someone entering your bedroom may find they have plenty of shares even if you aren’t around. We can use this method of creating ‘categories of shares’ to limit the effect that a particular class of device can have, to ensure that shoes alone can never be sufficient to do the unlocking.

4. Greater than the sum of its parts

Suppose my my car, my car keys, and my house keys are all Picosiblings, and they each have one share. Anyone inside my car is likely to have my car keys, including a thief who has just found them on the pavement. But we could reasonably argue that someone who is inside my car and has my house keys is more likely to be me; that a stranger is less likely to have this particular combination of Picosiblings, and so their conjunction should be worth more than might be indicated by a simple addition of their values.

We can achieve this in various ways; a simple approach is just to cut a share in half and give one half to each sibling. These would be sent to the Pico along with the complete shares, and if it receives matching halves from two devices, it would be able to construct extra shares which correspond to the value of their relationship.

5. You can’t take that away from me

Here’s a challenge for readers. Can you think of a good way to implement negative shares? Let me explain…

Suppose you are in an airport check-in queue and a significant number of your worldly possessions are in the suitcase beside you. Someone who pinches your Pico from your pocket will have plenty of shares accessible either by standing behind you, or by getting close to your suitcase at some later point in the baggage-handling process. Travelling is one of those situations when you might feel a bit vulnerable and so require more confirmation of your presence than you would in your home country. If your suitcase could emit negative shares, then it would raise the threshold of affirmation needed by your Pico when it was around.

There are many challenges here, both technical and practical: for example, anyone who knows that there is a negative influence nearby may be able to remove it, e.g. by tipping the items out of the suitcase. But it would at least deter subtle unobserved attacks in the check-in queue. It would be good if observers, and ideally the Pico itself, could not distinguish positive from negative shares except with regards to their final effect. And so on. As I said: a challenge for the reader!

Sharing options

So we’ve introduced several ideas which can help with some real-life scenarios, yet they’re all based on the same simple secret-sharing concept:

  • Variable numbers of shares per device (based on how reliable each device is as an indicator)
  • Dynamically-changing number of shares (allowing devices to indicate their confidence in your or the Pico’s presence)
  • Categories of shares (allowing us to restrict the influence of a single class of devices)
  • Split shares (allowing combinations of devices to contribute more than simply the sum of their shares)
  • Negative shares (to allow some devices to indicate situations of vulnerability)

Extending the tree

Let me finish with one last question. To what extent are Picos and Picosiblings fundamentally different? Picos gain confidence to submit your credentials to a service based on the reassurance they get from Picosiblings. But we’ve also discussed the idea of Picosiblings getting confidence to submit their shares to a Pico based on other factors in the environment. Perhaps we could extend this hierarchy in the other direction, too.

A company might need k-of-n shares from the major investors’ Picos to appoint their representative director to the board. The company Pico might then need k-of-n of the directors’ Picos to authorise a major decision or transaction. And a commercial building might need k-of-n of the companies’ boards to agree to the resurfacing of the parking lot. This could be extended to political, national, even global decisions.

Perhaps I’m pushing the simple secret-sharing concept too far. But at the very least it’s an interesting thought experiment to consider how much we can represent real-world situations with this one idea.

After all, we need a simple, reliable and secure way to make these decisions in our own back yard, before we start interacting with the rest of the universe.