Heartbleed and RSA private keys

UPDATE:

Matthew Arcus pointed out that a freed buffer will always have the first 16 bytes (x86_64) corrupted by two free list pointers, according to glibc’s malloc implementation. This means the leak mechanism I described below will only explain the partial leak I exploited, but not complete prime number leaks others have seen. In fact, the uncorrupted numbers come from the long-lived copy of p and q in the Montgomery context.

——–

By now everyone knows about the OpenSSL Heartbleed vulnerability: a missing bounds check in one of the most popular TLS implementations has made millions of web servers (and others) leak all sorts of sensitive information from memory. This could leak login credentials, authentication cookies and web traffic to attackers. But could it be used to recover the site’s TLS private key? This would enable complete decryption of previously-recorded traffic if perfect forward secrecy was not negotiated at the time and otherwise Man-in-The-Middle attacks to all future TLS sessions.

Since this would be a much more serious consequence of Heartbleed, I decided to investigate. The results were positive: I was able to extract private keys from a test Nginx server after a few day’s work. Later I applied my techniques to solve the CloudFlare Challenge. Along with a few other security researchers, we independently demonstrated that RSA private keys are indeed at risk. In this blog post, I’ll provide some detail on how to extract the private key and why the attack is possible.

How to extract the private key

Readers not familiar with RSA can read about it here. To simplify things a bit, a large (2048 bits) number N is constructed by multiplying together two large randomly generated prime numbers p and q. N is made public while p and q are kept secret. Finding p or q allows recovery of the private key. A generic attack is just factorising N, but this is believed to be difficult. However, with a vulnerability like Heartbleed, the attack is much simpler: because the web server needs the private key in memory to sign the TLS handshake, p and q must live in memory and we can try to obtain them with Heartbleed packets. Then the problem simply becomes how to identify them in the returned data. This is easy, as we know p and q are 1024 bits (128 bytes) long, and OpenSSL represents big numbers little-endian in memory, so a brute-force approach of treating every 128 consecutive bytes in the heartbleed packets as little-endian numbers and testing if it divides N is sufficient to spot potential leaks. This is how most people solved the CloudFlare challenge.

Coppersmith improvement

But wait, isn’t our scenario just like a cold boot attack? And there has been a lot of research on recovering RSA private keys with partial information. One of the most famous papers from Coppersmith presents message recovery attacks on related messages or insufficient padding, as well as factorisation with partial knowledge with the help of lattice basis reduction. With the Coppersmith attack, N can be efficiently factorised if either the top or bottom half bits of p is known. Put it in context, we only need the top/bottom 64 bytes of p in order to compute the private keys, compared to the naive brute-force which requires all 128 bytes. In practice, reaching Coppersmith’s limit is computational expensive (although still much better than factorisation), but assuming 77 bytes (60%) known we can comb the heartbleed packets for potential private key segments very quickly.

In retrospect, there were 242 instances of private key remnants suitable for the Coppersmith attack, out of 10,000 packets (64 KB each) that I had collected. Implementation of the Coppersmith attack was made easy thanks to the comprehensive computer algebra building blocks from Sage (although I later found out that Sage already has Coppersmith implemented).

Can we do even better? If you have ever viewed a RSA private key using openssl rsa -text -in server.key, you will notice that there are quite a few numbers other than the two prime factors p and q. In fact, they are precomputed values for RSA’s Chinese Remainder Theorem optimisation. If some of them are leaked they can also be used to deduce p. And what about the Montgomery representations of p and q that OpenSSL use for fast multiplication? They also admit a variant of Coppersmith so partial bits are useful as well. With this in mind I set out to search for them in the collected packets from my test server where these values are all known. But not even single one of them appears partially (> 16 bytes) in the dataset. How is this possible?

Note: all my experiments and the CloudFlare challenge are targeting Nginx which is single-threaded. It is possible that for a multi-threaded web server, more leakage can be observed.

Why p and only p leaks

When heartbleed first came out, people argued that RSA private keys should not be leaked because they are loaded when the web server starts so they occupy a lower memory address, and because the heap grows upwards, a later-allocated buffer leaked by Heartbleed shouldn’t reach them. This argument is consistent with my inability to find any CRT precomputed values, but with one exception: p is definitely leaked somehow. If we assume this argument is correct, the question becomes: why is p leaked?

To add more to the mystery, OpenSSL apparently scrubs all temporary BigNums it used! To reduce the overhead of dynamically allocating temporary values OpenSSL provides a BN_CTX which is a pool of BigNums operating in stack (LIFO) fashion. Upon finishing the context is destroyed and all allocated buffers are scrubbed. This would mean that when the heartbleed packet is constructed there shouldn’t be any temporary values left in memory (again assuming single thread) because the BN_CTX has long been released.

I won’t bother you with all the pain I went through to identify the cause, so here is the spoiler: when a BigNum is being extended to a bigger buffer size, its original buffer is not zeroised before being freed. The chain of the control flow path that leads to the leak of p is even more subtle. During intial TLS handshake the server key exchange is signed with the private key. The CRT signing performs a modulo p operation, which caused p<<BN_BITS2 to be stored in a temporary variable allocated from the BN_CTX pool. Later in the CRT fault-injection check, that temporary variable is reused (remember BN_CTX operates like a stack) as val[0]. An interesting fact is that a reallocated temporary variable only gets its lowest nibble zeroised, and in the case of p<<BN_BITS2 nothing is destroyed. val[0] immediately receives a Montgomery-reduced value, but since the original buffer is unable to accommodate the new value, it gets extended and p gets released to free heap space, waiting to be grabbed. And because this happens every time a TLS handshake occurs, it can be spilled everywhere.

Since it is difficult to find which BigNums may be extended and leaked statically, I instrumented OpenSSL and experimented a bit. It turned out that a shifted version of the Montgomery representation of p will also be freed at the leaky point, but that only happened at the first RSA exponentiation when the Montgomery context is initialised, so it will live in a low memory address and I was unable to locate it in any collected packets.

The OpenSSL team has been notified about the above leak. Although to be fair this is not exactly a security bug by itself as OpenSSL is never explicitly designed to prevent sensitive data leakage to heap.

I’d like to thank Joseph Bonneau for the useful comments and proof reading.

10 thoughts on “Heartbleed and RSA private keys

  1. Good writeup, sounds much more reasonable than other (early) speculations on mmap() behaviour.

    Do you also have some additional insight on the dysfunctional malloc() buffer over-read protection (caused by the openssl wrappers)?

    Bernd

  2. Thank you for the detailed explanation. It makes more sense than other theories circulated earlier (in particular, it explains why only p is leaked, and not other elements of the key contained in the same data structure).

  3. Nice attack. I found there was another place where the primes get copied into higher heap locations:

    http://matthewarcus.wordpress.com/2014/04/16/how-to-leak-a-key/

    The data from the Cloudflare server I looked at seemed to have one of the primes complete in what looked like an unfreed malloc buffer (I got this back from the server 3 times in 20000 requests, so less often that the fragments you found):

    0021b0c0 00 4c 8d 36 8f 91 00 00 00 00 00 00 00 99 c3 ef
    0021b0d0 6d f2 00 56 c6 97 a0 9e eb 93 8d ab c0 ba 81 ae

    0021b140 32 f3 37 33 f9 2b e2 4e 23 ad 13 ea c4

    (the 91 is indicating the buffer size, the prime itself is 99 c3 .. ea c4).

  4. @Bernd Eckenfels:
    Sorry I haven’t looked into OPENSSL_malloc myself. Maybe this one?

    @Pádraig Brady
    Thanks for the suggestion. Links fixed now.

    @Matthew
    Nice post. I thought the Montgomery copy is only initialised once during first RSA exponentiation so it should also in lower memory and unlikely to leak. But correspondence with CloudFlare forks suggests that they also saw them in the heartbleed packets.

  5. That’s right – but the first RSA exponentiation happens after the input and output buffers for the first connection are allocated, so they are quite likely to be in higher locations in the heap and since OpenSSL reuses its I/O buffers, the primes are also likely to be availabe to later heartbleed attacks as well.

  6. If i have perfect forward secrecy enabled AFTER the heartbleed attack and assuming someone did steal my private key,

    Do i still need to rekey and get a new cert from the CA and why – I don’t care about the 2% IE 6 users, any other reason i should ?

    Thank you.

  7. Yes, you do want to renew your private key to prevent impersonation attacks. If your private key is known, regardless of what ciphersuite you use, the attacker can impersonate you with your private key.

  8. I don’t know why there is any mystery about Heartbleed. I knew about this bug when it was introduced to the original OpenSSL source code researching an exploit vector on a website a client owns. I haven’t ever, not one single time, abused the bug on any commercial servers for any personal gain. ONLY penetration/security/audit testing.

    My sites have been immune to this particular bug since I was provided the information concerning it. that’s two years shy of three days when it was introduced. My sites were vulnerable for the first three days, only. Being a coder had advantages. This is one of the more useful ones. Nothing malicious, but damn useful. I’m protected from this. My clients servers were never put in harms way.

    You think I’m bullshitting but I can prove what I wrote. [g]

  9. This was just fixed in LibreSSL (ie. in the OpenBSD tree). Or at least the subset we are aware of so far…

    Any bignum library probably has to make a decision: performance, or security via cleansing of intermediate values. The decision made by perl’s bignum library will be different than the decision in a codebase designed specfically for key processing.
    I actually suspect perl’s codebase does a better job.

    The applications using this library are too large and complex and bug ridden, therefore it is clear the library should cleanse memory — aggressively. Smart logic could be introduced to know when to clean, but anyone who suggests that hasn’t dug deep enough..

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>