When Layers of Abstraction Don’t Get Along: The Difficulty of Fixing Cache Side-Channel Vulnerabilities

(co-authored with Robert Watson)

Recently, our group was treated to a presentation by Ruby Lee of Princeton University, who discussed novel cache architectures which can prevent some cache-based side channel attacks against AES and RSA. The new architecture was fascinating, in particular because it may actually increase cache performance (though this point was spiritedly debated by several systems researchers in attendance). For the security group, though, it raised two interesting and troubling questions. What is the proper defence against side-channels due to processor cache? And why hasn’t it been implemented despite these attacks being around for years?

The history of cached memory being used a side channel goes all the way back to the classic page-fault password attacks against TENEX reported in 1976. The possibility of using processor cache as a side channel against crypto routines was suggested by Paul Kocher in 1996, and again by John Kelsey et al. in 1998, but largely ignored during AES standardisation. Rijndael was selected despite its heavy reliance on table lookup to achieve good performance in software. Daniel Page described theoretical cache attacks against DES in more detail in 2002. Daniel Bernstein finally broke the flood gates open in 2005 with an experimental statistical timing attack on AES. This was followed over the next year by Colin Percival’s cache-snooping attacks against RSA on hyperthreaded processors, Dag Osvik et al.’s cache-preloading-and-inspection attacks, Guido Bertoni et al.’s cache power-consumption attacks, and my own cache-collision timing attacks.

All of the AES attacks have a common structure and cause: AES performs table lookups at indexes dependent on individual bytes of the key. Cache being a shared resource, attackers can potentially see the side-effects of these lookups, such as eviction of the attacker’s data from cache, or increased time and power consumption due to the ratio of hits to misses. This is an excellent example of how security vulnerabilities get overlooked: a bizarre interaction of algorithmic optimisations of AES and the architecture of processor caches. Both of these are messy details which were designed to be abstracted away from the majority of their users.

Here we are in 2009, and the vulnerability still exists. Interestingly, the problem hasn’t seen a lack of proposed solutions (there have been dozens), but too many solutions at different levels of abstraction, each with its own drawbacks:

Algorithmic level: AES could be de-certified for situations where an attacker may have access to side-channel information. AES runner-up Serpent avoided table-lookups, as do most candidates in the current eSTREAM and SHA-3 competitions. Of course, this relies on millions of implementers to determine if side-channel attacks may apply and then choose an appropriate alternative to AES. For the future, we can chalk up the use of table-lookups in cryptographic software as a one-time mistake and move on, but for now we’re stuck with AES for decades.
Software level: Many neat implementation tricks have been proposed to protect AES, from altering or randomising the use of caches to bitslicing the encryption and eliminating caches. A software patch worked for RSA, where re-shuffling the pre-computed values of the message has been deployed to mitigate Percival’s hyperthreading attacks against bit-windowed exponentiation. Unfortunately, the cost of this approach is prohibitive for AES as re-shuffling the AES lookup tables between encryptions is many times more costly than encrypting with no tables at all, an approach whose performance got us into the reliance on table-lookup in the first place. Even worse, randomisation doesn’t prevent collision attacks.
OS level: The operating system, possibly with the assistance of new hardware-instructions, could close the cache side-channel by partitioning the cache between processes, locking critical sections of cache, or simply disabling shared cache or hyperthreading during sections of code marked as “security critical” by application programmers. The downsides here are multifold: this adds a lot of complexity for OS programmers, and the very transparency of caching becomes a problem-will the OS scheduling policy have to be changed for each minor cache design change in hardware, and will OS developer misunderstandings of hardware-specific cache behavior progress from edge-case performance loss to crypto vulnerability? Requiring a large number of people to understand the intricacies of caching behavior is almost certainly unrealistic (try giving section 10 of Intel’s manual a read). We think there’s a nice analogy to the number of bugs introduced by concurrent execution-if each one of these were potentially a security vulnerability, it would be trouble.
Cache implementation level: This is where Ruby Lee and her colleague’s proposal comes in. Perhaps we can exploit the fact that caches are just a performance optimisation which software shouldn’t rely on in detail, silently changing the caching behaviour to be randomised. This is nice in that action is required of relatively few people, but even assuming there is no performance penalty it is unsettling in that it makes caches even more complex, raising the possibility of future side-channels being found. The proposal also does nothing to prevent collision attacks.
Instruction set level: Intel has finally announced dedicated AES support with its AVX extensions, due out in 2010. To most people’s satisfaction, a hardware AES implementation eliminates cache vulnerabilities, but servers purchased today will likely be running for decades.

So, in the end we are left with many options and few good ones. We advocate in general that we should aim for a fix which requires the smallest number of people to make changes, and one which reduces complexity so as to prevent future vulnerabilities. By this metric the first and last options seem the best, but they also take the longest to implement, meaning for the short-term we’ll need to rely on hacks at the software and OS level, which means a lot of pain for crypto and operating system implementers. And while crypto algorithms are clear targets for cache attacks due to their iterated implementation which facilitates statistical attacks, we can’t rule out more general attacks in the future against other security-sensitive code.