Comments on: When Layers of Abstraction Don’t Get Along: The Difficulty of Fixing Cache Side-Channel Vulnerabilities

By: hard_crypto

hard_crypto — Sat, 14 Mar 2009 23:49:41 +0000

The PadLock XCRYPT instruction has been around for how many years, and AVX has yet to get here?

For a few watts you get 780,259.42 kB/s of aes-256-cbc, 266,131.14 kB/s of sha256, tens of Mbps of hardware entropy, etc., etc.

Entropy, Montgomery multiplication, digests and ciphers should all be in hardware, at the instruction level, in any self-respecting core!

By: Robert N. M. Watson

Robert N. M. Watson — Mon, 23 Feb 2009 11:22:00 +0000

“In fact, I’m not a huge fan of the cache-based side-channel attacks because they require the attacker to be running code on the same system as the victim, in which case why doesn’t the attacker just escalate privilege to root and go from there? It’s not that they aren’t real, but that there are so many weaker points to attack when your adversary is running code on the same processor as you are.”

Hi Nicholas:

Interestingly, I heard this line of reasoning proposed by a CPU vendor while on a teleconference with a major ISP consumer of their CPUs discussing this problem. It became clear quite quickly that the vendor and ISP didn’t see eye-to-eye: while it’s economically convenient for CPU manufacturers to argue that you should buy one server for each customer, scalability in hosting environments relies on sharing, be it for reasons of performance, physical footprint, administrative cost, hardware cost, or power. Software security models in current OSes are known to have weaknesses, but virtual hosting ISPs mitigate those weaknesses using a variety of technologies: hardening and sandboxing in various forms, OS-level virtualization, machine-level virtualization, and so on. None of these protects you against cache attacks on crypto, but they do quite effectively protect you against local-to-root exploits, which are decreasingly common attack vectors in hosting environments. If you are a virtual hosting provider, then protecting the integrity of your service platform is fundamental to protecting individual customers and your own service provisioning system, and the idea that a customer might be able to extract the host’s SSH host key, or the SSL private key of a management component or of another customer in another virtual OS instance, is pretty worrying.

Robert

By: Clive Robinson

Clive Robinson — Sun, 22 Feb 2009 05:56:33 +0000

@ Joseph,

Side channels are, as people are now recognising, an interesting example of “unintended consequences”.

I guess they are arguably a “TEMPEST” attack against a system, and are a consequence of trying to use the hardware in a cost-effective or efficient manner.

One rule of TEMPEST design is “red and green channel segregation”. It is normally, and incorrectly, thought of as just an electromagnetic or “cabling/layout” issue. It’s not; it’s more fundamental than that: it is an axiom of the design philosophy.

A number of “TEMPEST” rules of thumb for design (energy/bandwidth) are now fairly well known. But the “clock your inputs and outputs” and “single function” (or “function segregation”) rules do not appear to be, or are ignored for the sake of cost or efficiency.

Ignoring these rules will almost always give opportunities for confidential information to be leaked via complex interaction within a device or system, be it deliberate or unintentional.

One of the most obvious of these side channels is the “timing attack”, which is basically what we are talking about here (though there are several others as well).

In a single-function unit, the point to apply the attack is generally the input, and the point to monitor it is the output.

That is, if I vary the input timing, I can make the delays of various parts of the system show up as a modulation on the output.

However, if the inputs and outputs are “clocked”, then the available bandwidth of this attack is limited by the clock rate: the attacker gets at most one timing observation per clock period.
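To make the clocking argument concrete, here is a minimal Python sketch, assuming an idealised unit whose output latch only updates on the edges of a fixed clock. The function name and the nanosecond figures are illustrative, not taken from any real design.

```python
def clocked_release_time(arrival_ns: int, internal_delay_ns: int, period_ns: int) -> int:
    """Time at which a result becomes visible on a clocked output.

    The result is internally ready at arrival_ns + internal_delay_ns, but the
    (hypothetical) output latch only updates on clock edges, which fall on
    integer multiples of period_ns. The result is held until the first edge
    at or after the moment it is ready.
    """
    ready_ns = arrival_ns + internal_delay_ns
    # Integer ceiling division: round the ready time up to the next edge.
    return -(-ready_ns // period_ns) * period_ns


# Two data-dependent internal delays that differ by 500 ns...
fast = clocked_release_time(arrival_ns=0, internal_delay_ns=200, period_ns=1000)
slow = clocked_release_time(arrival_ns=0, internal_delay_ns=700, period_ns=1000)
# ...are indistinguishable on the output: both appear at the 1000 ns edge.
print(fast, slow)  # 1000 1000
```

Only a delay difference that pushes the result past an edge (for example an internal delay of 1300 ns, released at 2000 ns) remains visible, so the attacker learns at most which clock period the result landed in.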

More importantly, input arriving at an invalid time is more easily detected, and the whole function should be aborted when such an error is detected, thereby limiting the information bandwidth available to the attacker even further.

Importantly, this also raises an alarm about the possibility of an active attack.
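The “detect, abort and alarm” rule can be sketched in Python as follows, assuming inputs are only legitimate on clock edges (within a small tolerance). `ClockedUnit`, `AlarmRaised` and the byte-reversal placeholder standing in for the real function are all hypothetical.

```python
class AlarmRaised(Exception):
    """Raised when input arrives at a time the clocked unit does not expect."""


class ClockedUnit:
    """Hypothetical single-function unit that only accepts input on clock edges.

    Input arriving between edges is treated as a fault: the whole operation
    is aborted and an alarm is latched, rather than the input being processed
    late (which would let its timing leak through to the output).
    """

    def __init__(self, period_ns: int, tolerance_ns: int = 0) -> None:
        self.period_ns = period_ns
        self.tolerance_ns = tolerance_ns
        self.alarm = False

    def _on_edge(self, t_ns: int) -> bool:
        offset = t_ns % self.period_ns
        # Distance to the nearest clock edge, on either side.
        distance = min(offset, self.period_ns - offset)
        return distance <= self.tolerance_ns

    def submit(self, t_ns: int, data: bytes) -> bytes:
        if self.alarm:
            raise AlarmRaised("unit is latched in the alarm state")
        if not self._on_edge(t_ns):
            self.alarm = True  # latch the alarm: possible active attack
            raise AlarmRaised(f"input at t={t_ns} ns is off the clock edge")
        return self._process(data)

    def _process(self, data: bytes) -> bytes:
        # Placeholder for the real function; here we just reverse the bytes.
        return data[::-1]
```

Latching the alarm, rather than simply rejecting the one bad input, is a design choice: it stops an attacker from probing the timing window by trial and error.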

However, in a multi-function device the number of points at which to both apply and monitor an attack goes up extraordinarily.

Importantly, as these points are mainly within the device, the potential for limiting the bandwidth of leaked information is effectively removed, as is the ability to detect active attacks.

Which is why you have the single function or segregated function rules.

As designing effective segregation into a device is extremely difficult at the best of times, most designs opt for keeping it simple and over-engineered.

In essence, the design is single-function only, and is usually further broken down internally into smaller segregated functional units, with additional monitoring and alarms.

That is, internally the functional units are “pipelined”, with every step designed to complete well within a single clock cycle, and with associated circuitry at each stage on sensitive input or output lines to detect and deal with fault states.

This is just one of the reasons TEMPEST designed products are expensive.

However, not only are they expensive because of the segregation, they are also functionally inefficient, which adds further cost.

Whilst these costs might be acceptable to government departments, they are invariably not acceptable to commercial organisations.

Which is why the use of software running on COTS technology is so seductive.

Effectively, you currently build your crypto functions as an ordinary application that sits on top of a consumer-grade operating system on consumer-grade hardware.

The problem is that consumer-grade technology is almost always designed to be efficient, and absolutely no consideration is given to side-channel segregation or other TEMPEST-related issues.

Further, the end user likewise wants to use their equipment efficiently, so the crypto application is very likely to end up on a system with other ordinary user-accessible applications, connected via a shared high-bandwidth communications channel (an Ethernet network).

Even if the crypto app is the only non-OS application running on the machine, the OS and the Ethernet network allow timing attacks to be carried out.

Whilst none of these issues is insurmountable, they require a bottom-up rethink of how to go about them.

In the short term, I suspect it will be done by “fuzzing” the timing using spread-spectrum-type techniques; in the longer term, by incorporating appropriate changes in both the hardware and the OS.
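Why timing “fuzzing” only raises the attacker’s cost can be illustrated with a small Python simulation. As a simple stand-in for spread-spectrum-style techniques, we assume (our assumption, not Clive’s) that the defender adds uniform random padding to every response; the latencies and jitter width are hypothetical.

```python
import random

# Hypothetical model: a secret-dependent operation takes FAST_NS or SLOW_NS,
# and the defender adds uniform random padding of up to JITTER_NS before
# the result becomes visible ("fuzzing" the timing).
FAST_NS = 1_000
SLOW_NS = 1_050    # a 50 ns secret-dependent difference
JITTER_NS = 5_000  # padding 100x wider than the signal


def observed_latency(secret_bit: int, rng: random.Random) -> int:
    base = SLOW_NS if secret_bit else FAST_NS
    # Uniform random padding in [0, JITTER_NS) hides any single measurement.
    return base + rng.randrange(JITTER_NS)


def estimate_gap(samples: int, seed: int = 1) -> float:
    """Attacker's view: average many observations per secret value."""
    rng = random.Random(seed)
    zeros = [observed_latency(0, rng) for _ in range(samples)]
    ones = [observed_latency(1, rng) for _ in range(samples)]
    return sum(ones) / samples - sum(zeros) / samples


# With enough samples the padding averages out and the gap re-emerges,
# typically within a few ns of the true 50 ns.
print(round(estimate_gap(100_000)))
```

The padding forces the attacker to collect many more samples, but the mean of the noise is the same for both secret values, so averaging still recovers the difference.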

But neither of these will 100% solve the side channel issues on non segregated systems.

As noted above, the obvious timing side channel is not the only one. And commercial entities are unlikely to want to give up the “efficiency” of running user-accessible applications on the same system as the crypto.

So I suspect that, whatever the solution, it will not be either “sufficiently efficient” or “sufficiently secure” for software use on ordinary COTS technology.

Which raises the question: should crypto ever be certified for software use on unknown or insecure hardware?

All of which brings you around to the age-old question,

“What price for security”.

By: Nicholas Weaver

Nicholas Weaver — Fri, 20 Feb 2009 13:49:33 +0000

I disagree on the first. Decertifying AES JUST in the case where your adversary is running on the same processor as you would cause HUGE incompatibility issues, so it’s hardly the “smallest change” but rather the largest, because this is the entire reason why the AES standardization did not pick two algorithms.

In fact, I’m not a huge fan of the cache-based side-channel attacks because they require the attacker to be running code on the same system as the victim, in which case why doesn’t the attacker just escalate privilege to root and go from there? It’s not that they aren’t real, but that there are so many weaker points to attack when your adversary is running code on the same processor as you are.

And by your “Minimum change” standard, the OS is probably the place to make the change on legacy hardware, simply because the AES routines should be OS-supported anyway: if the OS is compromised, you can’t win anyway.

But by making the crypto an OS-supported library, you get a chance to control interruptions, and can just prefetch the tables into the cache when starting the routine and then pin those cache lines so they aren’t evicted until the group of blocks is done.

The only case where you have to exclude other processes is when your caches are too small for your implementation, in which case you instead have to enforce “no other processes during crypto” and wipe the cache state when done.

Thus it isn’t a matter of marking “Security critical” code; it’s a matter of providing an AES API that takes care of the vagaries.
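The effect of prefetching on what a cache attacker can observe can be sketched with a toy Python model. It assumes a single 1 KiB lookup table (256 four-byte entries, 16 cache lines of 64 bytes), an attacker who has primed the cache so no table line is resident, and pinned lines that cannot be evicted during the lookups. It models only which table lines are resident afterwards and ignores associativity, hit/miss timing and other effects real attacks exploit; the names are hypothetical.

```python
LINE_BYTES = 64
ENTRY_BYTES = 4
TABLE_ENTRIES = 256  # e.g. a 1 KiB lookup table
TABLE_LINES = TABLE_ENTRIES * ENTRY_BYTES // LINE_BYTES  # 16 cache lines


def line_of(index: int) -> int:
    """Cache line of the table that holds entry `index`."""
    return index * ENTRY_BYTES // LINE_BYTES


def footprint_after_victim(secret_indices, prefetch: bool) -> frozenset:
    """Table lines resident after the victim's secret-indexed lookups.

    The attacker has primed the cache, so the victim starts with no table
    lines resident; whatever is resident afterwards is what a probe sees.
    """
    resident = set()
    if prefetch:
        # Load every line of the table up front; with pinning, none of
        # them can be evicted until the group of blocks is finished.
        resident.update(range(TABLE_LINES))
    for i in secret_indices:
        resident.add(line_of(i))
    return frozenset(resident)


print(sorted(footprint_after_victim([0x00, 0x10], prefetch=False)))  # [0, 1]
print(sorted(footprint_after_victim([0x80, 0xF0], prefetch=False)))  # [8, 15]
```

Without prefetching, the resident lines reveal which parts of the table the secret selected; with prefetching, every secret leaves the same footprint (all 16 lines).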
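To illustrate what an AES API that “takes care of the vagaries” might look like from the caller’s side, here is a Python sketch over a toy cache model. `ToyCache`, `protected_tables` and `secret_table_lookups` are hypothetical names; the strategy choice (prefetch and pin when the tables fit, otherwise run with no other processes and wipe afterwards) follows the two cases described above. Real pinning and exclusion need OS and hardware support that a Python model cannot show.

```python
from contextlib import contextmanager

LINE_BYTES = 64
ENTRY_BYTES = 4
TABLE_ENTRIES = 256
TABLE_LINES = TABLE_ENTRIES * ENTRY_BYTES // LINE_BYTES  # 16


class ToyCache:
    """Toy cache model tracking only resident, pinned and exclusive state.

    Capacity is used only to choose a strategy; evictions caused by
    capacity pressure are not modelled.
    """

    def __init__(self, capacity_lines: int) -> None:
        self.capacity_lines = capacity_lines
        self.resident = set()
        self.pinned = set()
        self.exclusive = False  # True while no other process may run

    def touch(self, line: int) -> None:
        self.resident.add(line)

    def evict_by_other_process(self, line: int) -> bool:
        """Another process tries to evict `line`; returns True on success."""
        if self.exclusive or line in self.pinned:
            return False
        self.resident.discard(line)
        return True


@contextmanager
def protected_tables(cache: ToyCache, lines: range):
    fits = len(lines) <= cache.capacity_lines
    if fits:
        for line in lines:  # prefetch every table line...
            cache.touch(line)
        cache.pinned.update(lines)  # ...and pin them for the whole group of blocks
    else:
        cache.exclusive = True  # cache too small: no other processes during crypto
    try:
        yield
    finally:
        cache.pinned.difference_update(lines)
        cache.exclusive = False
        if not fits:
            # Wipe the table lines so the leftover state does not depend
            # on which entries the secret selected.
            cache.resident.difference_update(lines)


def secret_table_lookups(cache: ToyCache, base_line: int, indices) -> None:
    """The only call a (hypothetical) AES caller would need to make."""
    lines = range(base_line, base_line + TABLE_LINES)
    with protected_tables(cache, lines):
        for i in indices:
            cache.touch(base_line + i * ENTRY_BYTES // LINE_BYTES)
```

Either way the state left behind is the same for every secret: all table lines resident when the tables fit, none when they had to be wiped, and the caller never has to mark anything as security-critical.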