All posts by George Danezis

A cryptographic hash function reading guide

After a few years of spectacular advances in breaking cryptographic hash functions, NIST has announced a competition to determine the next Secure Hash Algorithm, SHA-3. SHA-0 is considered broken, SHA-1 is still secure but no one knows for how long, and the SHA-2 family is desperately slow. (Do not even think about using MD5, or MD4, for which Prof. Wang can find collisions by hand; RIPEMD-160 still stands.) Cryptographers are ecstatic about this development: it is as if they had been a bit bored since the last NIST AES competition and depressed by the prospect of not having another significant block cipher to design for the next few years.

The rest of us should expect the next four years to be filled with news, first about advances in design, then about advances in attacks against hash functions, as teams with candidate hash algorithms bitterly try to find flaws in each other’s proposals to ensure that their own function becomes SHA-3. To fully appreciate the details of this competition, some of us may want a quick refresher on how to build a secure hash function.

Here is a list of on-line resources for catching up with the state of the art:

  1. A very quick overview of hash functions and their applications is provided by Ilya Mironov. This is very introductory material, and does not go into the deeper details of what makes these functions secure, or how to break them.
  2. Chapter 9 on Hash Functions and Data Integrity of the Handbook of Applied Cryptography (Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone) provides a very good first overview of the properties expected from a collision-resistant hash function. It also presents the basic constructions for such functions from block ciphers (too slow for SHA-3), as well as from dedicated compression functions. Chapter 3 also quickly presents Floyd’s cycle-finding algorithm for finding collisions with negligible storage requirements (see the sketch after this list).
  3. If your curiosity has not been satisfied, the second stop is Prof. Bart Preneel’s thesis, entitled “Analysis and Design of Cryptographic Hash Functions”. This work provides a very good overview of the state of the art in hash function design up to the middle of the nineties (before SHA-1 was commissioned.) The back-to-basics approach is very instructive, and frankly the thesis could be entitled “everything you wanted to know about hash functions and never dared ask.” Bart is one of the authors of RIPEMD-160, which is still considered secure and is an algorithm worth studying.
  4. Hash functions do look like block ciphers under the hood, and an obvious idea might be to adapt aspects of AES and turn it into such a function. Whirlpool does exactly this, and is worth reading about. One of its authors, Paulo Barreto, also maintains a very thorough bibliography of hash function proposals along with all known cryptanalytic results against them (and a cute health status indicating their security.)
  5. Prof. Wang’s attacks that forced NIST to look for better functions are a must-read, even though they get very technical very quickly. A gentler introduction to these attacks is provided in Martin Schlaffer’s Master’s thesis, which describes how the attacks are applied to MD4.
  6. Finally it is no fun observing a game without knowing the rules: the NIST SHA-3 requirements provide detailed descriptions of what the algorithm should look like, as well as the families of attacks it should resist. After reading it you might even be tempted to submit your own candidate!
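To make Floyd’s cycle finding (item 2 above) concrete, here is a minimal Python sketch of a memoryless collision search. It attacks a toy hash, SHA-256 truncated to 4 bytes, purely so that a collision turns up in seconds; the truncation and the function names are my own illustration, not taken from any of the texts above.

import hashlib
import os

def f(x: bytes) -> bytes:
    # Toy compression function: SHA-256 truncated to 32 bits,
    # so that collisions are cheap enough to find in a demo.
    return hashlib.sha256(x).digest()[:4]

def find_collision(seed: bytes):
    # Phase 1 (Floyd): the tortoise advances one step per iteration,
    # the hare two, until they meet somewhere on the cycle of the
    # sequence x, f(x), f(f(x)), ... Only two values are ever stored.
    tortoise, hare = f(seed), f(f(seed))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))
    # Phase 2: restart the tortoise from the seed; one step before the
    # two pointers meet again, they hold two distinct inputs that hash
    # to the same value (assuming the seed itself is not on the cycle,
    # which holds with overwhelming probability for a random seed).
    tortoise = seed
    while f(tortoise) != f(hare):
        tortoise = f(tortoise)
        hare = f(hare)
    return tortoise, hare

x, y = find_collision(os.urandom(8))
assert x != y and f(x) == f(y)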

What is the unit of amplification for DoS?

Roger Needham’s work has warned us of the potential damage of Denial-of-Service attacks. Since then, protocol designers have tried to minimize the storage committed to unauthenticated partners, as well as to prevent ‘amplification effects’ that could help DoS adversaries: a single unit of network communication should lead to at most one unit of network communication. This way an adversary cannot use network services to amplify his or her ability to flood other nodes. The key question that arises is: what is the unit of network communication?

My favorite authentication protocol, which incorporates state-of-the-art DoS prevention features, is JFKr (Just Fast Keying with responder anonymity.) An RFC is also available for those keen to implement it. JFKr implements a signed ephemeral Diffie-Hellman exchange, with DoS prevention cookies used by the responder to thwart storage exhaustion attacks directed against him.

The protocol goes a bit like this (full details in section 2.3 of the RFC):

Message 1, I->R: Ni, g^i

Message 2, R->I: Ni, Nr, g^r, GRPINFOr, HMAC{HKr}(Ni, Nr, g^r)

Message 3, I->R: Ni, Nr, g^i, g^r, HMAC{HKr}(Ni, Nr, g^r),
E{Ke}(IDi, sa, SIG{i}(Ni, Nr, g^i, g^r, GRPINFO)),
HMAC{Ka}(‘I’, E{Ke}(IDi, sa, SIG{i}(Ni, Nr, g^i, g^r, GRPINFO)))

Message 4, R->I: E{Ke}(IDr, sa’, SIG{r}(Ni, Nr, g^i, g^r)),
HMAC{Ka}(‘R’, E{Ke}(IDr, sa’, SIG{r}(Ni, Nr, g^i, g^r)))

Note that after message 2. is sent, ‘R’ (the responder) does not need to store anything, since all the data necessary to perform authentication will be sent back by ‘I’ in message 3. One is also inclined to think that there is no amplification effect that ‘I’ could benefit from for DoS, since a single message 1. generates a single reply, message 2. Is this the case?
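To make the stateless cookie mechanism concrete, here is a minimal Python sketch. The HMAC-SHA256 choice, the plain concatenation, and the function names are my own assumptions for illustration; the RFC specifies the actual encodings.

import hmac
import hashlib
import os

HKr = os.urandom(32)  # R's secret cookie key; never leaves R

def make_cookie(Ni: bytes, Nr: bytes, g_r: bytes) -> bytes:
    # Sent in message 2: binds both nonces and R's exponential,
    # so R can forget the session entirely after replying.
    return hmac.new(HKr, Ni + Nr + g_r, hashlib.sha256).digest()

def check_cookie(cookie: bytes, Ni: bytes, Nr: bytes, g_r: bytes) -> bool:
    # On message 3, R recomputes the MAC over the values echoed back
    # by I; a match proves R itself issued them, with no stored state.
    return hmac.compare_digest(cookie, make_cookie(Ni, Nr, g_r))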

We are implementing (at K.U.Leuven) a traffic analysis prevention system, in which all data is transported in UDP packets of a fixed length of 140 bytes. As it happens, message 1. is slightly shorter than message 2., since it only carries a single nonce and no cookie. (The JFKi version of the protocol has an even longer message 2.) This means that we have to split the reply in message 2. over two packets, carrying 280 bytes in total.

A vanilla implementation of JFKr would allow the following DoS attack. The adversary sends many message 1. UDP packets, pretending to initiate an authentication session with an unsuspecting server ‘R’ using JFKr. Furthermore, the adversary forges the source IP address of the UDP packets to make them appear to be sent from the victim’s IP address. This costs the adversary one UDP packet of 140 bytes. The server ‘R’ follows the protocol and replies with two UDP packets of 140 bytes each, sent to the victim’s IP address: an amplification factor of two. The adversary can of course perform this attack with many sessions or servers in parallel, and end up with double the number of packets, and double the data, being sent to the victim.

What went wrong here? The assumption that one message (1.) is answered with one message (2.) is not sufficient (in our case, but also in general) to guarantee that the adversary cannot amplify a DoS attack. One has to count the actual bits and bytes on the wire before making this judgment.

How do we fix this? It is really easy: the responder ‘R’ requires the first message (1.) to be at least as long (in bytes) as the reply message (2.). This can be done by appending a nonce (J) to the end of the message.

Message 1, I->R: Ni, g^i, J

This forces the attacker to send, byte for byte, as much traffic as will hit the victim’s computer.
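A minimal sketch of this check, under the same illustrative assumptions as before (fixed 140-byte transport packets; the helper names are hypothetical):

import os

PACKET_LEN = 140            # fixed transport packet length in our system
REPLY_LEN = 2 * PACKET_LEN  # message 2 spans two packets

def pad_message1(msg1: bytes) -> bytes:
    # I appends a random nonce J so the request is at least as long
    # as the reply it will trigger.
    if len(msg1) < REPLY_LEN:
        msg1 += os.urandom(REPLY_LEN - len(msg1))
    return msg1

def accept_message1(msg1: bytes) -> bool:
    # R silently drops any request shorter than its own reply,
    # removing the 2x amplification byte for byte.
    return len(msg1) >= REPLY_LEN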

Traffic Data Retention and Forensic Imaging

Last week I participated in yet another workshop on traffic data retention, at ICRI, with the added twist that traffic data retention is now in ‘European Law’, and shall become actual law in most EU countries very soon. It was a special treat to be talking just after Chief Superintendent Luc Beirens, Head of the Belgian Federal Computer Crime Unit, who tried to sell the idea of retention to a crowd of people from the flagship EU privacy project PRIME.

As usual, Beirens assured us that proper judicial oversight exists and will regulate access to traffic data. Yet a different picture emerged when we got into the details of how cyber-crime investigations are conducted. It turns out that the first thing the police do, to the suspects but also the victims of cyber-crime, is to take a forensic image of their hard disks. This is a sound precaution: booting up the machine to extract evidence may activate malware on a victim’s machine that erases traces, or an alert system on a suspect’s computer.

The obvious question becomes: how does this policy of automatic forensic imaging and analysis of hard disks interact with traffic data retention? Luc was keen to acknowledge that the investigation procedure would proceed unchanged: an image of a hard disk that may contain retained data would be taken, and forensic tools used on the totality of the disk. To be fair, tools that take a forensic image of, or only look at, the parts of a disk permitted by a set security policy do not exist.

What does this mean? If you are a victim of cyber-crime, or a company you have given your data to is a victim of cyber-crime, all the data will end up with the police. This will be the case irrespective of judicial oversight or any other safeguards. You may ask yourself what the chances are that retained data will be kept on a computer that becomes part of an investigation. First, do not underestimate the fact that these machines will be on-line to serve requests, and therefore will be subject to their fair share of attacks. But most importantly, this case will obviously occur as part of any investigation into the misuse, unauthorized access, or attempted access to the traffic data retention systems themselves!

This standard procedure may also explain why companies are so reluctant to call in the high-tech crime units to help them investigate cyber-crime. Their procedures are simply incompatible with any security policy that has a confidentiality component. Would you report some of your documents being stolen from your home or business, if it meant the police taking a copy of every single paper in the building?