A cryptographic hash function reading guide

2007-11-23CryptologyGeorge Danezis

After a few years of spectacular advances in breaking cryptographic hash function NIST has announced a competition to determine the next Secure Hash Algorithm, SHA-3. SHA-0 is considered broken, SHA-1 is still secure but no one knows for how long, and the SHA-2 family are desperately slow. (Do not even think about using MD5, or MD4 for which Prof. Wang can find collisions by hand, but RIPEMD-160 still stands.) Cryptographers are ecstatic about this development: as if they were a bit bored since the last NIST AES competition and depressed by the prospect of not having to design another significant block cipher for the next few years.

The rest of us should expect the next four years to be filled with news, first about advances in the design, then advances in the attacks against Hash functions, as teams with candidate hash algorithms will bitterly try to find flaws in each other’s proposals to ensure that their function becomes SHA-3. To fully appreciate the details of this competition, some of us may want a quick refresher on how to build secure hash function.

Here is a list of on-line resources for catching up with the state of the art:

A very quick overview of hash functions and their applications is provided by Ilya Mironov. This is very introductory material, and does not go into the deeper details of what makes these functions secure, or how to break them.
Chapter 9 on Hash Functions and Data Integrity of the Handbook of Applied Cryptography (Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone) provides a very good first overview of the properties expected from collision resistant hash function. It also presents the basic constructions for such functions from block ciphers (too slow for SHA-3), as well as from dedicated compression functions. Chapter 3 also quickly presents Floyd’s cycle finding algorithm to find collisions with negligible storage requirements.
If your curiosity has not been satisfied, the second stop is Prof. Bart Preneel’s thesis entitled “Analysis and Design of Cryptographic Hash Functions“. This work provides a very good overview of the state of the art in hash function design up to the middle of the nineties (before SHA-1 was commissioned.) The back to the basics approach is very instructive, and frankly the thesis could be entitled “everything you wanted to know about hash functions and never dared ask.” Bart is one of the authors of RIPEMD-160 that is still considered secure, an algorithm worth studying.
Hash functions do look like block ciphers under the hood, and an obvious idea might be to adapt aspects of AES and turn it into such a function. Whirlpool does exactly this, and is worth reading about. One of its authors, Paulo Barreto, also maintains a very thorough bibliography of hash function proposals along with all known cryptanalytic results against them (and a cute health status indicating their security.)
Prof. Wang’s attacks that forced NIST to look for better functions are a must-read, even though they get very technical very soon. A gentler introduction to these attacks is provided in Martin Schlaffer’s Master’s thesis describing how the attacks are applied to MD4.
Finally it is no fun observing a game without knowing the rules: the NIST SHA-3 requirements provide detailed descriptions of what the algorithm should look like, as well as the families of attacks it should resist. After reading it you might even be tempted to submit your own candidate!

9 thoughts on “A cryptographic hash function reading guide”

Lurker says:

2007-11-24 at 00:04 UTC

Greetings

Thanks for the great Blog.

One strong suggestion:

Please make the name of the Web page be the title of the posting.

As it is, when you bookmark a posting, the name of the bookmark is the name of your blog. Not very useful.

Thanks and keep up the great work!

RayK

Reply
Steven J. Murdoch says:

2007-11-24 at 00:32 UTC

@Lurker

The blog name being at the front is the WordPress default, and is not a simple option to change. I have removed the “Blog Archive” part though, so hopefully this will fit in the bookmark field more easily.

Reply
required says:

2007-11-24 at 17:34 UTC

JFYI, if you use lynx browser, text only, click on article here, hit p, works as wanted.
Lynx works good for logging.

Reply
JackPT says:

2007-11-27 at 16:15 UTC

Thanks for posting this, it is useful and interesting.

Reply
Francis Turner says:

2007-11-28 at 12:46 UTC

A possibly dumb question.

If you hash your content using two different but breakable funtions (say MD5 and RipeMD128 or even MD5 plus a basic checksum function) would you not end up with some thing that is a lot harder to create a collision for?

My assumption is that because you have to create a collision that works for both functions and the algorithms are sufficiently different that this is unlikely to occur without a lot of work even if both algorithms are easy to crack.

Reply
Francis Turner says:

2007-11-28 at 13:21 UTC

Follow up from 5.

It occurs to me I may nto have been totally clear. What I mean is providing the outputs of both insecure hashes as a combined check. I.e. if you have a 2 128 bit hashes (e.g. MD5 and RipeMD) then the output hash is the 256 bit number which is the concatenation of the two hshes.

Also wouldn’t Hashing using the same crackable algorithm but with two different salts be another way to make it harder to create collisions? So again you get a 256 bit number which is MD5(msg plus salt1).MD5(msg plus salt2).

Reply
Richard Kelsall says:

2007-11-29 at 15:45 UTC

@ Francis

A jolly interesting question. You have illuminated some essential properties of hashes.

A hash may be applied to a very large file that won’t fit in memory and will normally process the whole file in one pass. You have potentially doubled the file access effort, which may be the expensive operation, by hashing the whole file twice. However, we could re-write the two hash functions to interleave as the file data becomes available.

A hash needs to be as short as possible because we may store and transfer lots of them and we don’t want to waste effort with longer than necessary hashes. Your first scheme without the salts will add needless length to the hash. It would be better to XOR the hashes onto each other, assuming this gives sufficient length and the hash algorithms are so distinct that they cannot combine under certain circumstances to create a weak combined hash. If you are going to do two passes over the file I think it would be better to do it like this :

Hash2( Hash1( message ) + message )

Given diverse algorithms for Hash1 and Hash2 this looks like a good scheme to me.

Your scheme with salts adds length to the hash for each salt. Though liberal salt is normally added to the stew a hash is a separate ingredient that the chefs like to mix in for themselves. And the order you have expressed it is weak :

Hash1( message + salt1 ) + Hash1( message + salt2 ) + salt1 + salt2

allows almost the entire hash calculation to be performed once rather than twice as this would :

Hash1( salt1 + message ) + Hash1( salt2 + message ) + salt1 + salt2

Richard.

Reply
Blaufish says:

2007-12-09 at 20:34 UTC

@ Francis

In addition to Richard’s response,
Lets say you have a H3(m) = H1(m) + H2(m)
(concatenation)

If you want to do a collision search on H3, you first collision search upon the weakest hash, let’s say H1. Now you have reduced the search space from all possible hashes to the significantly smaller space where H1 collides.

Hence, finding a collision for H3 is relatively easy IF: you have an attack against H1 which yields a very large number of collisions, and a H2 collision also exists in this space.

And if H1 and H2 happens to be the same family of hash-algorithms, doesn’t stand to reason that they might share some collisions properties? (don’t ask me to prove it, way beyond my level of knowledge, it is just a “gut feeling”)

I have a feeling I’ve read something about this, hmm… could it have been in one of Shneier’s publications perhaps? I used to read that stuff some years ago.

@Richard:

This suggestion:
Hash2( Hash1( message ) + message )

Although might be suitable for an application/protocol (i.e. password hashing), but is not a desirable in an hash algorithm. It would mean a double pass over the message, which will not work for very large message / real-time requirements.

But that unwanted property we can change if we tweak your suggestion slightly

Hash2( message + Hash1( message ) )

With that little tweak, I don’t see any apparent flaws in it. But I’m complete amateur 🙂

Reply
Richard Kelsall says:

2007-12-10 at 14:14 UTC

Hello Blaufish,

What you say looks about right to me, but I’m just an amateur too, I’m making this up as I go along.

The only thing I would add is that although, as you say, it is more expensive, because it always requires two passes, this hash

Hash2( Hash1( message ) + message )

is more secure than this hash

Hash2( message + Hash1( message ) )

An easy explanation might go like this.

You are taken to a clearing in a dense forest, blind-folded and walked in a series of 300 random steps into the forest. The blind-fold is removed and you have to find your way back to the clearing. This is like breaking Hash1 on its own. Let’s suppose Hash2 is more difficult it gives 800 random steps into the forest. Putting the hashes together like this

Hash2( Hash1( message ) + message )

is like starting your 800 random steps from the point where you finished the 300 steps then finding your way back to the clearing. And putting the hashes together like this

Hash2( message + Hash1( message ) )

is more like walking 800 random steps from the clearing then a few steps towards where you would have got to after 300 random steps from the clearing. Much easier to get back to the clearing – break the hash.

I know the last part of the story isn’t quite right, maybe somebody clever can think of a better ending.

Richard.

Reply