There are many hash functions for many different circumstances. In this video, you’ll learn about MD5, SHA, RIPEMD, and HMAC hash functions.


The MD5 algorithm was created by Ronald Rivest, one of the fathers of modern cryptography. He’s been doing this for quite some time, and if you have an opportunity to go out to YouTube and look at some of the presentations he’s given, you’ll see he really is one of the founders and brilliant thinkers in the field. The MD5 hash algorithm itself was published in April of 1992. As the name implies, MD5 comes after MD4. The MD5 message digest algorithm produces a 128-bit hash value, so the output you get once you hash something is always 128 bits long, no matter how large the input is.
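As a quick illustration of that 128-bit output, here is a minimal Python sketch using the standard hashlib module. The sample message is made up for the example and is not from the video.

```python
import hashlib

message = b"hello world"  # arbitrary sample input (an assumption for illustration)
digest = hashlib.md5(message).digest()

# The raw digest is 16 bytes, and 16 bytes * 8 bits = 128 bits.
md5_bits = len(digest) * 8
md5_hex = hashlib.md5(message).hexdigest()

print(md5_bits)  # 128
print(md5_hex)   # 32 hexadecimal characters
```

Hashing a much longer input gives a digest of the same length, which is the point of a fixed-size message digest.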

In 1996, however, researchers found a collision in MD5’s compression function, the core building block of the algorithm. One of the things that you’ll notice as you examine hash algorithms is that one of their biggest challenges is making sure that two separate pieces of information can’t end up creating the exact same hash. That’s called a collision, and in the world of hashing, that’s a bad thing. The 1996 finding was a serious warning sign, and later research confirmed that this particular algorithm is not very resistant to these types of collisions.
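To make the idea of a collision concrete, here is a small Python sketch. It does not attack real MD5. Instead it assumes a deliberately weakened toy hash (only the first 16 bits of an MD5 digest) so that two different inputs with the same hash can be found in a fraction of a second. The names tiny_hash and find_collision are hypothetical helpers invented for this example.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes) -> bytes:
    # Toy hash: keep only the first 2 bytes (16 bits) of the MD5 digest.
    # With just 65,536 possible outputs, collisions are easy to find.
    return hashlib.md5(data).digest()[:2]

def find_collision():
    seen = {}  # maps each toy hash value to the first input that produced it
    for i in count():
        data = f"message-{i}".encode()
        h = tiny_hash(data)
        if h in seen:
            # Two different inputs, same toy hash: a collision.
            return seen[h], data
        seen[h] = data

a, b = find_collision()
print(a, b, tiny_hash(a).hex(), tiny_hash(b).hex())
```

The search is guaranteed to stop within 65,537 attempts because there are only 65,536 possible 16-bit outputs. A full 128-bit digest makes this kind of brute-force search impractical, which is why the real MD5 collisions relied on weaknesses in the algorithm itself.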

In fact, in December of 2008, researchers used MD5 collisions to create a rogue certificate authority certificate (this is a pretty big deal) that looked absolutely legitimate when its MD5-based signature was checked. With it, they were able to build other certificates, the kind that might be used on web servers, for instance, that appeared to be completely legitimate and issued by a third-party provider of certificates. So someone could, technically, take one of those certificates, put it on their web server, and your browser would think that certificate is absolutely valid, and that is a very, very bad thing. That means that I could pretend to be Microsoft. I could pretend to be eBay. I could pretend to be anyone.

To give you a feel for what these collisions look like, these are two separate pieces of information. Everything in red is different between them, and everything else is exactly the same. Clearly, those are different pieces of information, but unfortunately, MD5 comes up with exactly the same hash for both. That’s our collision right there, and that’s what we’re trying to avoid. Right after this was published, the certificate authority involved stopped using this particular method to create certificates. RapidSSL decided not to issue any more of those types of certificates because of these vulnerabilities that were found in MD5.

Another common hash algorithm is the Secure Hash Algorithm, or SHA. Some people say S-H-A. It was created in the United States by the National Security Agency, and it is also a Federal Information Processing Standard. When the government creates one of these standards, it rolls it out across all of the federal agencies, and that’s the method they use to hash their important information.

One that was widely used is SHA-1. This is a 160-bit digest, so a little bit bigger than the MD5 we were just looking at. Unfortunately, and this is a common problem with hashing algorithms, research published in 2005 described collision attacks that could be carried out against SHA-1. So the natural progression, then, is to move to one that’s a little bit better, and that’s SHA-2. This is now the preferred variant of the SHA hash algorithm.

SHA-2 is a family of functions with bigger digests, up to 512 bits. The idea, usually, is that with a longer number of bits, it’s more difficult to find collisions between different hashes. SHA-1 is now retired for most US Government use. The message has been that there are problems with SHA-1, and the collisions are there, so everybody should start moving all of their applications, all of the development they’re doing, and all of the products they use to provide this hashing over to the more secure SHA-2 standard.
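To compare the digest sizes of SHA-1 and the SHA-2 family side by side, here is a short Python sketch using the standard hashlib module. The sample data is arbitrary and only there for illustration.

```python
import hashlib

data = b"example data"  # arbitrary sample input (an assumption for illustration)

# digest_size is reported in bytes, so multiply by 8 to get bits.
sizes = {
    name: hashlib.new(name, data).digest_size * 8
    for name in ("sha1", "sha224", "sha256", "sha384", "sha512")
}

for name, bits in sizes.items():
    print(f"{name}: {bits} bits")
```

SHA-1 reports 160 bits, while the SHA-2 variants range from 224 bits up to 512 bits.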

RIPEMD is an entire family of hashing algorithms. The name stands for RACE Integrity Primitives Evaluation Message Digest. That’s a mouthful. RACE stands for Research and Development in Advanced Communications Technologies in Europe. So this came out of a European program that was created so that there could be some standards around communications across all the different countries in Europe. This is a centralized standard, with centralized management associated with the technologies that they’re creating. So this hashing algorithm, or set of algorithms, was created just for this purpose.

The original version, RIPEMD, was found to have collision issues in 2004, and because of that, the one to use now is RIPEMD-160, which, to this point, does not have any known collision issues associated with it. From a design perspective, it’s based on MD4, but it has performance characteristics similar to SHA-1. So there’s a nice balance there between the usability of this hash and the speed at which you’re able to use it. There are also other variants out there, RIPEMD-128, RIPEMD-256, and RIPEMD-320, and the different digest sizes might be used for different things.
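If you want to try RIPEMD-160 yourself, here is a hedged Python sketch. It assumes your Python build’s OpenSSL library still offers RIPEMD-160, which is not always the case on newer systems, so the hypothetical helper ripemd160_bits returns None when the algorithm is unavailable.

```python
import hashlib

def ripemd160_bits(data: bytes):
    """Return the RIPEMD-160 digest length in bits, or None if unsupported."""
    try:
        h = hashlib.new("ripemd160", data)
    except ValueError:
        # Raised when the underlying OpenSSL build does not offer RIPEMD-160.
        return None
    return h.digest_size * 8

result = ripemd160_bits(b"example data")  # arbitrary sample input
print(result)
```

When the algorithm is available, the reported length is 160 bits, the same digest size as SHA-1.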

When you apply a hash algorithm to a file or a document or an email, you end up getting this nice little digest at the bottom. From that alone, all you really know is that the document you’ve received is exactly the same as the document that was sent. You can’t really verify who sent it. The Hash-based Message Authentication Code, or HMAC, is a technique where you take a secret key and combine it with the hashing process, so that the other side can apply the same key and see if the person who sent it really was the person you were expecting, because only the two of you would know what that key is.

This means that you’re not only able to verify that the data has not been changed, but you also know that it came from someone who holds the shared key. That’s verified just based on the hash. Again, we’re not changing anything with the text or the document or the original piece of information that was sent. You don’t need fancy asymmetric encryption. This is a simple symmetric key. You’re using the same key on both sides to be able to determine this information.
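Here is a minimal Python sketch of that idea using the standard hmac module. The key, the messages, and the helper names make_tag and verify are all made up for illustration, and SHA-256 is simply one reasonable choice of underlying hash.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    # Sender side: compute the HMAC of the message with the shared key.
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # Receiver side: recompute the HMAC and compare in constant time.
    return hmac.compare_digest(make_tag(key, message), tag)

shared_key = b"example-shared-secret"   # hypothetical key known to both sides
message = b"transfer 100 to alice"      # hypothetical message
tag = make_tag(shared_key, message)

print(verify(shared_key, message, tag))                   # True
print(verify(shared_key, b"transfer 900 to alice", tag))  # False: data changed
print(verify(b"wrong-key", message, tag))                 # False: wrong key
```

The message itself travels unchanged. Only the tag depends on the key, so a receiver without the right key cannot produce or confirm a matching tag.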

This is commonly seen in IPsec. It’s also commonly seen in TLS, which is the big brother now, the new version of SSL. It’s a simple process: you combine the key with a standard set of padding values and hash that together with the message. It actually takes two passes of the hash function to come up with the final result. You then repeat this process on the other side, going through exactly the same steps. If you end up getting exactly the same hash at the end, then you know the other side had that same secret key, and you can feel very good that the person who sent this is now verified.
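For the curious, the padding and the two passes can be sketched by hand in Python. This follows the standard HMAC construction with SHA-256 as an assumed underlying hash, and the function name manual_hmac_sha256, the key, and the message are invented for the example. In practice you would use the hmac module rather than writing this yourself.

```python
import hashlib
import hmac

def manual_hmac_sha256(key: bytes, message: bytes) -> bytes:
    block_size = hashlib.sha256().block_size  # 64 bytes for SHA-256

    # Keys longer than one block are hashed first, then every key is
    # padded with zero bytes up to the block size.
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()
    key = key.ljust(block_size, b"\x00")

    # The two standard pads: XOR every key byte with 0x36 and with 0x5C.
    inner_key = bytes(b ^ 0x36 for b in key)
    outer_key = bytes(b ^ 0x5C for b in key)

    # First pass: hash the inner-padded key followed by the message.
    inner = hashlib.sha256(inner_key + message).digest()
    # Second pass: hash the outer-padded key followed by the first result.
    return hashlib.sha256(outer_key + inner).digest()

key = b"example-shared-secret"   # hypothetical shared key
msg = b"hello from the sender"   # hypothetical message

manual = manual_hmac_sha256(key, msg)
library = hmac.new(key, msg, hashlib.sha256).digest()
print(manual == library)  # True
```

Both sides run the same steps in the same direction. Nothing is reversed, and the receiver simply checks that its own result matches the tag that arrived with the message.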