We use hashing and digital signatures extensively to ensure safe and secure data transfers. In this video, you’ll learn about hashing data, salting a hash, and creating and verifying digital signatures.
A hash is designed to take any type of input, whether that’s a document, some audio, a video file, a very large file, or a very small file and create a very specific unique string of text that’s associated with that file. This is sometimes referred to as a message digest, and you can think of it as a fingerprint. If you take a fingerprint and look at it, we know where that fingerprint came from. We know there’s only one thing that can be associated with that fingerprint and that no other person has that particular fingerprint. Just like you’re not able to recreate an entire person based on their fingerprint, you can’t recreate an entire file based on the hash.
This hash is a one-way trip. Once we take the hashing algorithm and apply it to some data, that hash provides us with some output that does not allow us to go the other direction and somehow recreate the data based on that hash. Because of that, it’s a perfect solution for storing passwords.
We can take a password, which really no one should have access to, create a hash of that password, and then refer to the hash every time we want to authenticate. This allows us to store the password securely. And if somebody did gain access to those hashed passwords, they wouldn’t have any idea what the original password was.
We can often use hashing to provide integrity. So we can take a hash of a file, we can copy that file somewhere else, we can perform that same hashing algorithm, and then compare the hash that we got on both ends. If the hashes are identical, then the file that we copied is exactly the same file as the original.
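The integrity check described above can be sketched in a few lines of Python. This is only an illustration: the function names `sha256_of_file` and `same_contents` are made up for this example, and SHA-256 is an assumed choice of hashing algorithm.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so very large files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_contents(original_path, copy_path):
    """True when both files produce the same SHA-256 hash."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)
```

If `same_contents` returns True for the original and the copy, the copy can be treated as identical to the original.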
We also use hashing in combination with encryption to create digital signatures. That provides authentication, which tells us where this data came from, non-repudiation, which means we can tie it back to a specific person, and of course, the integrity feature is still part of that hashing algorithm. Ideally, every different input produces its own unique hash, and you should never see two different inputs create the same hash value. If you do find that two different inputs are creating the same hash in the output, then you’ve found a collision, and this is something you would not want to have with a hashing algorithm.
Let’s have a look at some hashes. We’ll perform a SHA256 hash, which is a 256-bit hashing algorithm. It outputs the equivalent of 64 hexadecimal characters. And we’ll take a single sentence. My name is Professor Messer with a period at the end. If we hash that single sentence, we get this entire value as the SHA256 hash.
Let’s create a hash of a similar input. We’ll have an input of My name is Professor Messer with an exclamation mark at the end of it. You’ll notice that the hash value of this second input is very different than the hash value we had on the original input. In fact, almost every single character of that hash is a different value even though the two inputs differ by only a single character. This is a good example of the differences that you’ll see in a hashing output when using two different inputs.
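To try this comparison yourself, here is a minimal sketch using Python’s standard `hashlib` module. The helper name `sha256_hex` is made up for this example, and the text is assumed to be encoded as UTF-8 before hashing.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 hash of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

first = sha256_hex("My name is Professor Messer.")
second = sha256_hex("My name is Professor Messer!")

# Count how many positions in the two hex strings hold different characters.
differing = sum(1 for a, b in zip(first, second) if a != b)

print(first)
print(second)
print(f"{differing} of {len(first)} hex characters differ")
```

Running it prints both hashes and a count of the positions where they disagree.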
We mentioned earlier that the only time we should see an identical hashing value is if the input values were identical. With a hash, we’re taking an input of any size, any type of file, and we’re creating a fixed size string from that that we’re calling a message digest, a checksum, or a fingerprint. Each one of these hash values should be unique if it is a unique type of input, and we should never have a hash appear that’s identical to another if there are different inputs involved. There have been cases, however, where a hashing algorithm did create the same hash for different types of input.
This occurred with the MD5 algorithm. Researchers first found collision weaknesses in MD5 in 1996, and complete collisions were demonstrated in 2004. This is one type of input that was used for MD5, and here’s the second input.
You’ll notice that these inputs are almost the same, but all of the characters marked in red are the differences between these two different inputs. If we hash the first input and hash the second input, we should get two different hash values. But with MD5, we got exactly the same hash value. That is a collision, and it’s the reason why we usually avoid using MD5 for hashing.
We mentioned earlier that it’s very common to use hashing to verify the integrity of information. For example, you may go out to an internet website and download a file, and along with the file name at that site, it may provide you with a list of hashes associated with these files. This means that you could download the file from their website and perform the same hashing algorithm to the file that you’ve downloaded.
If you then compare the hash that you created with the hash on the website and those two hashes are identical, then you have an exact duplicate of the file that was originally posted on that site. And as we mentioned earlier, it’s very common to use hashing when storing passwords. Instead of storing your password in plain text, we store your password as a hash, and we can create and compare that hash every time someone logs in.
Storing passwords is an interesting use case for hashing. It could be that you have hundreds of thousands of user accounts on a website, and statistically, it could be that multiple users happen to be using the exact same password. If you’re storing those hashes in a file, you could see that there are multiple users that are using exactly the same password. We don’t know what that password is, but we do know if we perform a brute force on that hash and we’re able to determine what that password is, then that same password will work for every single one of those identical hashes.
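Here is a small sketch of that problem. The account names and passwords are invented for the example, and plain unsalted SHA-256 is assumed as the storage method.

```python
import hashlib
from collections import defaultdict

# Hypothetical accounts, invented for this illustration.
accounts = {"alice": "dragon", "bob": "dragon", "carol": "letmein", "dave": "dragon"}

# Store only the unsalted hash of each password.
stored = {user: hashlib.sha256(pw.encode("utf-8")).hexdigest()
          for user, pw in accounts.items()}

# Anyone holding the hash file can group users by identical hash values.
users_by_hash = defaultdict(list)
for user, password_hash in stored.items():
    users_by_hash[password_hash].append(user)

shared = [sorted(users) for users in users_by_hash.values() if len(users) > 1]
print(shared)  # prints [['alice', 'bob', 'dave']]
```

The passwords themselves never appear in the stored data, but the matching hashes still reveal which accounts share one.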
To be able to avoid this, we want to add some randomization to these hashes. We call this random value a salt, and it allows us to add some randomization during the hashing process so that even if everyone were using the same password, every single one of the stored hashes would be very different. Each user might be using the same password, but each one of those passwords would have a different random salt associated with it, and therefore a different hash would be stored. Rainbow tables are pre-computed hash values, and if an attacker had already created some rainbow tables for these hashes, they would not be able to use them because this extra randomization of salt has been added to every single one of those hashes. If an attacker understands the process that’s used to add the salt, then they could still go through a brute force process to try to determine what that password is, but that is a much longer process to go through than looking up information in a rainbow table, and the goal here is to slow down the attacker as much as possible.
Let’s take a scenario where everyone is using exactly the same password, but we’re going to store that password information along with a salted hash. Let’s take our password. In this case, we’ll use the password of dragon, and the hash for dragon looks like this.
Let’s say that everyone is using the password of dragon, but for each account, we’re going to add an extra piece of salt to that, which of course is going to give us a different hash in every single one of those scenarios. So if an attacker was able to get a copy of our hash file, they would see what they thought were five different passwords. In reality, it’s the same password with the salt added for that randomization.
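A rough sketch of that scenario in Python follows. The function names, the 16-byte salt length, and the approach of prepending the salt to the password before a single SHA-256 hash are all assumptions made for illustration. Production systems usually go further and use a deliberately slow function such as `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac
import secrets

def hash_password(password, salt=None):
    """Return (salt, digest); a random 16-byte salt is generated when none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Re-hash the supplied password with the stored salt and compare the results."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

# Five hypothetical accounts that all chose the password "dragon".
users = ["user1", "user2", "user3", "user4", "user5"]
records = {user: hash_password("dragon") for user in users}

for user, (salt, digest) in records.items():
    print(user, salt.hex(), digest)
```

The salt is stored alongside each hash so that the same calculation can be repeated at login. Each of the five stored hashes comes out different even though the password is the same.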
Another useful function of hashing can be found with digital signatures. Digital signatures allow us to send information to another party and have that person confirm that what they received is exactly the information that we originally sent. These digital signatures prove the source of the message and where it came from. We can verify that the digital signature isn’t fake, it hasn’t been modified, and it really came from that original person. And because the digital signature was made with the private key from the original user, we know that this document could not have come from someone else.
Because the digital signature is created with the private key, it’s verified with the public key. If I were to sign a document with my private key and send it to you, you would use my public key, which of course is available to everyone, to verify the contents of that message. This also confirms that it came from me because the only person who could have digitally signed this is the person with the private key, and I’m the only one who has my private key.
Let’s look at how a digital signature is created. We’ll take a scenario where Alice is hiring Bob, and she wants to send Bob a message that says, you’re hired, Bob. But she wants to make sure that Bob is able to verify that this message is legitimate and that it really came from Alice. Alice is going to perform a hashing algorithm on this entire plain text message, and out of that, we’ll get a hash of the plain text. The person receiving this message could look at the hash to at least verify the integrity of the message, but we don’t want somebody modifying anything in the middle of the conversation.
So the next step is for Alice to encrypt that hash that she created with her private key. And of course, the only person with Alice’s private key is Alice. The results of that are what we call a digital signature. And we can attach that digital signature to the original plain text and send that entire message to the recipient.
On the other side, we have Bob, who has received this message from Alice. The message says, you’re hired, Bob. And of course, we didn’t encrypt the message. We simply created a digital signature and attached that signature to the message. And of course, you can see at the bottom, there’s our digital signature.
To be able to verify this digital signature, Bob is going to reverse the process that Alice created. So the first thing he’ll do is use Alice’s public key to decrypt the digital signature that she sent, which of course is simply an encrypted version of the hash. After that decryption has been performed, Bob is left with the hash of the plain text.
Now he wants to perform exactly the same hashing function that Alice originally did, so he’ll take that plain text of you’re hired, Bob, run it through the same hashing algorithm that Alice did to see if he can get a hash of that plain text, and then compare those hashes to make sure that they are the same. And if those hashes match, then he knows that this message is legitimate. It really did come from Alice, and nothing has been changed with that message while this message was being sent to Bob.
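The whole exchange between Alice and Bob can be sketched with a toy example. Everything here is an assumption made for illustration: the key pair is a tiny textbook RSA key built from the primes 61 and 53, the function names are made up, and the SHA-256 hash is reduced to fit the tiny modulus. Real signatures use very large keys and padding and would come from a vetted cryptography library, so treat this only as a picture of the steps.

```python
import hashlib

# Toy RSA key pair (insecure, illustration only): n = 61 * 53.
N = 3233   # modulus, shared by both keys
E = 17     # Alice's public exponent
D = 2753   # Alice's private exponent (17 * 2753 leaves remainder 1 when divided by 60 * 52)

def digest_as_int(message):
    """Hash the message with SHA-256, then shrink the result to fit the toy modulus."""
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N

def sign(message):
    """Alice: hash the plain text, then encrypt that hash with her private key."""
    return pow(digest_as_int(message), D, N)

def verify(message, signature):
    """Bob: decrypt the signature with Alice's public key and compare it to his own hash."""
    recovered_hash = pow(signature, E, N)
    return recovered_hash == digest_as_int(message)

message = "You're hired, Bob"
signature = sign(message)

print(verify(message, signature))              # True
# Almost certainly False; with a modulus this small an accidental match is possible.
print(verify("You're fired, Bob", signature))
```

The message itself is sent in the clear. Only the hash is encrypted, and a match on Bob’s side tells him the text is unchanged and was signed with Alice’s private key.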