How can you tell if a digital file on a piece of storage media has been altered? In this video, you’ll learn about hashing, and I’ll demonstrate how to compare hashes with a file that has been altered.
One of the challenges with digital technology is that it’s very difficult to tell if a digital file has been changed. You can’t just pick up a disk or CD-ROM, look at it with the human eye, and determine if something changed in that file between yesterday and today.
So of course we have some things we can do to check files. And the easiest thing we can do, and probably one of the most accepted, is doing something like taking a hash of the file. This allows us to essentially create a fingerprint of a file. In fact, you’ll see it referred to as a digital fingerprint. And if that file or any part of that file ever changes, you’ll notice that the fingerprint will change. We’re going to look at this in just a moment.
One of the most common ways to do this is with something called Message Digest 5, which you’ll almost always see abbreviated as MD5. You’ll hear people say, “You need to generate an MD5 hash of that file,” or “What’s the MD5 of that file?” You may have also seen this if you’ve been downloading files from the internet. Right next to the download link there’s a big long hex string, and they say this is the MD5 of the file. The idea is that the file was put on the website and the fingerprint was made. When you download it, you can see if the fingerprint of your copy is the same as the one on that web page, and at least confirm that the two match.
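As a rough sketch of that download check, here is one way it could look in Python using the standard hashlib module. The function names md5_of_file and matches_published are made up for this illustration, and the published value is whatever hex string the website shows next to its download link.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the MD5 fingerprint of a file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large downloads do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """Compare a file's MD5 with the hex string copied from the web page."""
    # Hypothetical helper: ignores surrounding whitespace and letter case.
    return md5_of_file(path) == published_hex.strip().lower()
```

You would call matches_published with the path of the downloaded file and the string copied from the page, and a False result means your copy is not the file that was fingerprinted.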
This MD5 hash is 128 bits long. As I mentioned, it is displayed as a hexadecimal string. The chance of a duplicate is interesting: 1 in 2 to the 128th power. So we’re talking about a 1 in 340 billion, billion, billion, billion chance that the fingerprint of a changed file would be exactly the same as the fingerprint that was made originally.
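If you want to check that arithmetic yourself, a few lines of Python will do it. The variable names here are just for illustration.

```python
# Number of distinct values a 128-bit hash can take.
md5_bits = 128
possible_values = 2 ** md5_bits
print(possible_values)  # 340282366920938463463374607431768211456

# "Billion, billion, billion, billion" is 10**9 four times over, or 10**36.
billions_to_the_fourth = (10 ** 9) ** 4
print(possible_values // billions_to_the_fourth)  # 340

# Each hexadecimal character carries 4 bits, so the string is 32 characters.
hex_characters = md5_bits // 4
print(hex_characters)  # 32
```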
So one of the things we can say is that it’s effectively impossible for a chance modification to a file to leave the fingerprint exactly the same. In reality, you’re never going to hit this by accident. The statistical odds are staggering.
Another type of hash is a CRC, or cyclic redundancy check. This is a much smaller type of check. It’s only 32 bits long. Again, it is displayed as hexadecimal. And you can see the chances here of having a CRC duplicated after a change: 1 in 2 to the 32nd power, which is just over 4 billion to one.
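To see how small a CRC is, here is a minimal sketch using the crc32 function from Python’s standard zlib module. The helper name crc32_hex is made up for this example, and the input is the common "123456789" test string rather than anything from the video.

```python
import zlib

def crc32_hex(data: bytes) -> str:
    """Return the CRC-32 of the data as an 8-character hex string."""
    # Mask to 32 bits and pad to 8 hex digits (4 bits per digit).
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")

print(crc32_hex(b"123456789"))  # cbf43926

# Number of distinct 32-bit values, "just over 4 billion".
print(2 ** 32)  # 4294967296
```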
Now one of the things that you’ll notice is that your hardware, in things like hard drive checks and memory checks, uses CRCs. And that’s a really good way to use that particular technology, because a CRC can be calculated relatively fast, so the check can keep up with the data as it goes by.
An MD5, because it is 128 bits long, takes a little bit longer. There are a few more calculations that have to take place to create that fingerprint. So most of the time, when we’re trying to verify that a file or an image is exactly the same as when we left it, an MD5 is really the one that is going to give us that assurance.
You’ll very often see CRCs when you’re looking at how hard drives are writing and checking their information. But rarely do we use a CRC from a security forensics standpoint. The idea is that we can take a file, an image, or any digital piece of information and create an MD5 fingerprint of it. That allows us to check it after the fact, or verify it at any time while we’re processing the file. We might take the file off of the computer and move it to other media. We might copy the image of a hard drive and move it somewhere else. We can check the fingerprint every step along the way.
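Here is a minimal sketch of checking the fingerprint at one of those steps, in this case when a file is copied to other media. It is only an illustration in Python. The function names md5_of_file and copy_and_verify are hypothetical, and a real forensic workflow would also log the values somewhere safe.

```python
import hashlib
import shutil

def md5_of_file(path, chunk_size=65536):
    """Return the MD5 fingerprint of a file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_and_verify(source, destination):
    """Copy a file, then confirm the copy has the same fingerprint."""
    before = md5_of_file(source)        # fingerprint taken before the move
    shutil.copyfile(source, destination)
    after = md5_of_file(destination)    # fingerprint of the copy
    # Return both values so they can be written into a log.
    return before, after, before == after
```

Returning both fingerprints, rather than just True or False, means each step can be written down and compared again a day, a week, or a year later.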
So just like taking regular fingerprints at a crime scene, you will absolutely get a list of a bunch of fingerprints when you’re doing this type of incident response to make sure later on down the line– a day later, a week later, a year later– that you’ve got exactly the same file you started with. And your fingerprint is a very, very good way to do that.
Let’s see one of these hashes in action. I’m running on my Mac desktop here, but you could be on Windows. There are many utilities you can use to create an MD5 on Windows, Linux, and other operating systems as well. I have a single file on my hard drive called evidence.txt. In Mac OS X, I can simply type md5 and the name of the file, and it will give me the hexadecimal MD5 fingerprint of evidence.txt.
So I know I can take that information, log it, and make sure that I have it, and that way if I want to check this file later, I can confirm that that evidence.txt MD5 fingerprint matches exactly what’s here. But what if somebody changes the file? Let’s change this file evidence.txt and say this file does not have important information. And let’s save that. If I run the same MD5 check now, notice the fingerprint’s very different. I made modifications to the file. And if you went back later and you said, wait a second, this evidence.txt MD5 hash does not match the one that was made originally, then you know this file is not the original file. Something has been modified. Now you have to figure out why something changed between the time when you grabbed the file and the time that you have it right now.
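The same experiment can be sketched in Python instead of the md5 command. This is an illustration under some assumptions: the file is created in a temporary directory, and the original contents are a placeholder, since the real contents of evidence.txt are not shown in the demonstration.

```python
import hashlib
import os
import tempfile

def md5_of_file(path):
    """Return the MD5 fingerprint of a small file as a hex string."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "evidence.txt")

    # Placeholder contents, assumed for this example only.
    with open(path, "w") as handle:
        handle.write("This file has important information.\n")
    original = md5_of_file(path)   # the fingerprint you would log

    # Somebody changes the file.
    with open(path, "w") as handle:
        handle.write("This file does not have important information.\n")
    current = md5_of_file(path)    # the fingerprint taken later

print("logged: ", original)
print("current:", current)
print("unchanged" if original == current else "file has been modified")
```

The two printed fingerprints differ, so the script reports that the file has been modified, which is exactly the mismatch you would go and investigate.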