Protecting Data – SY0-601 CompTIA Security+ : 2.1

There are many different ways to protect our application data. In this video, you’ll learn about data sovereignty, data masking, encryption, tokenization, and more.



An organization’s data is one of the most important assets it has, so it’s up to you as an IT security professional to make sure that data is safe. One challenge in protecting data is that it exists in so many different places: it could be on a storage drive, traversing the network, or in the memory or CPU of a system.

We also have to think about which technologies we can combine with this data to protect it. For example, we can add encryption or set up specific security policies. We may also grant different permissions to this data depending on a person’s role in the organization, which is another way to help protect it.

Through the internet, we can communicate with practically every country in the world, so we need rules that determine how data is protected in each of these territories. This falls under the category of data sovereignty: understanding the laws that apply to data based on where that data geographically resides.

Some of the rules around data sovereignty are specific to these locations. For example, the GDPR (General Data Protection Regulation) is a set of rules in the European Union. It applies to personal data collected on people in the EU, and it restricts transferring that data outside the EU unless the destination country is recognized as providing adequate protection or other approved safeguards are in place.

The GDPR is extensive and complex, so it’s important that you understand its scope if you’re planning to collect this type of data on your network. The GDPR is also not the only law of its kind: many other countries have similar rules, and some require data about their residents to be stored within their own borders. Make sure you understand those rules before you collect and store this information.

Another way to protect data is by masking it. Data masking obfuscates data to make it more difficult to read. A common example is the bank card number on a receipt, where most of the digits have been replaced with asterisks and only the last four digits are readable. We can use data masking with any type of information, so it can help protect personally identifiable information (PII) and any other sensitive data as well.

Although a receipt shows only asterisks, the data itself, including the full credit card number, may be stored on other servers within the organization. We simply don’t have the rights to see that number, so any time it’s printed or placed on a receipt, it’s masked automatically.

There are many different techniques for masking this type of information. Masking might encrypt some of the data, shuffle the numbers so they’re no longer in the same order, or substitute different information completely.
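Two of these techniques, shuffling and substitution, can be sketched in a few lines. This is a minimal illustration, assuming the goal is realistic-looking but non-sensitive test or display data; the function names are hypothetical. Shuffling alone is weak, because the original digits are all still present, and substitution is not reversible, so neither replaces encryption where the real value must be recovered later.

```python
import random


def shuffle_digits(value: str, rng: random.Random) -> str:
    """Shuffling: same digits, different order; separators stay where they were."""
    digits = [ch for ch in value if ch.isdigit()]
    rng.shuffle(digits)
    shuffled = iter(digits)
    return "".join(next(shuffled) if ch.isdigit() else ch for ch in value)


def substitute_digits(value: str, rng: random.Random) -> str:
    """Substitution: every digit replaced by an unrelated random digit."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


rng = random.Random()  # adequate for masking test data; not a cryptographic generator
original = "4111-1111-1111-1234"
print(shuffle_digits(original, rng))
print(substitute_digits(original, rng))
```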

Another way to protect data is by encrypting it. The original information we plan to encrypt is called plaintext, and the information created by the encryption process is called ciphertext. As long as we have the proper key and the proper algorithm, we can convert back and forth between plaintext and ciphertext.
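To make the plaintext, key, and ciphertext relationship concrete, here is a minimal sketch using a one-time pad: the plaintext is XORed with a random key of the same length, and XORing the ciphertext with the same key returns the plaintext. This is an assumed teaching example, not how PGP or disk encryption work; a one-time pad is only secure if the key is truly random, as long as the message, and never reused, which is why real systems use ciphers such as AES from a vetted library.

```python
import secrets


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching key byte (encrypts and decrypts)."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"Hello, world"
key = secrets.token_bytes(len(plaintext))  # random key, used for this message only
ciphertext = xor_with_key(plaintext, key)  # plaintext -> ciphertext
recovered = xor_with_key(ciphertext, key)  # ciphertext -> plaintext, same key

print(ciphertext.hex())      # different on every run
print(recovered.decode())    # Hello, world
```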

One important characteristic of data encryption is confusion. Confusion means that the ciphertext is very different from the plaintext we started with, with no obvious relationship between the ciphertext and the plaintext (or the key that produced it). Let’s take a very simple bit of plaintext, “Hello, world”, and encrypt that phrase into a PGP-encrypted message.

This is the message I created by encrypting the original plaintext, “Hello, world”. You can see that the plaintext and the resulting ciphertext are dramatically different, so this encryption mechanism exhibits confusion.

Another important characteristic of data encryption is diffusion. Diffusion means that changing even one piece of the plaintext produces dramatically different ciphertext. For example, let’s go back to the original “Hello, world” plaintext and its PGP-encrypted ciphertext.

Now let’s change the plaintext to “Hello, world!” with an exclamation mark. The only similarity between the two ciphertexts is at the very beginning, which is the PGP header; everything else in the new encrypted message is dramatically different from the original ciphertext. All we did was add one character to the plaintext, but thanks to diffusion the ciphertext is completely different.
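The same one-character experiment can be reproduced with Python’s standard library. Because the standard library has no block cipher, this sketch uses the SHA-256 hash function as a stand-in: it is not encryption, but it is designed with the same avalanche behavior that diffusion describes, so a one-character change to the input should flip roughly half of the 256 output bits. The helper name `differing_bits` is hypothetical.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


h1 = hashlib.sha256(b"Hello, world").digest()
h2 = hashlib.sha256(b"Hello, world!").digest()

print(h1.hex())
print(h2.hex())
print(differing_bits(h1, h2), "of", len(h1) * 8, "bits differ")
```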

To protect data, you need to know where it’s located. Data on a storage device is referred to as data at rest. This could be a hard drive, an SSD, or an NVMe or M.2 drive. It doesn’t matter which type of storage we’re using; as long as the data is in a file on that storage device, it’s data at rest. To protect data at rest, we can encrypt the data on the drive.

This might include whole disk encryption, encryption built into the database, or individually encrypting files or folders on that storage device. We might also assign permissions to the data stored on the drive, so that a particular file or folder is only accessible by certain users or groups of users.
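As a small illustration of the permissions side of protecting data at rest, the sketch below restricts a file so that only its owner can read or write it. It assumes a POSIX system such as Linux or macOS (on Windows, `os.chmod` can only toggle the read-only flag), and the file name is hypothetical. Group-based access control and encryption at rest would be handled by the operating system or database, not by this snippet.

```python
import os
import stat
import tempfile


def restrict_to_owner(path: str) -> None:
    """Set permissions to rw------- : owner may read/write, group and others get nothing."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payroll.csv")  # hypothetical sensitive file
    with open(path, "w") as f:
        f.write("employee,salary\n")
    restrict_to_owner(path)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    print(oct(mode))  # 0o600 on Linux and macOS
```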

Data moving across the network is referred to as data in transit, also called data in motion. This includes data passing between switch interfaces, across router connections, and to and from the devices on the network. Many different types of devices send this information back and forth, and at every step along the way, that data is in motion.

We often control access to this data using a firewall or an intrusion prevention system (IPS). We can also encrypt the data as it crosses the network, so anyone who captures it would see only ciphertext and wouldn’t be able to understand anything going by.
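In practice, encrypting data in transit usually means TLS (Transport Layer Security). Here is a minimal client-side sketch using Python’s standard `ssl` module, assuming a server that speaks HTTPS on port 443; `fetch_head` is a hypothetical helper, not part of any library. The default context verifies the server’s certificate and hostname, and the sketch additionally refuses protocol versions older than TLS 1.2.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context() enables certificate and hostname verification
    ctx = ssl.create_default_context()
    # Refuse SSL 3.0, TLS 1.0 and TLS 1.1, which are deprecated
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def fetch_head(host: str, port: int = 443) -> bytes:
    """Send a minimal HTTP HEAD request over TLS and return the start of the reply."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # Everything written to tls_sock is encrypted before it leaves this host
        with ctx.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode())
            return tls_sock.recv(1024)


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

Calling `fetch_head("example.com")` needs network access; a packet capture of that exchange would show only TLS records, not the HTTP request.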

If you’re using TLS (Transport Layer Security, the successor to SSL) or IPsec (Internet Protocol Security), you are encrypting this data while it’s in transit.

Data in the memory of our systems is considered to be data in use. This includes data in system RAM, CPU registers, and the caches on our system.

This data is almost always in a decrypted, plaintext form, because it’s much easier to perform calculations on and read information that isn’t encrypted. And because this data sits in memory unencrypted, it’s a very useful place for attackers to focus their efforts.

One example of attackers going after data in use was the breach of Target’s network in November and December of 2013. Attackers installed malware on Target’s checkout registers and stole about 40 million credit and debit card numbers, along with personal information on tens of millions of customers. Even though card data was encrypted as it crossed the network, the attackers captured the card numbers from the memory of those checkout registers, reading the data in use on each infected point-of-sale terminal.

Another way to protect data is to replace it with completely different data. This is called tokenization: we take sensitive data and replace it with a different set of data, called a token. For example, take the Social Security number 266-12-1112. Instead of storing that Social Security number in our database, we store a completely different number in its place.
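A token vault can be sketched as a lookup table from random tokens to the original values. This is a simplified illustration with a hypothetical `TokenVault` class; a production tokenization service would store the mapping in a hardened, access-controlled system. The token is generated randomly rather than computed from the input, so unlike a hash or ciphertext it has no mathematical relationship to the original value, and only the vault can map it back.

```python
import secrets


class TokenVault:
    """Hypothetical token vault mapping random tokens back to the original values."""

    def __init__(self) -> None:
        self._values_by_token = {}  # token -> original sensitive value

    def tokenize(self, value: str) -> str:
        while True:
            # 12 random digits: no mathematical link to the input, and too long
            # to be mistaken for a 9-digit Social Security number
            token = f"{secrets.randbelow(10**12):012d}"
            if token not in self._values_by_token:
                self._values_by_token[token] = value
                return token

    def detokenize(self, token: str) -> str:
        # Only the vault can do this lookup; the token alone reveals nothing
        return self._values_by_token[token]


vault = TokenVault()
token = vault.tokenize("266-12-1112")
print(token)                    # random 12-digit number, different each run
print(vault.detokenize(token))  # 266-12-1112
```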

If someone gains access to this information and tries to use it as a Social Security number, they’ll find out very quickly that the number isn’t valid. Tokenization is very common in credit card transactions. If you pay for something with your phone or your watch, the device uses tokenization to transmit a token representing your credit card, not your actual credit card number.

Tokenization is not a hashing mechanism, and it’s not encryption either. We’re simply replacing one set of numbers or characters with another set of numbers or characters, so handling the token adds no encryption or hashing overhead on the devices that use it. Here’s how the tokenization process works.

It begins on the mobile phone, where we register our credit card. In this example, the card number starts with a 4, continues with a string of 1s, and ends in 1234. Once we register that card with the token service server, the server provides a token to use instead of the credit card number, and that token is stored on our mobile device.

We then take our phone or watch to the store and pay at checkout using near-field communication (NFC). The token is sent to the merchant’s payment server, which checks it with the token service server. The token service server confirms that this is a valid token for a valid credit card number and informs the merchant that the token is valid and the transaction is approved.
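The registration and checkout steps above can be modeled with two small classes. Everything here is hypothetical and heavily simplified: `TokenServiceServer` and `MerchantPaymentServer` stand in for the roles in the walkthrough, and real payment networks add further checks that this sketch leaves out. The point it shows is that the merchant only ever handles the token, while the card number stays with the token service.

```python
import secrets


class TokenServiceServer:
    """Hypothetical token service: the only party that knows which card a token maps to."""

    def __init__(self) -> None:
        self._cards_by_token = {}

    def register_card(self, card_number: str) -> str:
        token = secrets.token_hex(8)  # random; reveals nothing about the card
        self._cards_by_token[token] = card_number
        return token

    def is_valid(self, token: str) -> bool:
        return token in self._cards_by_token


class MerchantPaymentServer:
    """Hypothetical merchant server: sees tokens, never card numbers."""

    def __init__(self, token_service: TokenServiceServer) -> None:
        self._token_service = token_service

    def authorize(self, token: str) -> str:
        return "approved" if self._token_service.is_valid(token) else "declined"


# 1. Register the card once; the phone stores only the returned token
token_service = TokenServiceServer()
phone_token = token_service.register_card("4111111111111234")  # made-up number

# 2. At checkout the phone sends the token over NFC to the merchant
merchant = MerchantPaymentServer(token_service)
print(merchant.authorize(phone_token))          # approved
print(merchant.authorize("stolen-or-made-up"))  # declined
```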

If you’ve ever worked with a Microsoft Office document that didn’t allow you to modify parts of the page, or a PDF that wouldn’t allow you to copy and paste, then you’ve seen information rights management (IRM). IRM prevents certain actions from occurring within a document. For example, it can prevent copying and pasting, control screenshots, manage printing, or restrict people from making changes to the document itself. The goal of IRM is to limit what people can do with the document. If an attacker gains access to someone’s workstation, they can only manipulate that document within that user’s rights and permissions.
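The core idea of IRM, a per-document policy that grants each user a specific set of actions, can be modeled as a simple lookup. This sketch only illustrates the policy check; the user names, actions, and `is_allowed` function are hypothetical, and real IRM is enforced by the document application together with a rights management service. It also shows why a compromised workstation is limited: whoever acts as "bob" gets only bob’s rights.

```python
from enum import Enum, auto


class Action(Enum):
    VIEW = auto()
    COPY = auto()
    PRINT = auto()
    SCREENSHOT = auto()
    EDIT = auto()


# Hypothetical rights attached to one document, per user
DOCUMENT_RIGHTS = {
    "alice": {Action.VIEW, Action.PRINT, Action.EDIT},
    "bob": {Action.VIEW},
}


def is_allowed(user: str, action: Action) -> bool:
    """Return True only if the document's policy grants this action to this user."""
    return action in DOCUMENT_RIGHTS.get(user, set())


print(is_allowed("bob", Action.VIEW))      # True
print(is_allowed("bob", Action.COPY))      # False
print(is_allowed("mallory", Action.VIEW))  # False: unknown users get no rights
```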