Our data is becoming increasingly distributed, and our security strategies have changed to keep up. In this video, you’ll learn some best practices for protecting the data that we keep in the cloud and in our SAN.
The data controls that we apply to what we put into the cloud are very similar to the controls that we might use if all of the data was inside of our building. The difference, of course, is that in the cloud there’s much more of a security concern, since that data could now potentially be reached from anywhere. So you need to apply security controls to that data. You want to make sure that the proper rights and permissions are applied not only to the applications that are accessing that information, but also to all the administrators who have to maintain that data out in the cloud.
We should also think about how we’re storing this data. We commonly refer to this as “data at rest.” It is on a storage device. In this case, it’s on a storage device that’s in the cloud. We have to think about encryption of this information to help keep it private. If we store this information in an encrypted form, then somebody who did gain access to that data wouldn’t be able to read it without the decryption key. But of course, there’s computing overhead and additional administrative concerns whenever you start encrypting all of the data that you’re storing somewhere.
And of course, if all of this information is stored off site somewhere, we might want to add some additional security controls to it. We might want to put an additional firewall in front of all of this data, have different security access controls to that data, and have intrusion prevention systems that can look for somebody trying to take advantage of vulnerabilities in the services that we’re providing. Since all of that data is somewhere outside of your direct control, it’s at greater risk, and you need to apply the appropriate security controls to make sure nobody without authorization comes across or has access to this information.
As the saying goes, the network is the computer. And since we also have all of this stored somewhere on the network, we could also say the network is the SAN. That Storage Area Network is gathering and keeping all of our information there for us and it’s accessible from wherever we happen to be. We do want to make sure that we physically secure the Storage Area Network. If somebody does gain physical access to our storage devices, they can not only grab information from those devices, they can obviously damage or in some way make those devices unavailable to us.
So we need to make sure they’re behind locked doors in a data center or some physically restricted area, one where we can check and see who’s going in and who’s going out. If you’ve ever been to a data center, there’s at a minimum a lock on the door, and there are generally additional security controls added to that lock.
Sometimes we might even want to consider drives that encrypt the data in hardware as it’s written to the drive and decrypt it as it’s read from the drive. That way, if somebody did break in and steal the drive itself, they would not have access to that information without the proper passphrases or security certificates.
We also might want to consider how this information is protected if it leaves this protected area. A good example of this is when you’re transferring data outside of the data center, maybe to somewhere else in the organization across a wide area network or to a third party. You may want to encrypt that data as it’s going across the network.
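To make the idea of encrypting data in transit a little more concrete, here is a minimal sketch using Python's standard `ssl` module. The function name `open_encrypted_channel` and the choice of TLS 1.2 as a floor are assumptions for illustration, not something this video prescribes.

```python
import socket
import ssl

def build_transfer_context() -> ssl.SSLContext:
    """Build a client-side TLS context for sending data outside the data center."""
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumption: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

def open_encrypted_channel(host: str, port: int) -> ssl.SSLSocket:
    """Hypothetical helper: connect to a remote site and wrap the socket in TLS."""
    raw_socket = socket.create_connection((host, port))
    return build_transfer_context().wrap_socket(raw_socket, server_hostname=host)

context = build_transfer_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

Anything written to the socket returned by `open_encrypted_channel` would be encrypted on the wire, and the connection fails if the other side can't present a certificate that validates.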
One additional concern with data leaving the data center is your backup tapes. These backup tapes are very often stored at a third party facility. Unfortunately, backup tapes can go missing as they are transferred between locations. So you want to be sure that if a tape goes missing, nobody would be able to get any information from that tape, because you’ve encrypted everything written to it.
We also have to think about how encrypting data may have an effect on the resources that we’re using. When you encrypt data, it doesn’t come for free. There are additional CPU cycles. It may require additional traffic to go across the network. There’s some type of resource overhead. A backup that took an hour may now take longer than an hour. Is that something that we can tolerate, given the amount of time that we have for backups?
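The backup window question comes down to simple arithmetic, which can be sketched as below. The 25% overhead figure and the window lengths are made-up numbers for illustration only; the real overhead depends on your hardware and would have to be measured.

```python
def encrypted_backup_minutes(baseline_minutes: float, overhead_fraction: float) -> float:
    """Estimate backup duration once encryption overhead is added."""
    return baseline_minutes * (1 + overhead_fraction)

def fits_window(baseline_minutes: float, overhead_fraction: float, window_minutes: float) -> bool:
    """Check whether the encrypted backup still finishes inside the backup window."""
    return encrypted_backup_minutes(baseline_minutes, overhead_fraction) <= window_minutes

# Hypothetical numbers: a one-hour backup with an assumed 25% encryption overhead.
print(encrypted_backup_minutes(60, 0.25))  # 75.0
print(fits_window(60, 0.25, 90))           # True
print(fits_window(60, 0.25, 70))           # False
```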
All these things have to be considered when you’re taking this data outside of that trusted environment. And sometimes it even has to be considered if you’re keeping it in that trusted environment just so you can be sure that everything that we’re storing on these devices will be protected.
The concept and implementation of big data is completely changing how we think about the protection of data. Big data is a massive dataset. We’re taking information from many diverse data sources and we’re storing it in one central place. That means that the normal access controls that we might usually apply to a certain type of known data may not apply to this big store of data that we have.
You can usually fit a “need to know” principle to a traditional dataset. If you’re in the accounting department, you can access accounting information. If you are in the marketing department, you can access marketing information. But with big data, we may not even be completely sure exactly what type of data we’re storing.
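Here is a small sketch of what a "need to know" check looks like for a traditional dataset, and where it breaks down. The department names come from the example above; the `NEED_TO_KNOW` table and the "unclassified" category are hypothetical.

```python
# Hypothetical mapping of each department to the data categories it needs.
NEED_TO_KNOW = {
    "accounting": {"accounting"},
    "marketing": {"marketing"},
}

def can_access(department: str, data_category: str) -> bool:
    """Allow access only when the department has a listed need for the category."""
    return data_category in NEED_TO_KNOW.get(department, set())

print(can_access("accounting", "accounting"))    # True
print(can_access("marketing", "accounting"))     # False
# Data nobody has categorized yet matches no rule at all.
print(can_access("marketing", "unclassified"))   # False
```

The last line is the big data problem in miniature: a rule table like this only works when every piece of data has a known category.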
One of the objectives of big data is to store everything you have, and later on we sift through the data to try to find relationships and build some intelligence from what we’ve got there. That’s the idea of hunting down patterns. And we don’t know what patterns we’re going to find until we go hunting for them. So now it becomes difficult to decide who gets access to the data and who doesn’t.
As we’re taking this information from all these very diverse data sources and we’re pulling it back into this big data repository, we may want to consider filtering out personally identifiable information, or PII. This might be a social security number. It might be a telephone number. It may be any personal data that we wouldn’t want somebody else getting their hands on. And if we can filter that out before putting it into the database, then we can perhaps relax our security controls, because we know that if somebody were to get access to that big data, they still would not have access to anyone’s personal details.
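Filtering PII before it reaches the repository might look something like the sketch below. The two regular expressions only cover US-style social security numbers and dashed or dotted ten-digit phone numbers, and the sample record is invented; a real filter would need to handle far more formats.

```python
import re

# Assumed formats: 123-45-6789 for SSNs, 555-123-4567 or 555.123.4567 for phones.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

def scrub_pii(text: str) -> str:
    """Replace recognizable SSNs and phone numbers before the record is stored."""
    text = SSN_PATTERN.sub("[SSN REMOVED]", text)
    text = PHONE_PATTERN.sub("[PHONE REMOVED]", text)
    return text

# Invented sample record.
record = "Customer 1187 called from 555-867-5309, SSN 078-05-1120, about an invoice."
print(scrub_pii(record))
# Customer 1187 called from [PHONE REMOVED], SSN [SSN REMOVED], about an invoice.
```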
Once we have all of this data stored in this location, we may then want to think about what information people are pulling out of the data. The problem is that it becomes very difficult to audit this because a big data repository may have many different queries going on. And there may be a massive amount of data coming out of that data store.
So in those particular cases, it might make more sense to simply store the queries that are made to the database. Later on, if you wanted to perform an audit and see what somebody was retrieving from the database, you could run exactly the same query and, as long as the underlying data hasn’t changed, see the same response that person received.
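The idea of logging queries instead of results can be sketched with Python's built-in `sqlite3` module standing in for the big data store. The table names, the `audited_query` and `replay_query` helpers, and the sample rows are all hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 100), ("west", 250)])
# The audit table stores who asked what, not the data that came back.
conn.execute("CREATE TABLE query_audit (run_at TEXT, username TEXT, sql_text TEXT, params TEXT)")

def audited_query(username, sql_text, params=()):
    """Record the query and its parameters, then run it."""
    conn.execute(
        "INSERT INTO query_audit VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), username, sql_text, json.dumps(list(params))),
    )
    return conn.execute(sql_text, params).fetchall()

def replay_query(audit_id):
    """Re-run a logged query during an audit."""
    sql_text, params = conn.execute(
        "SELECT sql_text, params FROM query_audit WHERE rowid = ?", (audit_id,)
    ).fetchone()
    return conn.execute(sql_text, json.loads(params)).fetchall()

original = audited_query("alice", "SELECT amount FROM sales WHERE region = ?", ("west",))
print(original)          # [(250,)]
print(replay_query(1))   # [(250,)]
```

Storing the parameters as JSON alongside the SQL text keeps each audit row small no matter how much data the query returned.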
This might also prompt you to implement some DLP, or Data Loss Prevention, in your environment. A DLP device sits on the network and watches the information flowing over it. And if it notices information that should not be transferred, like social security numbers or credit card numbers or health care information, it can filter out, limit, or alert when those particular pieces of information are transferred across the network.
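As a rough sketch of the kind of content inspection a DLP device performs, the code below looks for social security numbers by pattern and for credit card numbers by pattern plus the Luhn checksum that card numbers use. The function names and patterns are assumptions for illustration; commercial DLP products are far more thorough.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0

def inspect_payload(payload: str) -> list:
    """Return the kinds of sensitive data spotted in one chunk of traffic."""
    findings = []
    if SSN_PATTERN.search(payload):
        findings.append("ssn")
    for match in CARD_CANDIDATE.finditer(payload):
        if luhn_valid(re.sub(r"[ -]", "", match.group())):
            findings.append("credit_card")
            break
    return findings

# 4111 1111 1111 1111 is a widely used test card number, not a real account.
print(inspect_payload("card 4111 1111 1111 1111 exp 12/27"))  # ['credit_card']
print(inspect_payload("order 1234 5678 9012 3456"))           # []
print(inspect_payload("SSN 078-05-1120 on file"))             # ['ssn']
```

The Luhn check is what keeps an ordinary 16-digit order number from being flagged as a credit card. A real device would then filter, limit, or alert based on the findings.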
By using some of these data security strategies for your cloud data, your SAN, and your big data, you can help make sure that the information you’re storing and the information you’re querying is as secure as possible.