Big Data Analysis – CompTIA Security+ SY0-401: 2.4


The era of big data has arrived, and we’re now faced with analyzing more data than ever before. In this video, you’ll learn about big data analysis and the techniques that we’re using to sift through our huge data stores.



The term big data means much more than the name implies. Obviously, we are talking about a lot of data, thus the name big data. But the idea behind big data goes much further than just a large collection of data. This is data that we are collecting from many different places and storing, because these days we have the facilities to store large amounts of information. As we store this data, we don’t even know if the pieces are correlated with each other in any way. And in fact, many of the tools that we traditionally use to analyze data aren’t able to handle this massive amount of data. This is truly an emerging type of technology and one that will provide us with some interesting insights into information.

From a network security perspective, when we think about big data, we’re really thinking about collecting data from many different kinds of devices. These are things that you might traditionally think of, like firewalls or intrusion prevention systems. But certainly there’s information to be gathered from switches and routers, and perhaps even file servers. What if we collected all of the internet searches that were done inside of your organization? What if every URL was categorized and stored? You can imagine that the amount of data you would collect, just over a single day, would be truly massive.

Not only do we have the challenge of collecting this massive amount of data in one place, but we also have the challenge of understanding the different log types that we’re gathering. We’re gathering information from firewalls, and that data is very different than what we might gather from a file server. And of course, all of that is different than what we might gather from internet search results. All of this data somehow has to be gathered together and analyzed, and that’s where the real secret of big data comes in. We’re getting much better at storing this data. We’re now starting to get much more effective at going through the data and analyzing what we’ve collected. The data stores are so large, and the data that we have to query is so diverse, that we really can’t use the traditional forms of data querying that we’ve used in the past. We have to come up with new ways to correlate this information and provide some visualization of what exists inside that data store.
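To make the idea of gathering different log types together a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the three log formats, the sample lines, and the function names are invented, and real firewalls, file servers, and search logs will look different. It shows one possible approach, which is to parse each source into a common record and then group the records by host so that events from different devices line up on one timeline.

```python
from collections import defaultdict

# Hypothetical sample lines. These formats are invented for this sketch
# and do not match any particular product.
FIREWALL_LOG = [
    "2014-03-01T10:15:00 DENY src=10.0.0.5 dst=203.0.113.9 dport=445",
    "2014-03-01T10:16:30 ALLOW src=10.0.0.7 dst=198.51.100.2 dport=443",
]
FILE_SERVER_LOG = [
    "2014-03-01T10:15:20,10.0.0.5,READ,/finance/payroll.xlsx",
]
SEARCH_LOG = [
    "2014-03-01T10:14:50|10.0.0.5|how to disable antivirus",
]


def parse_firewall(line):
    """Turn a space-separated key=value firewall line into a common record."""
    timestamp, action, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs)
    detail = f"{action} to {fields['dst']}:{fields['dport']}"
    return {"time": timestamp, "source": "firewall",
            "host": fields["src"], "detail": detail}


def parse_file_server(line):
    """Turn a comma-separated file server line into a common record."""
    timestamp, host, operation, path = line.split(",", 3)
    return {"time": timestamp, "source": "file_server",
            "host": host, "detail": f"{operation} {path}"}


def parse_search(line):
    """Turn a pipe-separated search log line into a common record."""
    timestamp, host, query = line.split("|", 2)
    return {"time": timestamp, "source": "search",
            "host": host, "detail": query}


def load_events():
    """Normalize every source into one flat list of records."""
    events = [parse_firewall(line) for line in FIREWALL_LOG]
    events += [parse_file_server(line) for line in FILE_SERVER_LOG]
    events += [parse_search(line) for line in SEARCH_LOG]
    return events


def correlate_by_host(events):
    """Group records by host, ordered by time within each host."""
    grouped = defaultdict(list)
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    for event in sorted(events, key=lambda e: e["time"]):
        grouped[event["host"]].append(event)
    return dict(grouped)


timeline = correlate_by_host(load_events())
for host, host_events in timeline.items():
    print(host)
    for event in host_events:
        print("  ", event["time"], event["source"], event["detail"])
```

Once the records share the same fields, a search, a blocked connection, and a file read from the same host can be viewed side by side, which is the kind of correlation that is hard to see when each log stays in its own format.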

One of the most intriguing parts of big data is that you really don’t know what you have until you start putting the data together and looking at it in different ways. Very often, correlations appear in the data that were completely unknown prior to that point. So it’s very important that the query tools we have allow us to view the data in different ways. Maybe we’d like to see graphs of correlations between data. Perhaps we’d perform some statistical analysis that might lead us down a particular path, or perhaps use things like tag clouds, which can take certain correlations, make them much larger, and have them pop more towards the top. This type of analysis is going to allow security professionals to look at information across the entire enterprise in ways that they never have before.
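As a rough illustration of how a tag cloud makes certain items larger, here is a small Python sketch. The function name, the font size range, and the sample category list are all assumptions made for this example and are not taken from any particular analysis tool. It simply counts how often each term appears and scales the count linearly into a display size.

```python
from collections import Counter


def tag_cloud_weights(terms, min_size=10, max_size=40):
    """Map each term to a display size based on how often it appears.

    The least frequent term gets min_size, the most frequent gets
    max_size, and everything else is scaled linearly in between.
    """
    counts = Counter(terms)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    weights = {}
    for term, count in counts.items():
        # If every term appears equally often, show them all at max_size.
        scale = (count - low) / span if span else 1.0
        weights[term] = round(min_size + scale * (max_size - min_size))
    return weights


# Hypothetical URL categories seen in one day of proxy logs.
categories = ["malware", "malware", "malware", "gambling", "social", "social"]
print(tag_cloud_weights(categories))
# prints {'malware': 40, 'gambling': 10, 'social': 25}
```

A visualization tool would then draw each term at its computed size, so the category that shows up most often is the one that stands out first.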