Log Management – SY0-601 CompTIA Security+ : 4.3

Security monitoring processes create extensive logs and data. In this video, you’ll learn about transferring, storing, and reporting on logs and data from sources such as journalctl, metadata, NetFlow, IPFIX, sFlow, protocol analyzers, and more.


One of the standard methods for transferring log files from a device to a centralized database is called Syslog. This is a standard you’ll find across many different devices. If you’re installing a firewall, a switch, or a server, you’ll notice Syslog options within those devices so that you can send that data to a SIEM.

SIEM stands for Security Information and Event Management, and it’s usually a centralized log server that consolidates logs from all of these different devices. When we send information via Syslog, each log entry is labeled before it goes to the Syslog destination. There’s a facility code, which effectively identifies the program that created the log, and there’s a severity level associated with that log entry.
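As a sketch of how those two labels travel together: in the Syslog message formats (RFC 5424, and the older BSD format described in RFC 3164), the facility and severity are combined into a single priority value, PRI = facility × 8 + severity, written in angle brackets at the start of each message. The Python below encodes and decodes that value. The facility table is deliberately partial, and the sample log line is made up.

```python
import re

# Severity names from RFC 5424; 0 is the most severe
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# A few of the RFC 5424 facility codes; the full table runs from 0 to 23
FACILITIES = {0: "kern", 1: "user", 3: "daemon", 4: "auth", 10: "authpriv", 16: "local0"}

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def encode_pri(facility: int, severity: int) -> int:
    """Combine a facility code and a severity level into a Syslog PRI value."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity


def decode_pri(message: str) -> tuple[str, str]:
    """Return (facility name, severity name) from a message's <PRI> prefix."""
    match = PRI_PATTERN.match(message)
    if match is None:
        raise ValueError("message has no <PRI> prefix")
    facility, severity = divmod(int(match.group(1)), 8)
    return FACILITIES.get(facility, f"facility{facility}"), SEVERITIES[severity]


print(encode_pri(4, 3))  # auth facility, err severity
print(decode_pri("<38>Jan 10 12:00:01 host sshd[812]: Accepted password for alice"))
```

Decoding is just integer division and remainder by 8, which is why a collector can sort entries by facility and severity without understanding the message text.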

If you’re using a Linux device, or another device that doesn’t automatically include Syslog functionality, you may see different daemons available. Examples include rsyslog, the rocket-fast system for log processing; syslog-ng, a popular Syslog daemon for Linux devices; and NXLog, which can collect log information from many different devices and consolidate it on a single machine.
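Applications can also hand their entries straight to one of these daemons or to a collector. Here is a minimal sketch using Python’s standard `logging.handlers.SysLogHandler`, assuming a collector listening for Syslog over UDP. The `myapp` tag and the function name are hypothetical, and the demo uses a throwaway local UDP socket as a stand-in collector so that it runs anywhere.

```python
import logging
import logging.handlers
import socket


def make_syslog_logger(host: str, port: int = 514) -> logging.Logger:
    """Return a logger that forwards records to a Syslog collector over UDP.

    The collector address is an assumption; point it at your rsyslog,
    syslog-ng, NXLog, or SIEM listener.
    """
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_AUTH,  # facility code 4
        socktype=socket.SOCK_DGRAM,
    )
    # "myapp" is a hypothetical program tag placed in front of each message
    handler.setFormatter(logging.Formatter("myapp: %(message)s"))
    logger = logging.getLogger(f"syslog-{host}-{port}")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Stand-in collector: a local UDP socket on an OS-assigned port
    collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    collector.bind(("127.0.0.1", 0))
    collector.settimeout(2)
    log = make_syslog_logger("127.0.0.1", collector.getsockname()[1])
    log.warning("Failed login for user alice")
    print(collector.recv(1024))  # raw datagram, starting with the <PRI> value
```

The handler maps Python’s log levels onto Syslog severities, so the warning above arrives with the auth facility and warning severity already encoded in its priority prefix.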

This is the front end of a security information and event manager that has received these log files via Syslog and parsed them. Now we can view this information, search through the data, and create reports on everything stored in our database.
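Searching and reporting start with parsing each consolidated line into fields. A simplified sketch, assuming BSD-style (RFC 3164) lines such as `<PRI>Jan 10 12:00:01 host tag: message`. A real SIEM handles many more formats, and the function names here are illustrative only.

```python
import re
from collections import Counter

# Minimal pattern for BSD-style lines: <PRI>timestamp host tag: message
LINE = re.compile(r"^<(\d{1,3})>(\w{3} [ \d]\d \d\d:\d\d:\d\d) (\S+) ([^:]+): (.*)$")


def parse_line(line: str):
    """Split one Syslog line into searchable fields, or return None."""
    match = LINE.match(line)
    if match is None:
        return None
    pri, timestamp, host, tag, message = match.groups()
    return {"severity": int(pri) % 8, "time": timestamp, "host": host,
            "tag": tag, "message": message}


def search(lines, keyword: str, max_severity: int = 7) -> list[dict]:
    """Case-insensitive keyword search that keeps entries at max_severity
    or more severe (lower numbers are more severe)."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries
            if e is not None
            and keyword.lower() in e["message"].lower()
            and e["severity"] <= max_severity]


def report_by_host(entries) -> Counter:
    """Count matching entries per reporting device."""
    return Counter(e["host"] for e in entries)
```

For example, `report_by_host(search(lines, "failed password"))` would give a per-device count of failed password messages across everything the collector has received.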

If you’re managing a Linux operating system, there are many different logs available on that device. Some are specific to the operating system itself, while others are created by the daemons running on that system or by the applications you’re using.

Linux systems that use systemd store their system logs in the systemd journal, which uses a special binary format. This optimizes storage and allows you to query the information very quickly, but you can’t read it with a text editor because it’s in that binary format.

Fortunately, Linux has a utility called journalctl, which lets you query the information in that system journal and display what it contains. You can search and filter on those details, or view them as plain text.
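For scripting, journalctl can emit entries as JSON, one object per line, with `-o json`. The sketch below wraps that call and filters the output by priority and text. It assumes a systemd-based host, and the unit name is an assumption (`ssh` on Debian and Ubuntu, `sshd` on many other distributions). The filtering function works on any captured JSON lines, so it doesn’t need a live journal.

```python
import json
import subprocess
from datetime import datetime, timezone


def read_journal(unit: str, since: str = "today") -> list[str]:
    """Run journalctl for one systemd unit and return its JSON lines.

    Requires a systemd-based Linux host. The unit name is an assumption:
    "ssh" on Debian/Ubuntu, "sshd" on many other distributions.
    """
    cmd = ["journalctl", "-u", unit, "--since", since, "-o", "json", "--no-pager"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def filter_entries(json_lines, max_priority: int = 7, contains: str = ""):
    """Yield (UTC timestamp, message) for entries at or above a priority.

    journald priorities use the Syslog scale: 0 = emerg ... 7 = debug.
    This sketch assumes MESSAGE is text; journald can store binary
    messages as arrays of byte values, and those are skipped here.
    """
    for line in json_lines:
        entry = json.loads(line)
        message = entry.get("MESSAGE", "")
        if not isinstance(message, str):
            continue
        priority = int(entry.get("PRIORITY", 6))  # assume "info" if missing
        if priority <= max_priority and contains in message:
            micros = int(entry["__REALTIME_TIMESTAMP"])  # microseconds since epoch
            stamp = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
            yield stamp.isoformat(), message
```

On a live system, something like `filter_entries(read_journal("ssh"), contains="Failed password")` would list failed SSH password attempts from today.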

This is a view of the output from journalctl. You can see information about connections to SSH, here’s an rsyslogd entry showing that Syslog information has been received, and you can look at other details about authentication attempts that have either succeeded or failed on the system.
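To turn entries like these into a quick report, you could count failed and successful SSH logins per source address. This sketch assumes the usual OpenSSH message wording (`Failed password for ... from IP port N`); other services, and other authentication methods, may log differently.

```python
import re
from collections import Counter

# Matches OpenSSH lines such as:
#   Failed password for invalid user admin from 203.0.113.5 port 52150 ssh2
#   Accepted publickey for alice from 192.0.2.10 port 51000 ssh2
AUTH = re.compile(
    r"(?P<result>Failed|Accepted) (?P<method>\S+) for (?:invalid user )?"
    r"(?P<user>\S+) from (?P<ip>\S+) port (?P<port>\d+)"
)


def summarize_auth(messages):
    """Return (failed, accepted) Counters keyed by source IP address."""
    failed, accepted = Counter(), Counter()
    for message in messages:
        match = AUTH.search(message)
        if match is None:
            continue  # not an authentication result line
        target = failed if match["result"] == "Failed" else accepted
        target[match["ip"]] += 1
    return failed, accepted
```

A single address with many failures and no successes is a common first sign of password guessing.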

One of the first statistics we often want to gather from these logs is the amount of bandwidth we’re using. This is a fundamental network statistic, and one that’s available on almost any device you connect to. It shows you the percentage of available network capacity that has been used over time.

And there are many different ways to gather this metric. You might use SNMP, the Simple Network Management Protocol, or you could use other more advanced capabilities such as NetFlow, sFlow, or IPFIX. You could also use protocol analyzers or software agents that might be running on a particular device.
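However the counters are collected, the utilization calculation is the same: take two readings of an interface’s octet counter, convert the difference to bits, and divide by the interval times the link speed. This sketch assumes you’ve already polled a counter such as IF-MIB `ifInOctets` (32-bit) or `ifHCInOctets` (64-bit) along with the interface speed in bits per second; it doesn’t do the SNMP polling itself.

```python
COUNTER32_MODULUS = 2**32  # IF-MIB ifInOctets / ifOutOctets
COUNTER64_MODULUS = 2**64  # IF-MIB ifHCInOctets / ifHCOutOctets


def octet_delta(old: int, new: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octets counted between two readings, tolerating a single counter wrap."""
    return (new - old) % modulus


def utilization_percent(old_octets: int, new_octets: int, seconds: float,
                        if_speed_bps: int, modulus: int = COUNTER32_MODULUS) -> float:
    """Percentage of link capacity used in one direction over the interval."""
    bits = octet_delta(old_octets, new_octets, modulus) * 8
    return 100.0 * bits / (seconds * if_speed_bps)
```

For example, `utilization_percent(4_294_000_000, 1_874_032_704, 300, 100_000_000)` returns 50.0: the counter wrapped, but 1,875,000,000 octets passed in five minutes on a 100 Mbps link. A 32-bit counter wraps in about 34 seconds at full load on a 1 Gbps link, so the 64-bit counters are the safer choice on fast interfaces; the modulo arithmetic tolerates one wrap between readings, not several.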

Bandwidth monitoring is always a good first step. It’s good to confirm that you have enough bandwidth available to transfer information for that application, because if the bandwidth has been exceeded and the network is running out of capacity, none of your applications are going to perform very well.

Another great source of data, and one that’s usually hidden from us, is metadata. Metadata is data that describes other data, and it’s usually contained within the files we’re using on our devices. For example, if you send or receive an email, there’s metadata within that email message that you normally don’t see.

That information is in the email headers. The headers may show you which servers were used to transfer that email from point A to point B, along with destination information for the message.

If you’re using your mobile phone, an extensive amount of metadata could be stored. For example, if you take a picture or record video on your mobile device, the metadata could include the type of phone used to take that picture or the GPS location where the picture was taken.
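Photo GPS metadata is usually stored in EXIF as degrees, minutes, and seconds, each a rational number, plus an N/S or E/W reference. Converting that to the decimal coordinates a map expects is simple arithmetic, sketched below. Extracting the raw EXIF values is assumed to have been done already by an image library, and the example coordinates are made up.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref: str) -> float:
    """Convert EXIF-style (degrees, minutes, seconds) rationals to decimal degrees.

    dms holds three values such as "40/1" or 40; ref is "N", "S", "E", or "W".
    South and west become negative, as most mapping tools expect.
    """
    degrees, minutes, seconds = (float(Fraction(value)) for value in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if ref.upper() in ("S", "W") else decimal
```

For instance, `dms_to_decimal(("40/1", "26/1", "4614/100"), "N")` gives roughly 40.44615, a latitude that can be pasted straight into a mapping site, which is exactly why this metadata can reveal where a photo was taken.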

If you’re using a web browser to connect to a web server, then there’s metadata that’s transferred back and forth there as well. For example, you could be sending your operating system information, the type of browser that you’re using, and the IP address that you’re sending it from.

And if you look inside documents or files you store, for example in Microsoft Office, you may find metadata showing your name, your address, your phone number, your title, and other identifying details. Here are the email headers you normally don’t see when you’re reading your messages, along with the metadata hidden inside.
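Modern Office files (.docx, .xlsx, .pptx) are ZIP archives, and the author details live in `docProps/core.xml`. A minimal sketch with the Python standard library that reads a few of those core properties. Other details, such as the company name, are kept in a different part of the archive (for example `docProps/app.xml`) and aren’t covered here, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the Office Open XML core properties part
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
FIELDS = {"creator": "dc:creator", "title": "dc:title", "last_modified_by": "cp:lastModifiedBy"}


def office_core_properties(path_or_file) -> dict:
    """Return selected core properties from an Office Open XML file.

    Accepts a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    result = {}
    for key, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            result[key] = element.text
    return result
```

Running something like this over files before you publish them is a quick way to see what identifying details they would carry with them.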

You can see the IP address this message was received from and who received it, the return path, another IP address where the message was received, and other details that help you understand what path this message took through the network, what validations were used to confirm that the message was really sent by its originator, and other details about this email message.
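Pulling that path out of a raw message is mostly header parsing. Each relay adds its own `Received` header at the top, so those headers read newest first. This sketch uses Python’s standard `email` module to collect the `Received` hops and any bracketed IPv4 addresses in them, along with the `Return-Path` and the `Authentication-Results` header, where SPF and DKIM results are commonly recorded. Header formats vary by mail server, so the address pattern is a simplifying assumption.

```python
import re
from email import message_from_string

# Many servers record the connecting address in square brackets
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def summarize_headers(raw_message: str) -> dict:
    """Extract routing and validation metadata from a raw email message."""
    msg = message_from_string(raw_message)
    hops = []
    for received in msg.get_all("Received", []):
        text = " ".join(received.split())  # unfold multi-line headers
        hops.append({"header": text, "ips": IPV4_IN_BRACKETS.findall(text)})
    return {
        "return_path": msg.get("Return-Path"),
        "hops_newest_first": hops,  # each relay prepends its own Received line
        "authentication_results": msg.get("Authentication-Results"),
    }
```

Reading the hops from the bottom up reconstructs the path from the sender’s first server to your mailbox.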

NetFlow is a standardized method of gathering network statistics from switches, routers, and other devices on your network. This NetFlow information is usually consolidated onto a central NetFlow server, so we can view information from all of these devices on a single management console.

NetFlow itself is a very well-established standard, which makes it easy to collect information from devices made by many different manufacturers and bring all of that information back to one central NetFlow server.

This is an architecture that separates the probe from the collector. We may have individual NetFlow probes on our network, or the NetFlow capability may be built into the network devices we’re using. These probes either sit in line with our network traffic or receive a copy of it, and all of those details are exported to a central NetFlow collector where you can then create different reports.
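To show what a collector actually receives from an exporter, here is a sketch that decodes one NetFlow version 5 export datagram. Version 5 is the older fixed-layout format; version 9 and IPFIX use templates instead. The sketch assumes the standard 24-byte v5 header and 48-byte flow records, and it leaves out the UDP listener (collectors often use port 2055, though that is configurable).

```python
import socket
import struct

# NetFlow v5 header: version, count, sys_uptime, unix_secs, unix_nsecs,
# flow_sequence, engine_type, engine_id, sampling_interval (24 bytes)
HEADER = struct.Struct("!HHIIIIBBH")
# NetFlow v5 record: srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets,
# first, last, srcport, dstport, pad1, tcp_flags, prot, tos, src_as, dst_as,
# src_mask, dst_mask, pad2 (48 bytes)
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_v5(datagram: bytes) -> list[dict]:
    """Decode the flow records in one NetFlow v5 export datagram."""
    version, count, *_ = HEADER.unpack_from(datagram, 0)
    if version != 5:
        raise ValueError(f"expected NetFlow v5, got version {version}")
    flows = []
    for i in range(count):
        (src, dst, _nexthop, _in_if, _out_if, packets, octets, _first, _last,
         sport, dport, _pad1, _tcp_flags, proto, _tos, *_rest) = RECORD.unpack_from(
            datagram, HEADER.size + i * RECORD.size)
        flows.append({
            "src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
            "sport": sport, "dport": dport, "proto": proto,
            "packets": packets, "bytes": octets,
        })
    return flows
```

Because every v5 record has the same layout, a collector can decode records from any vendor’s exporter with the same few lines, which is part of why the standard made multi-vendor collection practical.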

There are usually extensive reporting options available on the collector, and we can gather long-term information to see trends and other details about how our network is performing. Here’s a NetFlow collector front end that shows the top 10 conversations and top 10 endpoints on our network by bandwidth.
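Top-talker reports like those come down to summing bytes per conversation and per endpoint, then sorting. A sketch, assuming each flow record is a dictionary with `src`, `dst`, and `bytes` keys (a hypothetical shape, not any product’s format), and treating traffic in both directions between two hosts as one conversation.

```python
from collections import Counter


def top_conversations(flows, n: int = 10):
    """Sum bytes per host pair, counting both directions as one conversation."""
    totals = Counter()
    for flow in flows:
        pair = tuple(sorted((flow["src"], flow["dst"])))  # order-independent key
        totals[pair] += flow["bytes"]
    return totals.most_common(n)


def top_endpoints(flows, n: int = 10):
    """Sum bytes sent or received per host."""
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
        totals[flow["dst"]] += flow["bytes"]
    return totals.most_common(n)
```

Summing both directions into one key matches how people usually think about a conversation; a report that needs upload and download separately would key on the ordered pair instead.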

We can also get a breakdown of all individual hosts here as well. And here’s another summary of details that shows the top five applications running on our network, and what the top NetFlow sources might be.

A similar data flow standard is IPFIX, the IP Flow Information Export, which you can think of as a newer version of NetFlow. It was created based on NetFlow version 9, and it gives us flexibility over what data we collect and what information is reported to a centralized server. It works much like NetFlow, except we can customize exactly what kind of data we’d like the exporting devices to send to the collector.
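That flexibility comes from templates: the exporter first sends a template listing which Information Elements each record will contain, then sends data records that follow it (RFC 7011). The sketch below parses a message containing a template set and a matching data set. It knows only a handful of IANA element IDs and rejects enterprise-specific and variable-length fields, so treat it as an illustration of the idea rather than a complete decoder.

```python
import struct

# A few IANA IPFIX Information Element IDs; the full registry is much larger
IE_NAMES = {
    1: "octetDeltaCount", 2: "packetDeltaCount", 4: "protocolIdentifier",
    7: "sourceTransportPort", 8: "sourceIPv4Address",
    11: "destinationTransportPort", 12: "destinationIPv4Address",
}
IPV4_ELEMENTS = {8, 12}


def parse_ipfix(message: bytes, templates: dict) -> list[dict]:
    """Decode one IPFIX message.

    Template records are stored in `templates` (template ID -> field list)
    so later messages can be decoded; data records are returned as dicts.
    """
    version, length = struct.unpack_from("!HH", message, 0)
    if version != 10:
        raise ValueError(f"expected IPFIX version 10, got {version}")
    records, offset = [], 16  # skip the 16-byte message header
    while offset + 4 <= length:
        set_id, set_len = struct.unpack_from("!HH", message, offset)
        if set_len < 4:
            raise ValueError("malformed set length")
        body, end = offset + 4, offset + set_len
        if set_id == 2:  # Template Set
            while body + 4 <= end:
                template_id, field_count = struct.unpack_from("!HH", message, body)
                body += 4
                fields = []
                for _ in range(field_count):
                    element_id, field_len = struct.unpack_from("!HH", message, body)
                    body += 4
                    if element_id & 0x8000 or field_len == 0xFFFF:
                        raise NotImplementedError("enterprise or variable-length field")
                    fields.append((element_id, field_len))
                templates[template_id] = fields
        elif set_id >= 256 and set_id in templates:  # Data Set for a known template
            fields = templates[set_id]
            record_len = sum(field_len for _, field_len in fields)
            while record_len and body + record_len <= end:
                record = {}
                for element_id, field_len in fields:
                    raw = message[body:body + field_len]
                    body += field_len
                    name = IE_NAMES.get(element_id, f"ie{element_id}")
                    if element_id in IPV4_ELEMENTS and field_len == 4:
                        record[name] = ".".join(str(b) for b in raw)
                    else:
                        record[name] = int.from_bytes(raw, "big")
                records.append(record)
        offset = end  # Options Template Sets (ID 3) and unknown sets are skipped
    return records
```

Changing what gets reported means changing the template on the exporter; the collector simply learns the new layout from the next template it receives.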

One of the challenges with collecting this network traffic and creating metrics from the conversations on our network is that it can take a lot of resources, especially if you’re running a very high-speed network. To balance the available resources against the need for network statistics, we can use sFlow, or sampled flow, which gathers metrics from only a portion of the network traffic.

Because sFlow requires fewer resources, this capability can be embedded in many of our infrastructure devices, so the switches and routers you’re already using on your network may already support sFlow. And although we’re only looking at a portion of the traffic, we can infer relatively accurate statistics from these sFlow devices.
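The inference works by scaling: with 1-in-N packet sampling, each sampled packet stands in for roughly N packets. A sketch, assuming samples arrive as hypothetical `(application, frame_length)` pairs. The error figure is an approximation that treats the sample count as Poisson, so accuracy depends mainly on how many samples an application received rather than on the sampling rate itself.

```python
import math
from collections import defaultdict


def estimate_totals(samples, sampling_rate: int) -> dict:
    """Scale 1-in-N packet samples up to estimated totals per application.

    samples: iterable of (application, frame_length_bytes) pairs, one per
    sampled packet (a hypothetical shape for this sketch).
    """
    counts = defaultdict(int)
    sampled_bytes = defaultdict(int)
    for app, length in samples:
        counts[app] += 1
        sampled_bytes[app] += length
    return {
        app: {
            "est_packets": counts[app] * sampling_rate,
            "est_bytes": sampled_bytes[app] * sampling_rate,
            # Approximate 95% relative error, treating the sample count as Poisson
            "error_pct": 196 * math.sqrt(1 / counts[app]),
        }
        for app in counts
    }
```

A heavy flow such as a video stream collects many samples quickly, so its estimate tightens fast, while a rarely seen application keeps a wide error margin.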

You may be able, for example, to view video streaming and high-traffic applications, by simply sampling a portion of that traffic as the flow is active. Here’s an example of some of the statistics you can gather using sFlow. You can see the top 10 interfaces by a percentage of utilization, we have top 10 interfaces by total amount of traffic, top 10 wireless clients by traffic, top 10 wireless access points by client count, and other statistics as well.

And if you need detailed information about exactly what’s going over your network, you can use a protocol analyzer. Protocol analyzers are commonly used to troubleshoot complex application problems, because they capture every bit and byte from the network and provide a breakdown of exactly what’s going across those network links.

You can also use a protocol analyzer on wireless networks or wide area networks. You can see information such as unknown traffic that may be crossing the network, you can apply packet filters so that you see exactly the information you’re looking for, and of course, the protocol decodes on the analyzer give you a plain English breakdown of exactly what traffic is traversing the network.

Here’s a screenshot from an analyzer. At the top, we can see a packet-by-packet list showing delta times, source IP addresses, source port numbers, destination IP addresses, destination port numbers, protocols, the length of each packet, and information about the packet. Underneath all of that, we have a detailed decode of the individual packets captured from the network.
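The packet-list columns can be reproduced by decoding raw frames by hand. This sketch assumes Ethernet II frames carrying IPv4 (no VLAN tags, no IPv6) and computes each packet’s delta time from the previous one. The function names are hypothetical, and a real analyzer such as Wireshark decodes far more protocols and validates every header.

```python
import socket
import struct

PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def summarize(frame: bytes) -> dict:
    """Decode an Ethernet II / IPv4 frame into packet-list columns."""
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype != 0x0800:  # not IPv4; report the EtherType only
        return {"protocol": f"ethertype 0x{ethertype:04x}", "length": len(frame)}
    ip = 14                              # IPv4 header starts after the Ethernet header
    ihl = (frame[ip] & 0x0F) * 4         # IPv4 header length in bytes
    proto = frame[ip + 9]
    row = {
        "source": socket.inet_ntoa(frame[ip + 12:ip + 16]),
        "destination": socket.inet_ntoa(frame[ip + 16:ip + 20]),
        "protocol": PROTOCOLS.get(proto, str(proto)),
        "length": len(frame),
    }
    if proto in (6, 17):                 # TCP and UDP headers both start with the ports
        row["sport"], row["dport"] = struct.unpack_from("!HH", frame, ip + ihl)
    return row


def packet_list(captures) -> list[dict]:
    """captures: (timestamp_seconds, frame_bytes) pairs in capture order."""
    rows, previous = [], None
    for timestamp, frame in captures:
        row = summarize(frame)
        row["delta"] = 0.0 if previous is None else timestamp - previous
        previous = timestamp
        rows.append(row)
    return rows
```

The delta time column is simply the gap since the previous captured packet, which is often the quickest way to spot where an application is stalling.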