Capturing digital data involves a series of technical challenges. In this video, you’ll learn about capturing data from disk, RAM, swap files, operating systems, firmware, and other sources.
One challenge when collecting data from a system is that some data is more volatile than other data. Certain data will be stored on the system for an extended period of time, while other data may only be there for a few moments. Therefore, we need to start by collecting the most volatile information and then work our way down to the least volatile.
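This principle is essentially a sort: rank each data source by how long it’s likely to survive and collect the shortest-lived first. Here’s a minimal Python sketch, assuming hypothetical task names, labels, and numeric ranks. The order of the ranks follows the sources covered in the rest of this section, from CPU data down to backups.

```python
from dataclasses import dataclass

# Hypothetical numeric ranks: a lower number means more volatile, so collect it first.
# The ordering follows this section; the labels and values themselves are illustrative.
VOLATILITY_RANK = {
    "cpu": 1,              # CPU registers and cache
    "memory": 2,           # routing tables, ARP cache, process tables, RAM
    "temp_fs": 3,          # temporary file systems
    "disk": 4,             # other data stored on a drive
    "remote_logs": 5,      # remote logging and monitoring data
    "physical_config": 6,  # physical configuration, network topology
    "archive": 7,          # backups and archival media
}


@dataclass
class CollectionTask:
    name: str
    source: str  # must be a key of VOLATILITY_RANK


def collection_order(tasks):
    """Return the tasks sorted from most volatile source to least volatile."""
    return sorted(tasks, key=lambda task: VOLATILITY_RANK[task.source])


tasks = [
    CollectionTask("copy backup tapes", "archive"),
    CollectionTask("image system drive", "disk"),
    CollectionTask("dump RAM", "memory"),
]
print([task.name for task in collection_order(tasks)])
# ['dump RAM', 'image system drive', 'copy backup tapes']
```

In practice the ranks would be attached to real collection steps in a runbook; the point is only that the order is decided before collection starts.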
The most volatile data is in your CPU, so things like CPU registers and CPU cache should be the very first things you gather. Next is information that’s around a little longer than CPU data, but not much longer. Things like routing tables, the ARP cache, process tables, and information in memory will probably be the second most volatile.
From there, we can start to look at files that might be stored on our system. Temporary file systems are next on our list, followed by other information that’s normally stored on a drive. If any of the system’s information is sent to a remote logging facility, it may be stored there for an extended period of time.
So we want to be sure we’re looking at that monitoring data. As we move further down the list, we find information that rarely changes, such as the physical configuration of the device or the topology of the network.
And lastly, information that could be around for years is in your backups and archival media.

There’s a great deal of information stored on a system’s hard drive or SSD, so it’s useful to know the best way to gather that information for forensics. The first thing we should do is prepare the drive to be imaged.
We would normally power down the system so that nothing can be written to the drive, and we’ll often remove the storage drive from the system so that we can connect it to a device specifically designed for imaging. These are usually handheld systems built with write protection so that nothing on the drive can be altered.
We would then copy everything from that drive, and by copy everything, I mean a bit-for-bit representation of everything contained on that storage device. This preserves all of the data, including deleted files and information that’s been marked for deletion. This way we collect the entire drive in its exact form, and later we can analyze exactly what we found in that image.
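A hardware imager does this copy for you, but the idea behind it is simple enough to sketch. The Python below is a minimal illustration, assuming the source drive is already attached through a write blocker and exposed as a readable path. Hashing the data as it’s copied is an addition here; it’s a common way to show later that the image still matches what was read. The function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large drives don't fill RAM


def image_device(source_path, image_path):
    """Copy every byte of source_path into image_path.

    Returns the SHA-256 hex digest of the bytes read from the source.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        for chunk in iter(lambda: src.read(CHUNK_SIZE), b""):
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()


def sha256_of(path):
    """Hash an existing file, e.g. to re-verify an image before analysis."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, on Linux `image_device("/dev/sdb", "evidence.img")` (hypothetical device and file names) would return the hash of the source, and `sha256_of("evidence.img")` run later should give the same value if the image hasn’t changed. Reading a raw block device typically requires administrator privileges.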
Another important source of data is the information in memory. This can be difficult to gather, not only because this information changes constantly, but because the process of capturing the information from memory can itself change a portion of that memory. There are third-party tools available that can provide a memory dump. They take everything that’s in the system’s active memory and copy it to a separate system or a separate connected device.
We want to gather as much as we can from the memory because some of this information is never written to a storage drive. Things like your browsing history, clipboard information, encryption keys, or your command history may be found in memory but may not show up on the storage drive itself.
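Once a third-party tool has produced a raw memory dump, a first pass often looks for readable text such as URLs or typed commands. Here’s a minimal sketch of that idea, similar to the Unix strings utility: it finds runs of printable ASCII and then picks out URLs. The function names are hypothetical and the minimum length of 6 is an arbitrary choice. The sketch also reads the whole dump into memory and ignores UTF-16 text, so it’s only suitable for small samples.

```python
import re

MIN_LEN = 6  # shortest run of printable characters worth reporting (arbitrary)
# Runs of printable ASCII bytes (space through tilde), like the `strings` tool.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_strings(dump: bytes):
    """Return every printable ASCII run of at least MIN_LEN bytes."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(dump)]


def find_urls(dump: bytes):
    """Return URLs found inside the printable strings of a dump."""
    urls = []
    for text in extract_strings(dump):
        urls.extend(URL_PATTERN.findall(text))
    return urls
```

Dedicated memory forensics tools go much further, parsing the operating system’s own structures to recover processes, keys, and clipboard contents.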
Modern operating systems have a temporary storage area called a swap file or pagefile, and how it’s used differs slightly between operating systems.
In many cases, the swap is an area of your storage device used to move information out of random access memory (RAM) and free up space for other applications to execute. If an application needs more room to execute, the operating system swaps some information out of memory and stores it temporarily on the local drive, performs the execution in memory, and then reads that information back from the drive into active RAM.
The swap might also contain portions of an application. We can take an application that we’re not currently using, move it out of active memory, and store it temporarily on the local drive so that other applications are able to execute. You can think of the swap as an extension of the memory dump, so as we’re taking the memory dump and gathering information from active RAM, we want to be sure to also gather information from the swap.
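On Linux, the active swap areas are listed in /proc/swaps, which is a quick way to find out what to collect alongside the memory dump. The sketch below is a hypothetical helper that parses that file’s text (sizes in that file are reported in KiB). It doesn’t cover Windows, where the pagefile is typically C:\pagefile.sys and would normally be captured with the drive image.

```python
def parse_proc_swaps(text):
    """Parse the text of Linux /proc/swaps into a list of dicts."""
    areas = []
    for line in text.strip().splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip anything that doesn't look like a swap entry
        filename, swap_type, size_kb, used_kb, priority = fields[:5]
        areas.append({
            "filename": filename,  # device or file backing this swap area
            "type": swap_type,     # "partition" or "file"
            "size_kb": int(size_kb),
            "used_kb": int(used_kb),
            "priority": int(priority),
        })
    return areas
```

On a live Linux system you would pass it the contents of /proc/swaps, then add each listed partition or file to the collection plan.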
There’s a great deal of information we can get from the operating system itself, and there may be files and data that help us understand more about the security event we’re investigating. We might start by looking at the core operating system files and libraries and comparing them with known-good versions of those files and libraries. This is something you can usually capture with a drive image so that you can perform that analysis later.
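Assuming the drive image has been mounted read-only and that you have a baseline of known-good SHA-256 hashes (for example, recorded from a clean system running the same OS version), the comparison can be sketched as follows. The baseline format, a dictionary mapping relative path to hash, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path):
    """Hash one file in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_to_baseline(root, baseline):
    """Compare files under root with a known-good baseline.

    baseline maps a relative path (forward slashes) to its expected SHA-256 hex digest.
    """
    root = Path(root)
    report = {"modified": [], "missing": [], "unexpected": []}
    seen = set()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        seen.add(rel)
        if rel not in baseline:
            report["unexpected"].append(rel)  # file not present on the clean system
        elif sha256_file(path) != baseline[rel]:
            report["modified"].append(rel)    # contents differ from known-good
    report["missing"] = sorted(set(baseline) - seen)
    return report
```

Anything reported as modified or unexpected is a lead for further analysis, not proof of compromise, since legitimate updates also change system files.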
But there’s other information that exists in the operating system while it’s running. For example, we can see which users are logged in, what ports are open on the device, what processes are running, and what devices are currently attached to the system. If we’re investigating a malware infection or a ransomware installation, those details from the operating system can provide important information during our analysis.
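As one example of gathering this live information, Linux exposes IPv4 TCP sockets in /proc/net/tcp, where a state of 0A means LISTEN. The sketch below, a hypothetical helper, pulls the listening ports out of that file’s text. It assumes a little-endian host, where the address bytes appear reversed, and it ignores IPv6 sockets, which are listed in /proc/net/tcp6. Logged-in users, processes, and attached devices would need other sources.

```python
TCP_LISTEN = "0A"  # state code for LISTEN in /proc/net/tcp


def hex_to_ipv4(hex_addr):
    # The address is stored in host byte order; this assumes a little-endian host.
    return ".".join(str(b) for b in bytes.fromhex(hex_addr)[::-1])


def listening_ports(text):
    """Return (address, port) pairs for sockets in the LISTEN state."""
    results = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        local, state = fields[1], fields[3]
        if state != TCP_LISTEN:
            continue
        addr_hex, port_hex = local.split(":")
        results.append((hex_to_ipv4(addr_hex), int(port_hex, 16)))
    return results
```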
Collecting the same type of information from a mobile device can be more of a challenge, but there are tools available to help gather details from smartphones and tablets. There are two common capture methods.
You could use a backup file that was previously made from the device, or you can connect directly to the device, usually over USB, and create a new image from it. Inside these smartphones and tablets, you may find information about phone calls, contacts, text messages, email, images, movies, and much more.
With some security events, you may find that the firmware of a device has been modified. This isn’t unusual with cable modems and wireless routers, where some attackers have completely replaced the firmware to gain access to those systems.
Because firmware is specific to a particular product and a particular model of that product, the attacker usually has to gain access to the device to install this modified or hacked version of the firmware.
Getting access to the firmware may help us understand how the device was exploited. We might also be able to determine what functionality the attacker had once this firmware was installed. And lastly, if the device is still running, we may be able to see real-time data being sent to and from it.
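If you can obtain a known-good copy of the same firmware version from the vendor, a byte-level comparison shows where the captured image was changed. The sketch below is an illustration only, with a hypothetical function name. It reports the offsets of differing regions but says nothing about what those changes do, which usually takes unpacking and reverse-engineering the firmware.

```python
def diff_regions(known_good: bytes, captured: bytes):
    """Return (offset, length) for each run of bytes that differs.

    If the images have different lengths, the extra tail of the longer one
    is included in the final region.
    """
    regions = []
    common = min(len(known_good), len(captured))
    end = max(len(known_good), len(captured))
    start = None  # offset where the current differing run began
    for i in range(common):
        if known_good[i] != captured[i]:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i - start))
            start = None
    if start is None and common < end:
        start = common  # only the extra tail differs
    if start is not None:
        regions.append((start, end - start))
    return regions
```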
If you’re working with a virtual machine (VM), you may be able to gather details from a snapshot. You can think of snapshots as a way to image a virtual machine. This commonly starts with the original snapshot, which you can think of as a full backup of the system.
It’s common to then take subsequent snapshots of the VM, especially if we’re going to make changes to it, and each snapshot is an incremental update from the last snapshot taken. If we then wanted to recreate this virtual machine, we would need the original snapshot and all of the incremental snapshots taken since that point.
Once we have these snapshots, we have a complete image of the system, including everything in the virtual machine’s file system. We can see the operating system, the applications, the user data, and everything else in that OS.
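The snapshot chain can be modeled as a full base image plus a series of deltas that are replayed in order. This is a simplified sketch, assuming a disk represented as a dictionary of block numbers to block contents; real snapshot formats are specific to each hypervisor, and the function name is hypothetical. It also shows why every incremental snapshot is needed: skipping one produces a different disk.

```python
def rebuild_disk(base, increments):
    """Replay a snapshot chain.

    base: dict of block number -> bytes from the original (full) snapshot.
    increments: list of dicts, oldest first, each holding only the blocks
    that changed since the previous snapshot.
    """
    disk = dict(base)  # copy so the original snapshot isn't modified
    for delta in increments:
        disk.update(delta)  # later snapshots overwrite earlier block contents
    return disk


base = {0: b"boot", 1: b"os-v1", 2: b"data-a"}
increments = [{1: b"os-v2"}, {2: b"data-b", 3: b"new-file"}]

full = rebuild_disk(base, increments)
missing_one = rebuild_disk(base, increments[1:])  # first increment skipped
print(full[1], missing_one[1])  # b'os-v2' b'os-v1'
```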
Operating systems and applications can speed themselves up through the use of a cache. A cache is a temporary storage area designed to improve the performance of an application or an operating system. There are many different kinds of caches: you’ll find caches on your CPU, disk caches, browser caches, and caches on the network.
These caches often contain very specialized data. In a CPU cache, everything is focused on the operation of that single CPU. In an internet browser cache, we’re looking at a broader range of data that’s used specifically by the browser.
A cache usually stores the result of an original query, so if an identical query is made later, we can go to the cache instead of sending the query to the original service. This speeds up the process, since we don’t have to go all the way back to the original service for an answer we’ve already received.
Cached information is usually temporary as well. Once information is written to a cache, it usually times out, or it’s erased when the cache fills up. However, some caches may keep information for a very long period of time.
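The query, timeout, and eviction behavior described above can be sketched as a small cache class. This is an illustration under assumptions, not how any particular CPU, disk, or browser cache works: entries expire after a fixed ttl in seconds, and when the cache is full the oldest inserted entry is dropped. The class name is hypothetical, and the clock parameter exists so the behavior can be tested without waiting.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Entries expire after `ttl` seconds; when `max_entries` is reached,
    the oldest inserted entry is evicted."""

    def __init__(self, ttl, max_entries, clock=time.monotonic):
        self.ttl = ttl
        self.max_entries = max_entries
        self.clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value), oldest first

    def get(self, key, fetch):
        """Return a cached answer, or call fetch(key) and cache its result."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no need to ask the original service
        value = fetch(key)  # miss or expired entry: ask the original service
        self._entries.pop(key, None)
        if len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)  # cache full: drop the oldest entry
        self._entries[key] = (now + self.ttl, value)
        return value
```

From a forensics point of view, this is why cache contents are a snapshot of recent activity: older answers have either timed out or been pushed out by newer ones.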
A good example is your browser cache, where information may remain for days or weeks. If you look into a browser cache, you’ll see not only the URLs of the locations you visited, but also the information that made up each page, including the text and the images.
Your network also contains a wealth of information. You can see all of the different connections being made over the network. And in some cases, you may be able to capture the raw data that was sent over the network. It might also be useful to see what sessions were created from this device, and what sessions were inbound to the device. You could also break this out by sessions created by the operating system, and sessions created by the applications.
In larger environments, you may find extensive packet captures and large amounts of stored data that was sent across the network. That would allow you to effectively rewind back in time and see the raw data that was transferred through the network. There might also be smaller packet captures available on security devices such as firewalls and intrusion prevention systems.
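Stored packet captures are often in the classic libpcap file format used by tools such as tcpdump: a 24-byte global header followed by a 16-byte header before each packet. Here’s a minimal reader for that format, written as a hypothetical helper. It doesn’t handle the newer pcapng format or the nanosecond-timestamp variant of pcap.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap magic number (microsecond timestamps)
GLOBAL_HEADER_LEN = 24
RECORD_HEADER_LEN = 16


def read_pcap(data: bytes):
    """Yield (timestamp_seconds, captured_bytes, original_length) per packet."""
    # The magic number tells us which byte order the file was written in.
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + RECORD_HEADER_LEN]
        )
        offset += RECORD_HEADER_LEN
        packet = data[offset:offset + incl_len]  # bytes actually captured
        offset += incl_len
        yield ts_sec + ts_usec / 1_000_000, packet, orig_len
```

Comparing captured_bytes length with original_length shows whether the capture device truncated packets, which matters when you try to reconstruct what was actually sent.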
And once we’ve looked through all of those locations, we may still find other bits of data that are stored in different places in memory or on your storage drive. We refer to these as artifacts. And these artifacts may be something that is stored in a log.
They may be in flash memory. They could be the prefetch cache files used by Windows. They might be information stored in the recycle bin. And the information stored in your browser bookmarks or your login records might also be considered an artifact.
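As an example of pulling detail from one of these artifacts, Windows keeps a small $I metadata file in the recycle bin for each deleted item, alongside a $R file holding the content. The layout used below (a version number, the original size, the deletion time as a FILETIME, then the original path) follows how it’s commonly documented by the forensics community; treat it as an assumption and check it against your own test images. The function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ft):
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)


def parse_recycle_bin_i_file(data: bytes):
    """Parse a $I file (assumed layout, see the note above)."""
    version, size, deleted_ft = struct.unpack_from("<qqQ", data, 0)
    if version == 1:    # older layout (assumed): fixed 520-byte UTF-16 name field
        raw_name = data[24:24 + 520]
    elif version == 2:  # newer layout (assumed): character count, then the name
        (name_chars,) = struct.unpack_from("<I", data, 24)
        raw_name = data[28:28 + name_chars * 2]
    else:
        raise ValueError(f"unknown $I version {version}")
    name = raw_name.decode("utf-16-le").split("\x00", 1)[0]  # strip null padding
    return {
        "original_path": name,
        "size": size,
        "deleted": filetime_to_datetime(deleted_ft),
    }
```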