There are many different ways to back up data. In this video, you’ll learn about incremental backups, differential backups, NAS and SAN storage, and more.
If you’re using an operating system based on Microsoft Windows, there is a bit associated with every file system object called the archive bit or the archive attribute. This archive bit is turned on whenever a file is modified. If you go into the Properties of a file and look at the Attributes, there’s a button that shows the Advanced Attributes. There is a check mark there that shows “File is ready for archiving.” That is the archive bit. And by examining that, you can see which files may have changed since the last backup occurred.
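If you’d like to look at that archive bit from a script rather than from the Properties window, here is a minimal Python sketch. The names ARCHIVE_BIT, archive_bit_set, and needs_backup are made up for this example. The value 0x20 is the Windows FILE_ATTRIBUTE_ARCHIVE flag, and the st_file_attributes field is only present when Python is running on Windows.

```python
import os

# 0x20 is the value of the Windows FILE_ATTRIBUTE_ARCHIVE flag.
ARCHIVE_BIT = 0x20


def archive_bit_set(attributes: int) -> bool:
    """Return True if the archive bit is on in a Windows attribute mask."""
    return bool(attributes & ARCHIVE_BIT)


def needs_backup(path: str) -> bool:
    """Check the archive bit of a real file (hypothetical helper).

    os.stat() only exposes st_file_attributes on Windows, so on any
    other platform this raises instead of guessing.
    """
    attributes = getattr(os.stat(path), "st_file_attributes", None)
    if attributes is None:
        raise OSError("the archive attribute is only available on Windows")
    return archive_bit_set(attributes)
```

On a Windows system, calling needs_backup on a file that has just been edited would be expected to return True until backup software clears the bit.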
It’s very common, when performing a backup on a system, that you back up every single file on the system. This is called a full backup, and it will take everything that is stored in that operating system and save it in that full backup. Once the backup is complete, the archive bit is cleared, signifying that no changes have been made to that file since the last backup.
An incremental backup will occur after the full backup has occurred, and it will back up all of the files that have changed since the last incremental backup. This is a bit different than a differential backup, which also occurs after a full backup, but the only files that are backed up are the ones that have changed since the last full backup. Let’s look at these visually.
And we’ll start with an incremental backup. As I mentioned, the first thing you want to do is perform a full backup of the system. The next time you perform a backup, you’re going to back up only the data that’s changed since the most recent backup, whether that was the full backup or an earlier incremental backup. This is especially important to remember during the restoration process, which requires not only the full backup but every incremental backup that has been made since that full backup occurred.
Let’s look at this visually, and we’ll start with a Monday, where we perform a full backup and back up everything on our system. This is an incremental backup process, so on Tuesday, we’re going to back up only the files that have changed since that last full backup. On Wednesday, we’ll also perform an incremental backup, and we’re going to back up only the files that have changed since the last incremental backup, which was Tuesday’s.
And in this particular case, the amount of data that has changed between Tuesday and Wednesday is less than the data that changed between Monday and Tuesday. You might also perform an incremental backup on Thursday, and this would be everything that has changed between Wednesday and Thursday. And now on Friday, we want to restore this whole system. And for restoration, we’ll not only need the full backup, but we’ll need every single incremental backup that has occurred since that full backup was taken.
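That Monday through Thursday schedule can be sketched in a few lines of Python. This is only a simulation under some assumptions: the file system is a dictionary that maps a file name to its archive bit, and the file names and the changes made each day are invented for the example.

```python
def full_backup(archive_bits):
    """Copy every file, then clear every archive bit."""
    saved = sorted(archive_bits)
    for name in saved:
        archive_bits[name] = False
    return saved


def incremental_backup(archive_bits):
    """Copy only files whose archive bit is on, then clear those bits."""
    saved = sorted(name for name, changed in archive_bits.items() if changed)
    for name in saved:
        archive_bits[name] = False
    return saved


def modify(archive_bits, *names):
    """Modifying (or creating) a file turns its archive bit on."""
    for name in names:
        archive_bits[name] = True


# Hypothetical file system: every file starts out needing a backup.
inc_files = {"a.txt": True, "b.txt": True, "c.txt": True}

inc_monday = full_backup(inc_files)            # everything
modify(inc_files, "a.txt", "b.txt")
inc_tuesday = incremental_backup(inc_files)    # changes since Monday
modify(inc_files, "c.txt")
inc_wednesday = incremental_backup(inc_files)  # changes since Tuesday
modify(inc_files, "a.txt")
inc_thursday = incremental_backup(inc_files)   # changes since Wednesday

# A restore on Friday needs the full backup plus every incremental.
inc_restore_set = [inc_monday, inc_tuesday, inc_wednesday, inc_thursday]
```

Each incremental backup here holds only what changed since the backup before it, which is why none of them can be skipped during the restore.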
A differential backup starts a similar way, where we have a full backup that backs up everything on the system. Each subsequent differential backup, though, is going to back up everything that’s changed since the last full backup. So every day, the backup is going to get bigger and bigger and bigger as we change more and more information since the last full backup. If you need to restore, the only two things you’ll need are the full backup that originally occurred and the last differential backup.
Let’s see the differential backup visually. We’ll take that full backup on Monday. On Tuesday, we’ll back up only the things that have changed since the last full backup. On Wednesday, we will also back up everything that has changed since the last full backup. So you can think of the Wednesday backup as including everything from Tuesday plus everything that has changed since then. Thursday will be everything that’s changed since the last full backup. And then finally, when we’re ready to recover this data, we will need only the last full backup and the last differential backup that’s been made.
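Here is the same kind of Python sketch for the differential schedule. Again, this is a simulation under assumptions: the dictionary of archive bits, the file names, and the daily changes are all invented for the example. The one difference that matters is that the differential backup leaves the archive bits alone.

```python
def full_backup(archive_bits):
    """Copy every file, then clear every archive bit."""
    saved = sorted(archive_bits)
    for name in saved:
        archive_bits[name] = False
    return saved


def differential_backup(archive_bits):
    """Copy files whose archive bit is on, but do NOT clear the bits."""
    return sorted(name for name, changed in archive_bits.items() if changed)


# Hypothetical file system: every file starts out needing a backup.
diff_files = {"a.txt": True, "b.txt": True, "c.txt": True}

diff_monday = full_backup(diff_files)             # everything
diff_files["a.txt"] = True                        # a.txt modified
diff_tuesday = differential_backup(diff_files)    # a.txt
diff_files["b.txt"] = True                        # b.txt modified
diff_wednesday = differential_backup(diff_files)  # a.txt and b.txt
diff_files["c.txt"] = True                        # c.txt modified
diff_thursday = differential_backup(diff_files)   # all three files

# A restore needs only the full backup and the LAST differential.
diff_restore_set = [diff_monday, diff_thursday]
```

Because the bits are never cleared between full backups, each day’s differential contains everything from the day before plus whatever is new.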
Let’s summarize all three of these backup types. On a full backup, we are backing up all data on the system. The process of backing this up is going to take quite a bit of time because we’re backing up everything that’s on the system. But restoring this data only requires the single set of backup tapes, so the restoration time is relatively low and only requires that single tape set. When you perform a full backup, all of the archive bits are cleared on all of those backed up files.
An incremental backup is going to back up new files and all files that have been modified since the last incremental backup. The backup time is relatively low because we’re only backing up files that have changed, but the restoration time is relatively high because we need to restore from not only the last full backup but every incremental backup that’s occurred since then as well. And of course, after we perform an incremental backup, the archive attributes are cleared on the files that were backed up.
A differential backup is going to back up all data modified since the last full backup. This means it will take a moderate amount of time to perform this differential backup each day, and every day the differential backup takes longer and longer because we’re adding on more data with every differential backup. Restoring from a differential backup also takes a moderate amount of time, because we’re not going to need any more than two sets of backup information. That would be your full backup and the last differential backup. When you perform a differential backup, you are not clearing the archive attribute, because those same files need to be backed up again on the next differential backup.
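One way to keep the restore rules for these three backup types straight is to write them down as a small function. This is a sketch, not how any particular backup product works: the function name, the (label, kind) tuples, and the choice to reject schedules that mix incremental and differential backups after the same full backup are all assumptions made for the example.

```python
def backups_needed_for_restore(history):
    """Return the labels of the backups a restore would need.

    history is a chronological list of (label, kind) tuples, where kind
    is "full", "incremental", or "differential". A ValueError is raised
    if the history contains no full backup.
    """
    # Everything starts from the most recent full backup.
    full_index = max(i for i, (_, kind) in enumerate(history) if kind == "full")
    needed = [history[full_index][0]]

    later = history[full_index + 1:]
    incrementals = [label for label, kind in later if kind == "incremental"]
    differentials = [label for label, kind in later if kind == "differential"]
    if incrementals and differentials:
        raise ValueError("mixed schedules are out of scope for this sketch")

    needed += incrementals        # incremental: every one since the full
    needed += differentials[-1:]  # differential: only the last one
    return needed
```

For example, a full backup on Monday followed by incrementals on Tuesday, Wednesday, and Thursday needs all four sets, and the same week with differentials needs only Monday’s and Thursday’s.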
In an enterprise, we’ve traditionally thought of backups as something you perform to a magnetic tape. This is a sequential storage device, because you have to forward through this tape to be able to find the data that you’re looking for. And the tape sizes range anywhere from 100 gigabytes in size to multiple terabytes in size for each one of those cartridges. One advantage to tape is that it’s relatively easy to store and very easy to ship around, so this makes a perfect archive medium, especially if you’re keeping something offsite.
Many enterprises have moved from tape backups to disk backups because the price of hard drives has decreased so much through the years. Hard drives are also a faster medium to use if you’re writing or reading from that drive. And it’s also a method that can be used with deduplication and compression of data, making for a more efficient set of backups.
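To give a rough idea of what deduplication means here, this is a minimal Python sketch that stores each unique chunk of data only once. The function names and the tiny fixed-size chunks are assumptions for illustration; real backup products use their own chunking and indexing schemes.

```python
import hashlib


def deduplicate(chunks):
    """Store each distinct chunk once, keyed by its SHA-256 digest.

    Returns (store, recipe): store maps digest -> chunk bytes, and
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original data from the store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Three chunks are backed up, but two of them are identical.
dedup_chunks = [b"AAAA", b"BBBB", b"AAAA"]
dedup_store, dedup_recipe = deduplicate(dedup_chunks)
```

In this example the store ends up holding two chunks instead of three, and rebuild still returns the original twelve bytes.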
And some backup strategies will use a copy or an image of a system that is an exact duplicate of a system at a particular point in time. This may not allow you to keep different versions of a particular image in a single backup medium, but it is something that you’re able to keep offsite and then be able to use later on if you need to restore that system.
When you’re storing files to a drive over the network, there are two popular mechanisms that you can use. One of these is a Network Attached Storage, or a NAS. A NAS provides access to a large storage array that’s connected over the network. We also refer to a NAS as file-level access. This means if you need to change any portion of a file on that NAS, you have to rewrite the entire file on that device. This may not be a problem if these are very small files, but if you need to change or modify part of a very large file, it will require overwriting the entire file on the NAS.
A SAN might be considered a more efficient way to store data. This is a Storage Area Network, and it’s often configured to look and feel like a separate storage drive on your system. An important characteristic of a SAN is that it provides block-level access. This means if you need to change a single portion of a very large file, you only need to change that portion on the disk instead of having to rewrite the entire file to the SAN. For both a NAS and a SAN, you obviously need plenty of bandwidth to be able to transfer files across the network and store them in that drive array. This can sometimes require a completely different network that is designed to only provide file access for the NAS or the SAN.
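To put some numbers on that difference, here is a small Python sketch that follows the simplified model described above: file-level access rewrites the whole file, and block-level access rewrites only the blocks that a change touches. The function names and the 4,096-byte block size are assumptions for the example, and real NAS and SAN products may behave differently.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


def bytes_written_file_level(file_size, change_offset, change_length):
    """Simplified file-level model: any change rewrites the entire file."""
    return file_size if change_length > 0 else 0


def bytes_written_block_level(file_size, change_offset, change_length,
                              block_size=BLOCK_SIZE):
    """Simplified block-level model: rewrite only the touched blocks.

    Whole blocks are counted, even if the change covers part of a block.
    """
    if change_length <= 0:
        return 0
    first_block = change_offset // block_size
    last_block = (change_offset + change_length - 1) // block_size
    return (last_block - first_block + 1) * block_size


one_gib = 1024 ** 3
# Change 100 bytes in the middle of the second block of a 1 GiB file.
nas_cost = bytes_written_file_level(one_gib, 5000, 100)
san_cost = bytes_written_block_level(one_gib, 5000, 100)
```

Under these assumptions, the file-level write moves the full gigabyte across the network, and the block-level write moves a single 4,096-byte block.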
These days we’re performing more backups to the cloud. A cloud-based service can provide us with an automatic offsite backup function, where we would be taking files on our local device and backing them up to a device that’s located somewhere else in the cloud. This can often support many, many devices in our environment, but it requires that we have enough bandwidth to be able to transfer these files back and forth to that cloud-based service.
Another type of backup would be an image backup. Instead of backing up individual files on a system, we back up everything that is on a computer and create an exact duplicate or replica of that entire file system. This means we’re backing up the operating system, the user files, and anything else that might be stored on that computer. If we need to restore this data, we restore an exact duplicate of that particular system all simultaneously. This ensures that we’ll be able to restore everything to exactly the way it was when we originally took that image backup.
When we perform these backups, it can be to an offline backup system or an online backup system. With an offline backup system, you’re backing up your local devices to this backup component. It’s usually something that performs very quickly, and it’s over a secure channel. We have to make sure that the communication between the system that’s being backed up and the backup service itself is protected and constantly maintained, and it often requires that this information be stored at an offsite facility for disaster recovery purposes.
An online backup is one that is constantly accessible and constantly updated throughout the day. This is one that occurs over the network, usually to a third-party or cloud-based service, and it’s usually over an encrypted channel. Since this backup is always online, it’s often accessible from any of your devices. So you may be able to back up to this online backup system from your workstation but be able to access that backup from your laptop or your mobile devices. Because this online backup system is constantly providing a backup and storing this information in the cloud, it requires enough bandwidth to be able to transfer these files and store them in a reasonable amount of time.