When something goes wrong with technology, it’s your responsibility to fix it. In this video, you’ll learn about disaster recovery best practices for backups, power restoration, cloud storage, and more.


For any organization, there is the potential for disaster to occur. So our planning process needs to take into account a number of strategies that can help us if we ever need to perform a disaster recovery. Of course, you should always have a backup. But there are many different kinds of backups that you can create.

For example, you may want to create an image level backup. This backs up everything on a particular server or device and creates a single image from all of that data. It’s sometimes called a bare metal backup, because you can restore it by applying the image to a separate server that has no operating system installed; that empty hardware is the “bare metal.”

Applying that image restores every file on that device, giving you a fully functional service. Many operating systems include a built-in volume snapshot feature, and if the server is a virtual machine, you can easily create an image of the entire VM. Not only can you recover the entire system from that single image, you can also apply the image to another computer to create an exact duplicate of the server.
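An image-level backup copies a device byte for byte rather than file by file. The Python sketch below illustrates that idea under simple assumptions: the source is a readable raw device path or a regular file standing in for one, and nothing is writing to it during the copy (which is what a volume snapshot gives you on a live system). It copies the data in fixed-size chunks into a single image file and returns a SHA-256 hash so the image can be checked later. Real imaging tools also handle partitions, sparse data, and compression, which this sketch does not.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large devices don't fill memory

def create_image(source_path: str, image_path: str) -> str:
    """Copy every byte of source_path into image_path and return the SHA-256 of the data."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:  # end of the device or file
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()
```

On Linux, a call such as `create_image("/dev/sdb", "server01.img")` (both names hypothetical) would typically need administrator rights to read the raw device. Restoring is the same copy in the other direction, onto a disk at least as large as the original.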

Another type of backup strategy is the file level backup where you’re simply copying the important files that are stored on that device. You may not be copying the entire operating system, but you’ll have all of the files necessary to get your applications running. This means if you need to recover this particular application, you’ll need to find a device that at least has an operating system running, and from there, you can restore your applications and files on that existing OS.
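A file-level backup copies only the files and folders you choose. As a minimal sketch, assuming a hypothetical list of important paths that all live under one data directory, this Python function copies them into a new timestamped backup folder using the standard `shutil` module. Restoring would mean copying them back onto a system that already has an operating system and the application installed.

```python
import shutil
from datetime import datetime
from pathlib import Path

def file_level_backup(data_root: Path, relative_paths: list[str], backup_root: Path) -> Path:
    """Copy the listed files/folders under data_root into a new timestamped folder."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"backup-{stamp}"
    for rel in relative_paths:
        src = data_root / rel
        dst = target / rel  # keep the same relative layout inside the backup
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)  # copy2 also preserves file timestamps
    return target
```

Note that the operating system and installed programs are deliberately not part of this backup, which is exactly the trade-off the file level approach makes.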

Restoring an application may be relatively simple or it may involve multiple devices and have a level of complexity associated with it. Some applications will distribute the software across multiple servers, and all of those servers have to work together for that application to operate properly. Not only is the application important, but the data that is used for the application is just as important.

Sometimes this is stored in a separate database, so you would need a database server with all of that data loaded. Or sometimes the data is stored in different places on the application servers. Sometimes this information is not on your local servers, but it’s, instead, stored on a cloud service, perhaps provided by the application developer. You might have a combination of data stored locally and data stored in the cloud, so it’s important to know where this data is before you begin the backup and recovery process.

For complex applications, it’s useful to document this process prior to having a problem. That way you know exactly what needs to be recovered during the disaster recovery process. I often mention in my videos that you must have a backup of your data. But having the backup is only part of the equation. You have to be able to restore the backup.
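Part of documenting a complex application is recording which components depend on which, so they can be restored in a workable order. As an illustration, take a hypothetical application whose web front end depends on two application servers, which depend on a database server, which in turn depends on the authentication service. Python’s standard `graphlib` module (Python 3.9 and later) can derive a restore order from that documentation.

```python
from graphlib import TopologicalSorter

# Hypothetical recovery documentation: each component maps to the
# components that must be restored (and running) before it.
dependencies = {
    "database": {"authentication"},
    "app-server-1": {"database"},
    "app-server-2": {"database"},
    "web-frontend": {"app-server-1", "app-server-2"},
}

def restore_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every component comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())

for step, component in enumerate(restore_order(dependencies), start=1):
    print(f"{step}. restore {component}")
```

If the documented dependencies accidentally form a loop, `TopologicalSorter` raises `graphlib.CycleError`, which is a useful signal that the recovery plan itself needs another look.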

And if you’re trying to recover from a disaster, you don’t want that to be the first time that you’ve ever tried to restore this data. Many organizations will perform some type of disaster recovery testing during the year. This gives you a chance to take the data that you’ve backed up, perform a restoration, and see if everything works as expected.

Once the restoration is complete, you can use the application, make sure the data is working properly, and have the end users confirm that everything is working as expected. During the year when you’re not performing these tests, you may want to perform occasional audits to make sure that the backups are working properly and the data is stored as expected.
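One way to automate part of a restoration test or audit is to compare checksums. This Python sketch assumes the original data and the restored copy are both reachable as directories. It builds a SHA-256 manifest of each tree and reports files that are missing, unexpected, or changed. It only checks the data; users still need to confirm that the application itself works.

```python
import hashlib
from pathlib import Path

def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 hash of its contents."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            result[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result

def compare_trees(original: Path, restored: Path) -> dict[str, list[str]]:
    """List files missing from the restore, unexpected extras, and content mismatches."""
    before, after = manifest(original), manifest(restored)
    return {
        "missing": sorted(before.keys() - after.keys()),
        "extra": sorted(after.keys() - before.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

For a periodic audit, the manifest of the original data could be saved at backup time and compared against a test restore later. This sketch reads each file fully into memory, so very large files would call for chunked hashing instead.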

When a disaster occurs, you may lose power, sometimes only intermittently. To mitigate this, you may want to have a UPS, or Uninterruptible Power Supply. A UPS provides short-term power so that you can keep everything running even when the main power source isn’t available. It protects you when you lose all power (a blackout), when the voltage drops (a brownout), or when the voltage spikes (a power surge).

There are generally three categories of UPS to choose from. An offline, or standby, UPS constantly monitors the voltage from the main power. If that voltage disappears, the UPS switches out of standby mode and all of the power comes from the batteries inside the UPS. When the main voltage returns, it switches back to the main power line.

A line-interactive UPS can gradually ramp up the power it provides. So if there’s a brownout, where some of the voltage on the line is lost, the UPS can make up the difference from the battery. In many data centers, you’ll find an online, or double-conversion, UPS, which always runs the load from the battery side of the system, so there’s no switchover if you lose main power.

The main line constantly recharges the batteries, and the battery side of the system supplies the voltage to your equipment. If you lose main power, there’s no switchover or delay while you wait for the UPS to come out of standby mode.
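To make the difference between the three designs concrete, here is a deliberately simplified Python model. The 120 V nominal voltage and the tolerance percentages are hypothetical values chosen for illustration, not figures from any UPS specification. The model only reports where the protected equipment gets its power for a given input voltage.

```python
NOMINAL_V = 120.0  # hypothetical nominal line voltage

def load_source(ups_type: str, input_v: float) -> str:
    """Simplified model of where the protected load draws its power from."""
    deviation = abs(input_v - NOMINAL_V) / NOMINAL_V  # fraction away from nominal
    if ups_type == "standby":
        # Passes main power through; switches to battery once voltage is too far off.
        return "utility" if deviation <= 0.10 else "battery"
    if ups_type == "line-interactive":
        # Handles small sags itself; relies fully on the battery only for large ones.
        if deviation <= 0.05:
            return "utility"
        if deviation <= 0.20:
            return "utility (corrected by UPS)"
        return "battery"
    if ups_type == "online":
        # Double conversion: the load always runs from the UPS, so nothing switches over.
        return "ups output"
    raise ValueError(f"unknown UPS type: {ups_type}")
```

For example, during a brownout at 105 V this model has the standby unit switch to battery, the line-interactive unit correct the voltage itself, and the online unit carry on exactly as before.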

Features also vary between UPS models. Some have an auto shutdown feature, which tells the connected computer to begin shutting down because the main power has failed. Models also differ in battery capacity and number of outlets, and some UPS models also have interfaces for phone lines or other network connections.
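Auto shutdown is usually handled by monitoring software that communicates with the UPS, such as the vendor’s own utility or an open-source tool like Network UPS Tools. The decision that software makes can be sketched in Python as below. The status fields and the five-minute runtime threshold are hypothetical; this sketch shows one common policy (shut down once remaining runtime gets low) rather than how any particular product behaves.

```python
from dataclasses import dataclass

@dataclass
class UpsStatus:
    on_battery: bool          # True when main power has failed
    runtime_remaining_s: int  # UPS estimate of battery runtime at the current load

MIN_RUNTIME_S = 300  # hypothetical: leave five minutes for a clean shutdown

def should_shut_down(status: UpsStatus, min_runtime_s: int = MIN_RUNTIME_S) -> bool:
    """Begin a graceful shutdown only while on battery and runtime is running low."""
    return status.on_battery and status.runtime_remaining_s <= min_runtime_s
```

Waiting until runtime runs low, rather than shutting down the instant main power fails, avoids taking servers offline for a brief outage; setting `min_runtime_s` very high makes the policy shut down almost immediately instead.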

Another type of power protection is a surge suppressor. The power that’s coming in from your main voltage line may have spikes and noise that are created by changes in the electrical system or by storms that may be coming through. The surge suppressor will notice when these spikes occur and will take that additional voltage and send it to the electrical ground.

Some surge suppressors have filters that remove some of the noise on an electrical line. In a surge suppressor’s specifications, this filtering is listed in decibels (dB), and a higher dB rating means a better filter.
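Decibels are logarithmic, which is why a few extra dB matter. Assuming the rating describes attenuation of the noise voltage (20 times the base-10 logarithm of the input-to-output ratio, the usual convention for voltages), every additional 20 dB reduces the remaining noise voltage by another factor of ten. A small Python helper shows the arithmetic.

```python
import math

def remaining_noise_fraction(attenuation_db: float) -> float:
    """Fraction of the noise voltage left after a filter with this attenuation."""
    return 10 ** (-attenuation_db / 20)

def attenuation_db(v_in: float, v_out: float) -> float:
    """Attenuation in dB when a noise voltage is reduced from v_in to v_out."""
    return 20 * math.log10(v_in / v_out)

for db in (20, 40, 60):
    print(f"{db} dB filter leaves {remaining_noise_fraction(db):.1%} of the noise voltage")
```

Under this convention, a 40 dB filter leaves about 1% of the noise voltage and a 60 dB filter about 0.1%.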

Many people store their backups and other important data in a cloud-based storage service, such as those from Amazon, Microsoft, and many other providers. These services keep the data off-site on third-party servers, so you don’t need local backup servers or tape drives, and you don’t have to worry about moving backups off-site, because the cloud copy is already off-site.

When you’re storing your data on a third-party server, though, you’re losing a bit of control over the access to that information. So you want to be sure that you’re able to put in the proper security measures so that nobody else can see that information. It’s a good idea to encrypt your data before storing it in the cloud so that nobody can access that information if they happen to gain access to your cloud credentials.
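Encrypting locally before uploading means the cloud provider only ever stores ciphertext. This minimal sketch uses the Fernet recipe from the third-party `cryptography` package (installed with `pip install cryptography`). The file names are hypothetical and the upload step is omitted. The key must be kept somewhere other than the cloud account, since anyone holding it can decrypt the data and losing it makes the backup unrecoverable.

```python
from pathlib import Path
from cryptography.fernet import Fernet

def encrypt_file(plain_path: Path, encrypted_path: Path, key: bytes) -> None:
    """Write an encrypted copy of plain_path; upload encrypted_path, never plain_path."""
    encrypted_path.write_bytes(Fernet(key).encrypt(plain_path.read_bytes()))

def decrypt_file(encrypted_path: Path, restored_path: Path, key: bytes) -> None:
    """Reverse encrypt_file during a restore; raises InvalidToken if the key is wrong."""
    restored_path.write_bytes(Fernet(key).decrypt(encrypted_path.read_bytes()))
```

A key would be created once with `Fernet.generate_key()` and stored separately, for example in a password vault. Fernet processes the whole file in memory, so this approach suits moderately sized files rather than multi-gigabyte archives.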

Once all of these systems are back up during the disaster recovery process, people still need a way to log in. So you have to make sure that your centralized authentication is also working. For a Windows domain, you should have a backup that contains all of your usernames, passwords, and everything else associated with your users’ credentials.

For disaster recovery, you also have to consider if you’re using any type of multi-factor authentication and either enable or disable those features as well. There might also be additional authentication databases that you’re using. For instance, people who are logging into a switch or a router may be connecting to a third-party database using TACACS or RADIUS.

If you’re backing up your centralized authentication, you know that you can restore it if you run into a problem. This is another good reason to avoid local accounts on these devices: you can manage all of your credentials centrally and back them up and restore them if there’s a problem.