A network outage requires a recovery process to get everything running again. In this video, you’ll learn about recovery time objectives, recovery point objectives, mean time to repair, and mean time between failures.
We use a number of different metrics when we talk about recovering a system. One is the RTO, or Recovery Time Objective. This is the target for how long it should take to get your systems up and running to a particular service level. It may be that the primary objective is to have the web server operational, and the RTO would specify how quickly that web server needs to be running again.
This may work in conjunction with the RPO, or the Recovery Point Objective. The RPO designates how much data needs to be available before we can say that we’re back up and running. It may be that having the last hour of data is enough for us to say that our system is now available. Or it may be that the entire database has to be available before we can say that we’ve recovered this system.
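To make these two objectives concrete, here is a minimal sketch in Python that checks a single recovery against both. It assumes the common reading of the RPO as the largest window of recent data we can afford to lose, measured back from the outage to the last good backup. The function name, the timestamps, and the two-hour and one-hour objectives are all made up for illustration.

```python
from datetime import datetime, timedelta


def meets_objectives(outage_start, service_restored, last_good_backup, rto, rpo):
    """Return (rto_met, rpo_met) for one recovery.

    Assumption: the RTO is compared against total downtime, and the RPO is
    compared against the gap between the last good backup and the outage.
    """
    downtime = service_restored - outage_start
    data_gap = outage_start - last_good_backup
    return downtime <= rto, data_gap <= rpo


# Illustrative values only: a 90-minute outage with a backup taken
# 45 minutes before the outage began.
outage = datetime(2024, 1, 10, 9, 0)
rto_met, rpo_met = meets_objectives(
    outage_start=outage,
    service_restored=datetime(2024, 1, 10, 10, 30),
    last_good_backup=datetime(2024, 1, 10, 8, 15),
    rto=timedelta(hours=2),
    rpo=timedelta(hours=1),
)
print(rto_met, rpo_met)  # True True
```

Here the web server was back within the two-hour RTO, and the most recent backup was recent enough to satisfy the one-hour RPO.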
When we talk about repairing a system, one of the things that everyone wants to know is, how long is this going to take? The value that we’re going to provide is the MTTR, or Mean Time To Repair. We might say that this particular problem, with the type of data that we need to recover, takes an average of about an hour to get back up and running. Or it may be that the mean time to repair is 24 hours.
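Since the MTTR is just an average, one simple way to arrive at it is to take the repair times from past incidents and compute their mean. The sketch below does that in Python. The function name and the three repair durations are invented for the example.

```python
from datetime import timedelta


def mean_time_to_repair(repair_durations):
    """Average a list of past repair durations (timedelta objects)."""
    if not repair_durations:
        raise ValueError("need at least one repair duration")
    # sum() needs a timedelta starting value to add timedeltas together.
    return sum(repair_durations, timedelta()) / len(repair_durations)


# Hypothetical repair times from three earlier incidents.
repairs = [timedelta(minutes=45), timedelta(minutes=60), timedelta(minutes=75)]
mttr = mean_time_to_repair(repairs)
print(mttr)  # 1:00:00
```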
If you’re purchasing equipment, you may want to ask what the MTBF is for this particular component. The MTBF is the Mean Time Between Failures, and it’s a prediction of how long that system should remain up and running before a failure occurs.
If you’re using a system that is all solid state, with no moving parts inside, it might have a very long mean time between failures. But if you’re using a system that has a lot of fans and other moving parts, you may find that the MTBF is a much smaller number.
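Manufacturers have their own methods for predicting an MTBF, but if you’re tracking your own equipment, a common way to estimate it is to divide the total operating time by the number of failures seen in that time. Here is a small sketch of that calculation. The function name and the figures are made up, and they are not measurements from any real device.

```python
def mean_time_between_failures(total_operating_hours, failure_count):
    """Estimate MTBF as total operating hours divided by observed failures."""
    if failure_count <= 0:
        raise ValueError("need at least one observed failure to estimate MTBF")
    return total_operating_hours / failure_count


# Hypothetical fleet: 10 units of each type, each observed for 1,000 hours.
observed_hours = 10 * 1_000

# Invented failure counts: the fan-cooled units failed more often.
mtbf_with_fans = mean_time_between_failures(observed_hours, failure_count=4)
mtbf_solid_state = mean_time_between_failures(observed_hours, failure_count=1)

print(mtbf_with_fans)    # 2500.0
print(mtbf_solid_state)  # 10000.0
```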
As a network administrator, you’ll need to maintain routers, switches, firewalls, and many other infrastructure devices. And all of those devices have firmware and configurations that need to be backed up. These devices may have IP address settings, security settings, port configurations, and anything else that makes up the config of that particular device.
Most devices will allow you to access this configuration across the network. You can usually set up an automated process that goes out every night, grabs the configuration, and downloads it to a separate machine, so you have a copy to use if there’s a failure. You may find that these configurations are specific not only to the device, but also to the version of firmware running on that device. So it may be important not only to have a backup of the configuration, but also to have a backup of the current version of firmware.
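As a rough sketch of the storage half of that nightly process, the Python below saves a configuration under a file name that records both the date and the firmware version, so the two stay tied together. How you actually pull the configuration depends on the device and is not shown here. The function `backup_config`, the device name, the firmware version, and the file naming scheme are all assumptions for illustration.

```python
import tempfile
from datetime import date
from pathlib import Path


def backup_config(device_name, firmware_version, config_text, backup_dir, day=None):
    """Write one device's configuration to a dated, firmware-tagged file.

    Assumption: config_text was already retrieved from the device by
    whatever method that device supports.
    """
    day = day or date.today()
    file_name = f"{day.isoformat()}_fw-{firmware_version}.cfg"
    target = Path(backup_dir) / device_name / file_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(config_text)
    return target


# Demonstration with a temporary directory and a made-up device.
with tempfile.TemporaryDirectory() as tmp:
    saved = backup_config(
        "core-switch-1", "15.2.4", "hostname core-switch-1\n", tmp,
        day=date(2024, 1, 10),
    )
    saved_name = saved.name
    saved_text = saved.read_text()

print(saved_name)  # 2024-01-10_fw-15.2.4.cfg
```

Keeping one folder per device and one file per night makes it easy to see which configuration went with which firmware version.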
These backups can be very useful if you bring in a replacement piece of equipment that has a different version of firmware, which would otherwise need a different configuration. Instead of building a new configuration, you can use your backups to revert to the previous version of the firmware and use your old configuration to get the device back up and running.
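When it’s time to restore, the job is to find the most recent configuration that was saved under the firmware version you’re reverting to. A minimal sketch of that lookup is below. The `Backup` record, its fields, the file paths, and the version numbers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Backup:
    taken_on: date
    firmware_version: str
    config_path: str


def latest_backup_for(backups, firmware_version):
    """Return the newest backup taken under the given firmware, or None."""
    matching = [b for b in backups if b.firmware_version == firmware_version]
    return max(matching, key=lambda b: b.taken_on, default=None)


# Made-up backup history for one device.
backups = [
    Backup(date(2024, 1, 8), "15.2.4", "core-switch-1/2024-01-08.cfg"),
    Backup(date(2024, 1, 10), "15.2.4", "core-switch-1/2024-01-10.cfg"),
    Backup(date(2024, 1, 12), "16.1.0", "core-switch-1/2024-01-12.cfg"),
]

chosen = latest_backup_for(backups, "15.2.4")
print(chosen.config_path)  # core-switch-1/2024-01-10.cfg
```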