Redundancy, Fault Tolerance, and High Availability – CompTIA Security+ SY0-501 – 3.8

Now that you’ve built your cloud-based application instances, how do you keep them running with 100% availability? In this video, you’ll learn about redundant systems, building fault-tolerant architectures, and the advantages of high availability.


If someone’s trying to gain access to your servers or your data they’re going to look anywhere on the network to be able to find it. You have web servers, database servers, there’s middleware, there’s security systems, and all of these devices are required if you’re going to build out an application instance. The bad guys know that this is where the important data is and they’re going to keep looking inside of your network to find all of these pieces.

With a distributive allocation, you’ve created a technological scavenger hunt. Instead of keeping everything in one central place on the network you’ve scattered these systems into different areas. So all of your applications, your data, and other critical assets are all separated from each other. This not only makes it more difficult to find and target these systems, it allows you as the security professional to be able to set up additional separation between data, application, and all of the other components.

One primary goal of a security system is to maintain availability. And one of the ways that you can do that is to build in redundancy and fault tolerance to your application instances. The goal is obviously to maintain the uptime and availability of these services so that your organization continues to function.

That means that all of your hardware needs to continue to run even if a failure happens to occur in any piece of that hardware. The services and the servers should always be available. You also want to be sure that the software itself continues to run. If there are any problems or issues relating to the software, you want to have other contingencies in place to make sure that the software is always available.

You also need to think about the networks that are connecting all of these systems together and provide fault tolerance there, so that if one component fails, the rest of the network works around the problem and the application continues to function. One way to provide the redundancy and fault tolerance of these systems is to provide redundant hardware components.

So instead of running everything on one server you would have multiple servers available. And if anything happens to the first server your systems would automatically use the secondary device. This might also include load balancers, and other devices with redundant power supplies to maintain the uptime and availability.
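The automatic switch to a secondary device can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular product does it: the host names and the `is_healthy` check are hypothetical, and a real health check would probe the server over the network.

```python
from typing import Callable, Sequence


def first_healthy(servers: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first server, in priority order, that passes the health check."""
    for server in servers:
        if is_healthy(server):
            return server
    raise RuntimeError("no healthy server available")


# Hypothetical host names; the primary is listed first.
servers = ["app-primary", "app-secondary"]

# Simulate a failure of the primary server.
failed = {"app-primary"}

active = first_healthy(servers, lambda s: s not in failed)
print(active)  # app-secondary
```

As long as the primary passes its check it is always chosen, and the secondary is used only when the primary fails.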

We often create redundancy with our storage systems by using RAID. RAID is the Redundant Array of Independent Disks. It allows us to have multiple physical drives inside of a server that are redundant to each other. If one particular drive fails the rest of the drives continue to allow the application to run normally.

Another popular form of fault tolerance is power fault tolerance using a UPS. A UPS is an Uninterruptible Power Supply. If you lose the main power to your data center, the batteries in your UPS will continue to keep everything up and running until that main electrical line is restored.

It’s also common to cluster a group of servers together so that they appear to be one single server. This means that if any of those systems happens to fail the rest of the devices in the cluster will continue to take the load and maintain the operation of that application.

A load balancer will share a load across multiple servers. And if any of those servers happens to fail, the load balancer recognizes the failure and begins distributing or balancing that load across the remaining available servers.
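Here is a minimal sketch of that behavior, assuming a simple round-robin policy. Real load balancers support many other policies and detect failures with active health checks; the class name, the server names, and the `mark_down` method are all made up for this example.

```python
class RoundRobinBalancer:
    """Toy round-robin load balancer that skips servers marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        # In practice a failed health check would trigger this.
        self.healthy.discard(server)

    def pick(self):
        # Try each server at most once per request, starting from _next.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")


lb = RoundRobinBalancer(["web1", "web2", "web3"])
before = [lb.pick() for _ in range(3)]
print(before)  # ['web1', 'web2', 'web3']

lb.mark_down("web2")  # simulate a server failure
after = [lb.pick() for _ in range(4)]
print(after)  # ['web1', 'web3', 'web1', 'web3']
```

Once web2 is marked down, its share of the requests is spread across the two remaining servers.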

There are different levels of RAID that you could implement with the storage devices in your servers. The type of RAID that provides zero redundancy is RAID 0. This is striping data across multiple physical drives. It’s not providing any type of parity, which means there’s very high performance but no fault tolerance when you’re using RAID 0.

With RAID 1, we do have some fault tolerance because we are mirroring the data. We’re copying data that is stored on one physical device to another physical device. So there’s really two copies of that data. We’re duplicating the data for fault tolerance. But as you might imagine, this means that we’re using twice as much disk space to be able to do that.
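To put numbers on that tradeoff, here is a small sketch comparing usable space for striping and mirroring. The function name and the 4 TB drive size are just assumptions for the example, and it only models these two RAID levels.

```python
def usable_capacity_tb(level: int, drives: int, drive_tb: float) -> float:
    """Usable capacity for a simple RAID 0 or RAID 1 array of equal-size drives."""
    if drives < 2:
        raise ValueError("these RAID levels need at least two drives")
    if level == 0:
        # Striping: every drive holds unique data, so all space is usable.
        return drives * drive_tb
    if level == 1:
        # Mirroring: every drive holds the same data, so only one drive's worth is usable.
        return drive_tb
    raise ValueError("only RAID 0 and RAID 1 are modeled in this sketch")


print(usable_capacity_tb(0, drives=2, drive_tb=4.0))  # 8.0
print(usable_capacity_tb(1, drives=2, drive_tb=4.0))  # 4.0
```

With the same two drives, the mirror gives you half the space of the stripe, but it survives the loss of either drive.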

With RAID 5, we’re striping the data and adding some parity, which means that we have a level of redundancy with our data, and it needs only one additional drive’s worth of space to store that parity data. Some organizations will combine these RAID types together. For example, you could have RAID 1 plus 0 or RAID 5 plus 1. By combining these RAID types you can customize the exact type of redundancy that you need for the type of data that you are storing.
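RAID 5 parity is calculated with XOR, and that is what makes it possible to rebuild the contents of a failed drive. The sketch below shows the idea on a single stripe. It is simplified: a real RAID 5 array rotates the parity across all of the drives, and the block contents and the `xor_blocks` helper here are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each stored on a different drive.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the drive that held data[1].
survivors = [data[0], data[2], parity]

# XOR of the surviving blocks and the parity recovers the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

This works because XORing a value with itself cancels it out, so whatever is left after combining the survivors with the parity is the missing block. It also shows why one parity block can only cover one failed drive per stripe.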

Just because you’ve implemented redundancy doesn’t necessarily mean that your systems are highly available. Highly available implies that the systems will always be available for anyone who needs them at any time. But with redundancy, you may have an additional server but you may need to manually power on that server to provide that redundancy.

You’ll often see high availability shortened as HA, which means that this is always going to be on and always going to be available. This might involve a number of different components all working together. You might have a pair of firewalls that are connected to a pair of routers that are then connected to a pair of switches, so that if any single one of those components fails, the other components will work around the failure and provide that high availability.
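A quick calculation shows why those pairs matter. This is a rough sketch under two assumptions that are not from the video: each device is up 99% of the time, and devices fail independently of each other. It also assumes either member of a pair can carry the traffic alone.

```python
def series(*availabilities: float) -> float:
    """All components must be up: multiply the availabilities."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def redundant_pair(a: float) -> float:
    """A pair is down only if both members are down at the same time."""
    return 1.0 - (1.0 - a) ** 2


device = 0.99  # assumed availability of one firewall, router, or switch

# One firewall, one router, one switch in a chain.
single_path = series(device, device, device)

# A pair of firewalls, a pair of routers, and a pair of switches.
paired_path = series(redundant_pair(device), redundant_pair(device), redundant_pair(device))

print(f"single path: {single_path:.4%}")
print(f"paired path: {paired_path:.4%}")
```

Under these assumptions the single chain is available about 97% of the time, while the paired design is available about 99.97% of the time.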