It’s a challenge to keep a system running 100% of the time. In this video, you’ll learn about high availability, fault tolerance, and the VRRP and HSRP protocols.


We describe high availability as a system that's always going to be there and available for us. Service providers commonly describe this as a percentage of uptime, and you'll often see 99.999% uptime guaranteed, which they call five nines of uptime. If you do the calculation, 365 days times 24 hours a day times 60 minutes an hour gives you 525,600 minutes in a year. Five nines allows only 0.001% of that time to be downtime, so to really say you have five nines of uptime, you can only be down for about five minutes (roughly 5.26 minutes) for the entire year.
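The uptime arithmetic can be checked with a few lines of Python. This is a small sketch that assumes a 365-day year, as in the calculation above; the function name allowed_downtime_minutes is just an illustrative choice.

```python
# Minutes in a 365-day year (leap years ignored, matching the calculation above).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return how many minutes per year a system may be down at a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common "nines" targets.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% uptime -> {allowed_downtime_minutes(pct):.2f} minutes of downtime per year")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why every additional nine tends to cost so much more to achieve.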

Building a highly available system almost always means additional costs. There's always one more thing you can add to make the device or the system more available to everyone. You could add UPS systems. You could add power from a second provider. You could use higher-quality server components. You could add multiple servers into the mix. The more devices and components you add, the higher the costs are going to be.

Of course, nothing is 100% available on its own, and usually not even when you plan for these contingencies. Very often there's a third party involved, such as your power provider, and there may be an act of nature that you simply can't build a system around. So you always have to watch for outside influences that could cause your highly available system to fail anyway.

You might also hear the term fault tolerance. This means that if something does fail, your system still remains available. We can still keep those five nines of uptime even if a server happens to go down. Performance might degrade and there may be a slowdown, but at least the service is still available. Fault tolerance usually involves adding more devices, and perhaps more protocols, onto the network, so it adds complexity to the system you already have. But if you're trying to maintain uptime regardless of what might fail, you'll want to build some fault tolerance into your network.

This might be fault tolerance within a single device. You might have RAID set up for your drive arrays, or redundant power supplies, or redundant internet connections. Or you might build fault tolerance across multiple devices, such as an entire server farm, so that if one device fails, many other devices are there to pick up the load.
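To see why adding devices helps, here's a rough sketch of how redundancy improves availability. It assumes the redundant components fail independently of each other and that any single working component is enough to keep the service up; real systems often share power, software bugs, or upstream links, so treat the numbers as an optimistic estimate. The 99% per-server figure is hypothetical.

```python
def redundant_availability(component_availability: float, count: int) -> float:
    """Availability of `count` identical redundant components.

    Assumes failures are independent and that one working component
    keeps the service up, so the service is down only when all of them are down.
    """
    probability_all_down = (1 - component_availability) ** count
    return 1 - probability_all_down


# Hypothetical server that is up 99% of the time on its own.
for servers in (1, 2, 3):
    print(f"{servers} server(s): {redundant_availability(0.99, servers):.6%} available")
```

With these assumptions, going from one 99% server to two raises availability to about 99.99%, which is the same idea behind a server farm picking up the load when one device fails.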

Here's an example of a relatively common network diagram. We've got an internet provider. On our side we have a firewall. I've got a router inside of that firewall that connects to a switch, and finally to our web server. If you look at this diagram, every step along the way is an opportunity for downtime. So to maintain availability, we need to build some fault tolerance into this network.

For example, we might duplicate the firewalls so that if we lose one firewall, we still have security and still maintain connectivity back to our web server. But once we have multiple firewalls, we also need redundancy in our routers and our switches. We may even want redundant web servers as well, with a load balancer in front of them. This is what we mean by the additional cost and complexity of maintaining availability: you have to buy many more pieces of equipment just to be sure that if there's a problem anywhere on this network, we can still provide the service.
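A rough way to reason about this diagram is that the devices sit in a chain: traffic only reaches the web server if the firewall, router, switch, and server are all up, so their availabilities multiply. The sketch below compares a single chain against one where every device is duplicated. The per-device figures are hypothetical, failures are assumed to be independent, and the internet provider and load balancer are left out to keep the example small.

```python
from math import prod


def series_availability(availabilities) -> float:
    """Every device in the chain must be up, so the availabilities multiply."""
    return prod(availabilities)


def redundant_availability(component_availability: float, count: int) -> float:
    """A redundant group is down only when all of its members are down (independent failures assumed)."""
    return 1 - (1 - component_availability) ** count


# Hypothetical availability for each device in the diagram, for illustration only.
chain = {"firewall": 0.999, "router": 0.999, "switch": 0.999, "web server": 0.99}

single = series_availability(chain.values())
paired = series_availability(redundant_availability(a, 2) for a in chain.values())

print(f"single devices:     {single:.4%}")
print(f"duplicated devices: {paired:.4%}")
```

Note that the single chain ends up less available than its weakest device, which is why every step in the diagram is a possible point of failure, and why the redundancy has to be added everywhere rather than in just one place.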

If you're working with routers and you want a redundant routing setup, you may want to use one of these high availability protocols on your routers. One is VRRP, the Virtual Router Redundancy Protocol. With VRRP, the routers share a virtual IP address, and you configure that virtual IP address as the default gateway on your client devices. If the router currently answering for that address disappears, another router takes its place using exactly the same IP address. So you don't have to change anything on your client devices. They simply keep communicating as they normally do, while the routers decide among themselves which one actually handles the traffic sent to that virtual address.
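Here's a simplified sketch of how the routers in a VRRP group decide who answers for the virtual IP address. In VRRP the router with the highest priority becomes the master, and if priorities are tied, the router with the higher primary IP address wins; 100 is the default priority. Everything else here is an assumption for illustration: the class and function names and the addresses are hypothetical, and the sketch ignores advertisement packets, timers, and preemption settings.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class VrrpRouter:
    name: str
    primary_ip: IPv4Address
    priority: int = 100  # VRRP default priority for a backup router
    alive: bool = True


def elect_master(routers):
    """Pick the live router with the highest priority, breaking ties on the higher primary IP."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, r.primary_ip))


VIRTUAL_IP = IPv4Address("192.168.1.1")  # hypothetical default gateway configured on clients
routers = [
    VrrpRouter("R1", IPv4Address("192.168.1.2"), priority=120),
    VrrpRouter("R2", IPv4Address("192.168.1.3")),
]

print(f"{VIRTUAL_IP} is answered by {elect_master(routers).name}")  # R1 (higher priority)
routers[0].alive = False  # R1 fails
print(f"{VIRTUAL_IP} is answered by {elect_master(routers).name}")  # R2 takes over
```

The clients only ever see VIRTUAL_IP as their gateway, so the change of master from R1 to R2 requires no client configuration.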

Cisco's proprietary alternative is the Hot Standby Router Protocol, or HSRP, which works in a very similar way. You have another router sitting there on standby, and if you lose the active router, the hot standby router takes over and becomes the default gateway by answering for the virtual router's address.
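How does the standby router know the active router is gone? In HSRP the routers exchange hello messages, and by default hellos are sent every 3 seconds with a 10-second hold time. If the standby router hears nothing from the active router for longer than the hold time, it takes over. The sketch below models only that hold-timer check; the class and method names are hypothetical, and real HSRP also includes priorities, optional preemption, and several other states.

```python
from dataclasses import dataclass

HELLO_INTERVAL = 3.0  # seconds; HSRP default hello timer
HOLD_TIME = 10.0      # seconds; HSRP default hold timer


@dataclass
class StandbyRouter:
    name: str
    last_hello_from_active: float = 0.0
    state: str = "Standby"

    def receive_hello(self, now: float) -> None:
        """Record that the active router was heard from at time `now`."""
        self.last_hello_from_active = now

    def check(self, now: float) -> str:
        """Take over if the active router has been silent for longer than the hold time."""
        if self.state == "Standby" and now - self.last_hello_from_active > HOLD_TIME:
            self.state = "Active"  # start answering for the virtual IP and virtual MAC
        return self.state


standby = StandbyRouter("R2")

# The active router sends hellos at t = 0, 3, and 6 seconds, then fails.
for t in (0.0, HELLO_INTERVAL, 2 * HELLO_INTERVAL):
    standby.receive_hello(t)

# The standby router keeps checking; it takes over once more than 10 seconds pass without a hello.
for t in (9.0, 12.0, 15.0, 18.0):
    print(f"t={t:>4}s  {standby.name} is {standby.check(t)}")
```

The hold time is a trade-off: a shorter hold time means faster failover, but also a higher chance of a false takeover if a few hellos are simply delayed or lost.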

Regardless of which of these redundancy protocols you decide to use, you now have a redundant network. If you lose any one of these devices, you'll still maintain the availability of your services.