Many different variables and business functions are used to create a business impact analysis. In this video, you’ll learn about recovery times, mission-essential functions, single points of failure, and more.
When your hardware or software systems have a failure, there are going to be a lot of questions about the recovery of those systems. One measurement of recovery is the MTTR, or the mean time to restore. This is the average time it takes to restore a system once it fails. You might also hear this referred to as mean time to repair.
That’s a little different from mean time to failure, or MTTF. MTTF applies to systems that are not repairable; if there’s a failure, the system must be replaced. So the MTTF can give you a reasonable idea of the expected lifetime of a particular product, or of a particular part within a system.
A metric that’s a little more difficult to calculate, but still an important consideration for a business impact analysis, is the mean time between failures, or MTBF. How much time can we expect between one failure and the next? It gives us an idea of how to plan for failures, and what we might be able to do to prevent them.
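To make these three measurements concrete, here is a minimal Python sketch. It assumes a hypothetical outage log recorded in hours from the start of an observation window, and it uses one common convention for MTBF (total uptime divided by the number of failures); some organizations measure it differently, for example from the start of one failure to the start of the next. The names `Outage`, `mttr`, `mtbf`, and `mttf` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    # Hypothetical log entry; times are hours since the start of the observation window.
    failed_at: float
    restored_at: float


def mttr(outages: list[Outage]) -> float:
    """Mean time to restore (repair): average length of an outage."""
    return sum(o.restored_at - o.failed_at for o in outages) / len(outages)


def mtbf(outages: list[Outage], window_hours: float) -> float:
    """Mean time between failures, assumed here to be total uptime / number of failures."""
    downtime = sum(o.restored_at - o.failed_at for o in outages)
    return (window_hours - downtime) / len(outages)


def mttf(lifetimes_hours: list[float]) -> float:
    """Mean time to failure for non-repairable parts: average lifetime before replacement."""
    return sum(lifetimes_hours) / len(lifetimes_hours)


if __name__ == "__main__":
    # Illustrative numbers only: two outages in a 1,000-hour window.
    log = [Outage(100, 102), Outage(500, 506)]
    print(mttr(log))                   # 4.0 hours per outage
    print(mtbf(log, 1000))             # 496.0 hours of uptime per failure
    print(mttf([8000, 10000, 12000]))  # 10000.0 hours for three replaced drives
```

The MTTR and MTBF figures come from the same log, which is why it helps to record both the failure time and the restore time for every incident.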
When a failure occurs, it’s going to be important to get things running again. So one of the questions you’ll need to answer is: at what point do you consider yourself up and running? There’s usually a particular service level required for the application or service you’re providing, so the recovery time objective, or RTO, should document how quickly the service must be restored to that level.
When you’re recovering a system, availability isn’t always all or nothing. There may be levels of recovery between 0% and 100% availability. This is where recovery point objectives, or RPOs, come in: an RPO defines how much data must be recovered, or how much data loss is acceptable, before you consider the system back online.
This might allow you, for instance, to bring a system back online with a minimal amount of data, so that you could search the last few days of data but would not yet have access to a year’s worth of archives. Or you might bring the system online with limited availability for a certain group of people, but not yet for everyone who needs access to that service.
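A quick way to check an incident against these objectives is sketched below in Python. It assumes the common definitions: the RPO is the maximum acceptable data loss, measured as the time between the last good backup and the failure, and the RTO is the maximum acceptable time from the failure until the service is back at its agreed level. The function name `meets_objectives` and the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup: datetime, outage_start: datetime,
                     service_restored: datetime, rpo: timedelta,
                     rto: timedelta) -> tuple[bool, bool]:
    """Return (meets_rpo, meets_rto) for one incident."""
    data_loss = outage_start - last_backup           # data written since the last backup may be lost
    recovery_time = service_restored - outage_start  # how long the service was unavailable
    return data_loss <= rpo, recovery_time <= rto


if __name__ == "__main__":
    # Illustrative incident: nightly backup at midnight, failure at 03:30, restored at 07:00.
    result = meets_objectives(
        last_backup=datetime(2024, 1, 10, 0, 0),
        outage_start=datetime(2024, 1, 10, 3, 30),
        service_restored=datetime(2024, 1, 10, 7, 0),
        rpo=timedelta(hours=4),
        rto=timedelta(hours=2),
    )
    print(result)  # (True, False)
```

In this example, 3.5 hours of potential data loss is within a 4-hour RPO, but 3.5 hours of downtime misses a 2-hour RTO, so the two objectives have to be tracked separately.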
When you’re performing a business impact analysis, you’ll often calculate a quantitative value for uptime, usually expressed as a percentage of availability. For example, in a particular year you might strive for 99.999% availability, which is often referred to as five nines.
There will also be questions about the definition of availability. Does it mean the system must be available 100% of the time? Does scheduled downtime, such as weekly maintenance, count against availability, or is availability measured only against unscheduled downtime? These definitions are important when you’re calculating availability, and especially important if those calculations are tied to your salary or to an ongoing bonus.
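The following Python sketch shows how much the answer to those questions can move the number. It uses hypothetical figures, a 30-day month with 60 minutes of unscheduled outage and four one-hour maintenance windows, and computes availability both with scheduled maintenance counted as downtime and with maintenance windows excluded from the measurement period. Which convention applies is something your agreement has to define; the function name `availability_percent` is illustrative.

```python
def availability_percent(period_min: float, unscheduled_min: float,
                         scheduled_min: float = 0.0,
                         count_scheduled: bool = True) -> float:
    """Percentage of the measurement period that the service was available.

    If count_scheduled is False, maintenance windows are removed from both the
    downtime and the period being measured, so only unscheduled outages count.
    """
    if count_scheduled:
        period = period_min
        downtime = unscheduled_min + scheduled_min
    else:
        period = period_min - scheduled_min
        downtime = unscheduled_min
    return 100 * (period - downtime) / period


if __name__ == "__main__":
    month = 30 * 24 * 60  # 43,200 minutes in a 30-day month
    print(round(availability_percent(month, 60, 240, count_scheduled=True), 2))   # 99.31
    print(round(availability_percent(month, 60, 240, count_scheduled=False), 2))  # 99.86
```

The same month reads as either 99.31% or 99.86% available depending only on how scheduled maintenance is treated.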
If we look at an entire year, we can see how much these availability targets differ. For example, 99% availability means that you could be down for a total of 87 hours and 36 minutes during the year. At four nines, or 99.99%, you have only 52 minutes and 34 seconds of downtime available. And if you’re working toward six nines, or 99.9999% availability, your annual downtime cannot exceed about 32 seconds.
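These figures come from simple arithmetic on a 365-day year (31,536,000 seconds): allowed downtime is the length of the year multiplied by 1 minus the availability. A short Python sketch that reproduces them, rounding to the nearest second, is below; it ignores leap years, and the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored


def annual_downtime_seconds(availability_percent: float) -> float:
    """Maximum downtime per year permitted by an availability target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


def as_hms(seconds: float) -> str:
    """Format a duration, rounded to the nearest second, as hours/minutes/seconds."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999, 99.9999):
        print(f"{target}%: {as_hms(annual_downtime_seconds(target))}")
```

Run as-is, it prints 87h 36m 0s for 99%, 0h 52m 34s for four nines, 0h 5m 15s for five nines, and 0h 0m 32s for six nines.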
When you start thinking about downtime and providing a business impact analysis, one of the first things you need to determine is which functions are mission-essential. For example, if a hurricane hit your organization and you had to rebuild from scratch, where would you start to bring the systems back up again?
This is usually one of the very first steps in performing a business impact analysis, and it helps you determine some broad business requirements for the organization. These might include payroll, accounting, a manufacturing facility, and any other functions required to keep the business running. From there, you can map the IT services that support each of those functions and identify which critical systems are needed to keep the business running.
There are many different things that can go wrong with the technology that we’re using. It might be a software bug or a hardware failure that brings down an entire system. So it’s important to remove any single points of failure that might cause outages or downtime to a particular service.
For our network infrastructure, we often avoid single points of failure by installing multiple devices. So instead of having a single router to the internet, there might be two routers connected to the internet. Behind those, there may be two firewalls, and behind those, two switches. Some people refer to this as the Noah’s Ark of networking, since we’re simply adding pairs of systems to maintain uptime and availability.
In our physical facilities, we perform a similar function for things like power, where we might have a main power system and then a backup power source. Or we might have a primary cooling system, and then a backup cooling system if that one happens to fail.
There might also be a need for redundant people. We might have people in one location who are able to perform the same job as people in a different location. If we had to shut down an office, we could move all of those services to the other group, who would perform the same tasks.
Even after going through all of these systems and creating these redundancies, there are still going to be places where there is a single point of failure. There’s no practical way to remove all single points of failure. You just have to make sure that you’re investing in the right systems in the right place in the organization to keep everything running as much as possible.
Visually, here is how this redundancy might look. You might have multiple links to the internet; in fact, you might use a different internet provider for each one. You might connect to those links through two different routers, with one set as active and the other as standby. Behind those routers, you may have multiple firewalls, again with one active and one on standby. And then you might have redundant internal core switches. If one of those switches fails, your redundant servers fail over to the backup systems, so the service stays up and running.
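One way to see why pairing devices helps is a rough availability estimate, sketched here in Python. It assumes that failures are independent, that failover is instantaneous and always works, and that each device is available 99.9% of the time; these are illustrative assumptions, not measurements. Devices in a chain multiply their availabilities, while a redundant pair is down only if both members are down.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every component in the chain must be up (no redundancy)."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one member of the group must be up (e.g. an active/standby pair)."""
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    device = 0.999  # assumed availability of a single router, firewall, or switch

    single_chain = series(device, device, device)  # router -> firewall -> switch
    paired = parallel(device, device)              # one redundant pair
    redundant_chain = series(paired, paired, paired)

    print(f"single devices:  {single_chain:.4%}")     # 99.7003%
    print(f"redundant pairs: {redundant_chain:.4%}")  # 99.9997%
```

Even in this idealized model, the failover mechanism itself, a shared power feed, or a single upstream provider would remain a single point of failure that the math does not capture.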
There are many different considerations when building a business impact analysis. The most important one is, obviously, life. We want to make sure that everyone working in the organization is safe, and that should be the primary goal of any business impact analysis. Then we have to look at our property, meaning the buildings and the places where people are working, to understand the risk to those assets.
And then we also have to look at safety. If a hurricane blows through, there may be downed power lines, and it may be difficult or dangerous for people to work inside those buildings. So we have to consider how our business impact analysis will deal with these safety concerns.
There will always be some type of financial impact when there is an incident, and we need to think about what those costs might be. And finally, we need to consider the reputation of the organization. If our building is blown away by a hurricane, and we have no backup systems or any other way to get back up and running, the overall reputation of our business may suffer.
There is also a set of analyses that has to occur around privacy. You’ve probably received a written privacy statement in the mail. These statements are required by some regulations, such as the Gramm-Leach-Bliley Act (GLBA), which requires financial institutions to disclose how they share customers’ financial information, and HIPAA, which requires health care providers and health plans to inform patients of their privacy practices.
Before an organization sends you these privacy statements, it has to determine what type of privacy compliance applies to which systems. So it’ll perform a privacy threshold analysis, or PTA. This is the first step in determining what privacy requirements may be associated with a particular kind of data. You’ll examine the business processes in use, identify which of those processes have a privacy-sensitive component, and then determine whether you need to perform an additional assessment of that data.
If the privacy threshold analysis determines that there is a privacy component, then you’ll need to perform a privacy impact assessment, or PIA. This ensures that the systems and processes you have in place comply with existing laws and regulations. You’ll need to determine what type of personally identifiable information is being gathered and how that information is being used, and that information is then described in the written privacy statement made available to users.
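The threshold step can be pictured as a simple filter over the organization’s business processes. The Python sketch below assumes a hypothetical set of data fields treated as privacy-sensitive and a hypothetical inventory of what each process collects; a real PTA relies on the applicable laws and regulations and on human review, not a fixed keyword list. Processes returned by the filter are the ones that would move on to a PIA.

```python
# Hypothetical set of data elements treated as privacy-sensitive in this sketch.
SENSITIVE_FIELDS = {"name", "ssn", "date_of_birth", "home_address",
                    "account_number", "diagnosis"}


def privacy_threshold_analysis(processes: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return each process that collects sensitive fields, with those fields.

    `processes` maps a business process name to the set of data fields it collects.
    Every process returned is a candidate for a privacy impact assessment (PIA).
    """
    flagged = {}
    for process, fields in processes.items():
        sensitive = fields & SENSITIVE_FIELDS
        if sensitive:
            flagged[process] = sorted(sensitive)
    return flagged


if __name__ == "__main__":
    inventory = {  # illustrative inventory, not real data
        "payroll": {"name", "ssn", "salary"},
        "web_analytics": {"page_url", "browser"},
        "patient_intake": {"name", "diagnosis", "insurance_id"},
    }
    print(privacy_threshold_analysis(inventory))
    # {'payroll': ['name', 'ssn'], 'patient_intake': ['diagnosis', 'name']}
```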