Troubleshooting problems will probably be one of the most common things you do every day. In this video, you’ll learn how to break down the troubleshooting process into a number of easy-to-follow steps.
In IT, a big part of our job is solving problems. And in this video, we’re going to step through a troubleshooting process that can not only solve the problems you’ll have in IT, but can also be applied to problems that you’d solve in any environment.
The first step is all about gathering information to identify the problem. You need to completely understand what the problem might be. And if possible, you want to be able to duplicate the problem.
With technology, these problems often come with multiple symptoms. It’s rarely just one thing. It’s often an error message, a slow system, and a screen that turns blue. And you need to be sure to document all of these so that you can understand what the total issue might be.
If there’s an end user that’s dealing with this problem, then they’re probably very familiar with what happens before, during, and after this issue. So make sure you ask them plenty of questions to get the details from their perspective.
If things have been working fine for a long period of time and then suddenly they’re not working properly any longer, then something might have changed in their environment. It might be worthwhile to do some investigation and see if there have been any recent changes.
If multiple issues are happening simultaneously, it might be worthwhile to break those up into individual problems and then go through this same troubleshooting process with each one individually.
Now that we’ve collected information, we need to start thinking about what might have caused this problem. We can start with the most obvious issues. The simplest explanation is often the most likely.
That doesn’t mean it’s the only possible thing that might be happening. You need to consider every possible scenario, even the ones that might not be completely obvious. It’s useful to make a list of what all of the possible problems might be. And then you can start with the easy theories to see if those might be able to resolve the problem quickly.
Now we can go to our list. We’ll start at the top, and we’ll start testing the theories that we have about what might be causing this particular issue. For each theory, we can determine what the next steps might be to resolve the problem, and see if that actually solves the issue. If it doesn’t work, then we need to go to the next theory on our list and try resolving it with that one. And if we get through this entire list and we still don’t know what might be happening, it might be worthwhile to call an expert, have them come in and give you additional ideas of what might be causing this particular issue.
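As a rough illustration of working through that list, here is a minimal Python sketch. Everything in it is hypothetical and made up for this example: the `Theory` class, the `effort` ranking, and the sample checks are assumptions, not part of any real tool. It simply tests the easiest theories first and signals that it’s time to escalate if nothing on the list is confirmed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Theory:
    """One possible cause, with a check that confirms it or rules it out."""
    description: str
    effort: int                    # hypothetical ranking: lower means easier to test
    confirmed: Callable[[], bool]  # returns True if this theory explains the problem


def work_through_theories(theories: List[Theory]) -> Optional[Theory]:
    """Test theories from easiest to hardest and return the first confirmed one."""
    for theory in sorted(theories, key=lambda t: t.effort):
        if theory.confirmed():
            return theory
    # Nothing on the list was confirmed, so the caller should escalate.
    return None


# Illustrative checks only; real checks would inspect the actual system.
theories = [
    Theory("Failed network switch", effort=3, confirmed=lambda: False),
    Theory("Cable unplugged", effort=1, confirmed=lambda: False),
    Theory("Wrong IP configuration", effort=2, confirmed=lambda: True),
]

cause = work_through_theories(theories)
print(cause.description if cause else "No theory confirmed; escalate to an expert")
```

Sorting by effort reflects the idea of starting with the easy theories, and the `None` result stands in for the point where you’d call in an expert.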
Once you successfully identify which one of your theories actually solves the problem, then it’s time to implement the solution. Sometimes this can be done very quickly. You make the change and everything’s back up and running. But if this is on a production system, you may only be able to make changes on a certain day at a certain time. In that particular case, you need to come up with a plan of how to implement this fix in that production environment.
Even the best-laid plans, of course, don’t always work as expected, so it’s not only good to have a primary plan of how to resolve this, but also a Plan B or perhaps a Plan C. And ultimately, you may need a plan to revert back to the previous configuration if nothing goes right.
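One way to picture a primary plan, a fallback, and a revert step is the short Python sketch below. The function `run_change`, the plan functions, and the `state` dictionary are all hypothetical names invented for this example; a real change would act on real systems and would usually need more careful cleanup between attempts.

```python
from typing import Callable, Dict, List, Tuple

# Each plan is a (name, action) pair; the action returns True on success.
Plan = Tuple[str, Callable[[], bool]]


def run_change(plans: List[Plan], rollback: Callable[[], None]) -> str:
    """Try each plan in order; if every plan fails, revert the configuration."""
    for name, apply_fix in plans:
        try:
            if apply_fix():
                return name
        except Exception as error:
            print(f"{name} raised an error: {error}")
    rollback()
    return "rolled back"


# Hypothetical stand-in for the system being changed.
state: Dict[str, str] = {"config": "original"}


def plan_a() -> bool:
    state["config"] = "patched driver"
    return False  # assume the fix did not verify


def plan_b() -> bool:
    state["config"] = "replaced adapter"
    return True


def revert() -> None:
    state["config"] = "original"


result = run_change([("Plan A", plan_a), ("Plan B", plan_b)], revert)
print(result, "->", state["config"])
```

Here Plan A fails and Plan B succeeds, so the revert step is never called. If only Plan A were supplied, `run_change` would fall through to `revert` and restore the original configuration.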
When your change control window arrives, you can execute on your plan. You can implement the fix. And hopefully, everything happens in the way that you planned it. You may have to escalate this, especially if it’s a critical problem, and go outside of your change control window. It may require help from a third party, especially if your time frames are very short.
Once you’ve applied the fix, you still have to test and make sure that your fix actually solved the problem. You can test this yourself. And you may even want to consider bringing in your customer, because they’ll be the ultimate arbiter on whether this problem was really resolved.
Now that the fix has been implemented, you’ve tested, and everything’s back to normal, it might be worthwhile to create some preventative measures so that this problem doesn’t occur in the future.
Now that you’re through this much of the troubleshooting process, you’ve accumulated some very valuable information. You’ve resolved an issue and you know exactly the process it takes to resolve it in the future. You should make sure all of this is documented in your knowledge base, so that if anyone else ever runs across this particular issue, they’ll know exactly how to address it. You might want to consider creating a formal database for this information, something that’s searchable, that you can always add information to and edit as needed.
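To make the idea of a searchable knowledge base a little more concrete, here is a minimal in-memory sketch in Python. The `Entry` and `KnowledgeBase` classes and the sample records are hypothetical and exist only for illustration; a real knowledge base would most likely live in a ticketing system, wiki, or database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """One documented problem: what was seen, what caused it, what fixed it."""
    symptoms: str
    cause: str
    resolution: str


@dataclass
class KnowledgeBase:
    entries: List[Entry] = field(default_factory=list)

    def add(self, entry: Entry) -> None:
        self.entries.append(entry)

    def search(self, keyword: str) -> List[Entry]:
        """Case-insensitive keyword match across all fields of every entry."""
        needle = keyword.lower()
        return [
            e for e in self.entries
            if needle in f"{e.symptoms} {e.cause} {e.resolution}".lower()
        ]


# Sample records, invented for this example.
kb = KnowledgeBase()
kb.add(Entry("Blue screen after login", "Faulty video driver", "Rolled back the driver"))
kb.add(Entry("Cannot print", "Print spooler stopped", "Restarted the spooler service"))

matches = kb.search("driver")
print([m.symptoms for m in matches])
```

Recording the symptoms, the cause, and the resolution together means the next person who searches for any one of them can find the whole story.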
Now you’ve made it all the way through the troubleshooting process. You’ve identified the problem that’s occurring. You’ve created your own list of theories of why this might be happening. You’ve tested those theories and identified which one of your theories actually solves the issue.
You’ve created a plan of action for how you’re going to resolve this issue. You’ve implemented that plan. And then you’ve tested it to make sure that the system is performing exactly as expected. Once all of this is completed, you document the entire process you went through. And now you can apply this same process to any problem that you’re faced with in an IT environment.