Troubleshooting Storage Devices – CompTIA A+ 220-1101 – 5.3

We rely on our storage devices for the safety and availability of our operating systems and data. In this video, you’ll learn how to troubleshoot storage devices with boot failures, data corruption, RAID recovery options, and more.

Something you do not want to see when you start your computer is a message on the screen that says, cannot read from the source disk. This indicates a problem reading or writing information to the storage drive. And it could indicate that the storage drive has failed. It could be that the storage drive is still working but you’re getting very slow response, especially if you see constant LED access activity on the drive light. Or it may be retrying over, and over, and over again to try to read information from that drive.

And if you’re using a hard drive, it could be that it’s making a loud clicking noise over and over again. We sometimes refer to this as the click of death. Because there really should not be any loud noises coming from inside of that drive case. In those cases, it might be that the drive is failing or it may have already failed.

To troubleshoot any of these types of storage problems, the first thing you should do if possible is to make a backup of all of the critical data on that drive. Hopefully, you take backups of this drive constantly so you don’t have to scramble around to get a lot of the data off of that system. If this drive uses cables to connect it to the motherboard, you may want to check for any that may have come loose or that might be damaged.

You might also want to check to see if your system may be overheating. This is something that obviously would affect all of the components inside of your system. But some of the errors you may be receiving from the storage drive may be caused by this overheating. If you’ve recently added new hardware to your computer, you may be overloading the power supply. And there may not be enough power to properly run the storage drive.

In those cases, you either want to remove the hardware so you have enough power for the drive or it’s time to upgrade that power supply. If you’re not sure if the problem is with the drive or something outside of the storage drive, you may want to run some hardware diagnostics on the drive itself. Normally, the drive manufacturer can provide you with a set of diagnostics that will check all of the working components of that drive and be able to tell you whether the drive is operating properly or if there are any errors.

Another set of symptoms may occur when you boot your system. You could get a message that says, drive not recognized or boot device not found. You might get lights that show access to the drive or there may be no lights showing any access to the drive. There could be beeping messages or there may be detailed error messages on the screen that you could use to reference that particular problem.

Another symptom during startup might be a message that says, operating system not found. This means the drive is there. But for some reason, we’re not able to find an operating system installed on that storage drive. To troubleshoot these types of problems, the first thing you should do is check the physical configuration. Do we have all the cables in place, and are they properly attached to the drive and to the motherboard?

Then, it might be useful to know exactly what the sequence is for booting up in the basic input output system or BIOS of that computer. The BIOS maintains the list of priorities for all of the boot devices during startup. So your BIOS will first check the storage drive that’s at the top of the list. If there’s no operating system to be found there, it checks the next one on the list, and so on.
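To make that boot order idea a little more concrete, here’s a minimal Python sketch of a system walking down its priority list. The function name pick_boot_device and the device labels are made up for illustration, and a real BIOS does a lot more than this.

```python
from typing import List, Optional, Set


def pick_boot_device(boot_order: List[str], has_os: Set[str]) -> Optional[str]:
    """Return the first device in the priority list that has an operating system.

    boot_order: device labels in BIOS priority order (hypothetical labels).
    has_os:     labels of the devices that actually contain a bootable OS.
    """
    for device in boot_order:
        if device in has_os:
            return device
    # Nothing bootable was found anywhere on the list, which is roughly
    # the "operating system not found" situation.
    return None


if __name__ == "__main__":
    order = ["usb", "ssd", "hdd"]
    print(pick_boot_device(order, {"ssd"}))  # the USB entry is checked first, then the SSD
    print(pick_boot_device(order, set()))    # no device has an operating system
```

This is a simplified model where a device without an operating system is simply skipped. It only shows why the order of the list matters when more than one storage device is connected.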

You may also want to check the USB interfaces on your system and see if you happened to leave a USB storage device connected to one of those interfaces. If your BIOS is set up to boot from a USB interface before booting from the hard drive or SSD of your system, leaving a USB flash drive in one of those interfaces could cause the system not to boot. You might also want to check inside the BIOS that you haven’t accidentally disabled any storage devices, which would certainly prevent the system from booting.

If you’re booting from a new storage drive, you’ll want to check the data and power cables to make sure that drive has physically been installed properly. And then if you have other SATA interfaces on your motherboard, you may want to try different interfaces to see if you can find one that may be operational. And ultimately, you might want to remove the SATA or M.2 drive from your system and try that drive in a different computer to see if it happens to work on another system.

One problem with hard drives is that they are spinning mechanical systems. So it’s not a question of if a hard drive will fail but when that hard drive will fail. If you don’t have a backup of the contents on that hard drive and the hard drive fails, you may have to send that drive to a very expensive recovery company to be able to retrieve all of that data. Although these third-party recovery companies are very good at what they do, there still might be times when the data on that drive is simply unrecoverable.

A solid state drive might also fail. And you would not be able to read or write any data to that drive. But some SSDs can fail and still allow you to read the data but not write any new data to the drive. If you’re using a storage drive that has any of these kinds of issues, you could inadvertently cause the data on that drive to be corrupted. So it’s always good to have constant backups of our storage drives so that you always know that you’ve got a copy of this data, even if the primary drive fails.

If you’re working with a server or some other large system that might be in a data center, there might not be one single drive inside of that device. Instead, there might be multiple drives that are connected through a RAID array. RAID stands for Redundant Array of Independent Disks. And it’s a common way to combine drives together to maintain uptime and availability of your data. This boot-up process shows that there is an integrated RAID exception detected. And this volume is currently in the state of inactive.

And it gives you the option of starting the Dell configuration utility to investigate and get more information about this RAID controller failure. Or it might be that a single drive of that RAID array has failed and you need to either replace or update the drive in the system. Usually, there is a RAID manager that can list out all of the drives in the system, tell you the model of each drive, and give you the status of each of those individual drives in the RAID array.

If any of these drives has had a failure or an error, it will be noted in the status. And you would know exactly which drive needs to be replaced. As you recall from our previous video on RAID, there are different RAID types that you could use in your system. If you’re running RAID 0, it requires two or more drives. But a single drive failure will break the array, and you will have data loss.

If you’re running RAID 1, you still need at least two drives in your system. And the array will continue to work as long as one of those drives is operational. With RAID 5, you need three or more drives. And the array will continue to work as long as no more than one of those drives has failed. And, lastly, RAID 10, which you may see written as RAID 1 plus 0, requires four or more drives. And you can lose all but one of the drives from each set of mirrors.
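Those RAID rules can be summarized in a short Python sketch. The function name raid_survives and the way the drives are numbered are assumptions for illustration. The RAID 10 case also assumes the simplest layout, where drives 0 and 1 form one mirror, drives 2 and 3 form the next, and so on.

```python
from typing import Set


def raid_survives(level: str, total_drives: int, failed: Set[int]) -> bool:
    """Return True if the array keeps working with the given failed drives.

    level:        "0", "1", "5", or "10"
    total_drives: number of drives in the array
    failed:       indexes (0-based) of the drives that have failed
    """
    minimum = {"0": 2, "1": 2, "5": 3, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if total_drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == "10" and total_drives % 2 != 0:
        raise ValueError("this sketch assumes RAID 10 is built from two-drive mirrors")

    if level == "0":
        # Striping only: any single failure breaks the array.
        return len(failed) == 0
    if level == "1":
        # Mirroring: works as long as at least one drive is operational.
        return len(failed) < total_drives
    if level == "5":
        # Striping with parity: tolerates one failed drive.
        return len(failed) <= 1

    # RAID 10: every mirror pair must keep at least one working drive.
    pairs = [(first, first + 1) for first in range(0, total_drives, 2)]
    return all(not (a in failed and b in failed) for a, b in pairs)


if __name__ == "__main__":
    print(raid_survives("5", 3, {0}))       # one failed drive in RAID 5
    print(raid_survives("10", 4, {0, 1}))   # both drives of the same mirror
```

Notice that in RAID 10 the answer depends on which drives fail, not just how many. Losing drives 0 and 2 leaves one drive in each mirror, but losing drives 0 and 1 takes out an entire mirror.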

You may not realize it, but there is an amazing amount of diagnostic information inside of each storage drive, available through a standard called SMART. SMART stands for self-monitoring, analysis, and reporting technology. And although we don’t commonly see these statistics when we’re using our drives, we are able to access this information, either using third-party utilities or utilities from the drive manufacturer itself.

Here’s an example of some of the SMART information on one of the drives in my RAID array. And you can see everything from spin-up time to seek error rates to power on hours and much more. If there are growing problems that are occurring on this drive, you’ll start to see the number of errors incrementing inside of this SMART information. These third-party utilities or your operating system itself may be able to monitor the SMART data and inform you when there might be problems that could indicate an issue with that storage device.
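One way to picture that kind of monitoring is to compare two snapshots of the counters taken at different times. This is only a sketch: the function name growing_counters, the attribute names, and the sample values are all hypothetical, and it doesn’t read anything from a real drive.

```python
from typing import Dict, Set


def growing_counters(previous: Dict[str, int],
                     current: Dict[str, int],
                     watched: Set[str]) -> Dict[str, int]:
    """Return the watched error counters that increased, with the amount of increase.

    previous/current: attribute name -> raw value from two SMART snapshots.
    watched:          names of the counters that should normally stay flat.
    """
    increases = {}
    for name in sorted(watched):
        if name in previous and name in current and current[name] > previous[name]:
            increases[name] = current[name] - previous[name]
    return increases


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real drive.
    last_week = {"read_errors": 0, "seek_errors": 2, "power_on_hours": 1000}
    today = {"read_errors": 3, "seek_errors": 2, "power_on_hours": 1168}

    # Power-on hours always grow, so only the error counters are watched.
    print(growing_counters(last_week, today, {"read_errors", "seek_errors"}))
```

Here only the read error counter went up between the two snapshots, so that’s the one a monitoring utility would flag.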

Many RAID arrays will perform their own checks, either once a day, once a week, or once a month, and can give you an update on how well those drives are performing during that time. If you do receive a message that a drive happens to be failing, it’s time to back up that data and replace that drive as soon as possible. For most of us, we want our storage drives to run at peak performance. But what does this peak performance mean as we’re reading or writing information from that drive?

There are so many different parts of your computer system that are used to retrieve or store that data. You have memory access. There’s communication across the bus. There’s the drive itself, which is physically spinning if it’s a hard drive. And you may be writing or reading the data to different types of storage media. In every single one of those places, there could be delays or slowdowns associated with those processes. So we need some way to measure the overall performance of our storage devices.

One way to do that is to measure the number of input/output operations per second or IOPS. This is a very broad perspective of how well we’re able to read and write data to that drive. But at least it gives us a standard to use across multiple drives and multiple systems. If you’re performing any type of drive diagnostics, then it’s probably going to give you statistics in IOPS.

For example, if you’re using a hard drive, you’ll probably max out somewhere around 200 IOPS. That’s 200 input/output operations per second. If you compare that to a solid state drive, you can have performance up to one million IOPS. And usually, you’ll have different values for IOPS depending on whether you are reading or writing information to that storage drive. These IOPS numbers on their own may not provide you with a lot of detail. But as you start comparing different drives on different systems, you’ll see how the performance might change depending on what you’re using.
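To put those numbers in perspective, here’s a small Python sketch of the arithmetic. The function names are made up for illustration, and the figures are the rough values mentioned above rather than measurements from any particular drive.

```python
def iops(operations: int, seconds: float) -> float:
    """Input/output operations per second for a measured run."""
    if seconds <= 0:
        raise ValueError("seconds must be greater than zero")
    return operations / seconds


def seconds_needed(operations: int, drive_iops: float) -> float:
    """How long a number of operations would take at a given IOPS rating."""
    if drive_iops <= 0:
        raise ValueError("drive_iops must be greater than zero")
    return operations / drive_iops


if __name__ == "__main__":
    # 12,000 operations completed in 60 seconds works out to 200 IOPS.
    print(iops(12_000, 60))

    # One million operations on a 200 IOPS hard drive
    # versus a one million IOPS solid state drive.
    print(seconds_needed(1_000_000, 200))        # 5000.0 seconds
    print(seconds_needed(1_000_000, 1_000_000))  # 1.0 second
```

The same one million operations that take well over an hour at 200 IOPS finish in about a second at one million IOPS, which is why comparing these numbers across drives is useful.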

There may be times when you boot your computer but there are certain drives that you’re expecting to see in the File Manager that simply are not listed. In those cases, we may want to check the BIOS, especially if the drive that’s missing happens to be physically located inside our computer. If we don’t see the drive in the BIOS, then we might have loose or missing cables or we may need to reseat the M.2 drive on our motherboard. Or it may be that the storage drive itself has failed, and we’ll need to replace the drive completely.

If this is an external drive that’s connected via USB, then we want to be sure we have power for the drive and that we’re connecting to the appropriate USB interface. And if the missing drive is a network share, then the drive may not have been mounted during the startup process. We may have the option to reconnect that drive once the network connection is available. Or we may want to run the login script again, which will go through the process of mapping the appropriate drives.