Our storage devices provide access to our most important documents. In this video, you’ll learn how to troubleshoot disk failures and boot issues with standalone hard drives and RAID arrays.
If you’re having problems with a hard drive, you might see different types of error messages. If you see a message that says “Cannot read from the source disk,” then the drive may be having a problem reading or writing information.
Or you may find that the overall performance of the drive is very slow. You might have slowness when reading or writing to the drive, and you might see the drive’s activity LED flashing constantly as you’re trying to read and write information. When that’s occurring, you might also hear a clicking noise. This is sometimes called “the click of death,” as the drive repeatedly tries to access information on the disk.
All of these issues are significant because they show that you may be losing access to the data that is on this particular storage device. If any of this is happening, then the first thing you should do is to get a backup of all of that data. If this drive does happen to fail, you want to be sure that you at least have a copy of all of your important information.
Then you want to look for any loose cables or any damaged cables that might be inside of your computer. Maybe the problem is simply the cable connection and nothing related to the overall operation of the drive. Also check to make sure there’s no overheating inside of your system, especially if these drive problems tend to manifest themselves after your computer has been running for a while.
You also want to check your power supply and make sure that your storage device is able to get enough power. This can be especially important if you happen to have installed new hardware inside of your computer, and you might have a power supply that’s not able to power all of those devices at the same time.
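One rough way to think about that power supply check is as a simple budget: add up what each device draws and compare the total to what the power supply is rated for. The sketch below assumes made-up wattage figures and a hypothetical `power_headroom` helper; real numbers would come from each component’s specifications.

```python
# Hypothetical wattage figures, for illustration only. Check each
# component's specifications for real numbers.
def power_headroom(psu_watts, device_watts):
    """Return the watts left over once every listed device draws its power."""
    return psu_watts - sum(device_watts.values())


devices = {
    "motherboard_and_cpu": 150,
    "graphics_card": 200,
    "hard_drive_1": 10,
    "hard_drive_2": 10,  # the newly installed drive
}

headroom = power_headroom(400, devices)
if headroom < 0:
    print(f"Power supply is short by {-headroom} watts")
else:
    print(f"Power supply has {headroom} watts of headroom")
```

A negative result would suggest the power supply can’t run everything at once, which is the situation to rule out after adding new hardware.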
And at the very least, you should run some hard drive diagnostics. Most manufacturers of hard drives will provide you with diagnostic software that’s specific to that drive, and you want to run that test from a known-good computer.
Sometimes the problems you’re having with your hard drive are occurring during the boot process. You might see lights or activity on the drive, or there may be no lights showing any hard drive access. There might be beep codes that occur during the power-on self-test, or there might be error messages on the screen related to your storage device.
You might also see messages related to the operating system. For example, “Operating system not found,” which means the hard drive is there, but there’s no operating system found on that hard drive.
One of the first checks that you can perform when you’re having these types of boot failures is the connection to the drive itself. You may want to disconnect and reconnect those particular data cables and power cables going to the drive. Also check the boot sequence in your BIOS. You may find that you’re trying to boot from a device that doesn’t have an operating system, and you’re not booting from the hard drive. You can change the boot order inside of your BIOS to choose one device over another.
You might also want to see if there are any removable disks or USB-connected disks on your computer, and that the BIOS isn’t choosing one of those disks before booting from the internal hard drive on your computer.
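To see why the boot order matters, here’s a simplified sketch of how a BIOS might walk its boot list. The `boot` function and the device dictionary are hypothetical, and the behavior is an assumption: this model stops at the first connected device, and some real firmware will instead move on to the next device in the list.

```python
def boot(boot_order, devices):
    """Simulate a BIOS trying devices in boot order (simplified model).

    boot_order: device names, highest priority first.
    devices: name -> {"attached": bool, "has_os": bool} (hypothetical).
    """
    for name in boot_order:
        device = devices.get(name)
        if device is None or not device["attached"]:
            continue  # nothing connected here, so try the next entry
        if device["has_os"]:
            return f"Booting from {name}"
        # Assumption: the BIOS commits to the first connected device.
        return "Operating system not found"
    return "No boot device available"


devices = {
    "usb_drive": {"attached": True, "has_os": False},
    "internal_hdd": {"attached": True, "has_os": True},
}

print(boot(["usb_drive", "internal_hdd"], devices))
print(boot(["internal_hdd", "usb_drive"], devices))
```

With the USB drive listed first, this model reports that no operating system was found, even though the internal drive is fine. Moving the internal drive to the top of the list, or unplugging the USB drive, gets past the problem.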
And of course, it’s certainly possible that storage interfaces are disabled in your BIOS. So if you’ve connected a new hard drive, you want to make sure your BIOS is able to see that drive.
If this is a brand new hard drive installation, you may want to check the cables or replace the cables for your data and the power. You might also want to try a different set of interfaces to make sure your problem isn’t associated with the interface on the motherboard.
And lastly, you may want to remove the drive completely from your computer and try it in a different PC to see if the problem is isolated to your system.
If you’re in a server environment, there may be multiple drives inside of your computer running in a RAID array. That’s a redundant array of inexpensive disks or a redundant array of independent disks, depending on which definition you like to use.
You might see messages appear on the screen that might identify a problem. This one is showing that the integrated RAID exception was detected. A volume on that RAID array is in an inactive or optimal state. And it gives you an option to enter the configuration utility to get more information about that error message.
When you start working with RAID arrays, that means that you’re going to have multiple drives that you could pull from. Some of these RAID arrays are very, very large. This is a RAID array of 12 different disks. Some are on one volume, some are on a different volume. You need to always check exactly what drive is in your system, and which one happens to be having a problem before you start pulling drives out of that RAID array.
The RAID Management software will usually tell you exactly which drive is having a problem and which one should be replaced.
Once you replace the drive, you may have to restore the data from a known-good backup or the RAID array may be able to restore itself. That will depend on which type of RAID is being used.
For example, for RAID 0, a single drive failure will break the entire array and you will have data loss. So if you replace a RAID 0 drive, you’ll need to restore from backup.
The other RAID levels (RAID 1, RAID 5, and RAID 10, which is sometimes called RAID 1 plus 0) will all remain operational if you lose a single drive. And if you replace that drive, the RAID array will recognize that it’s a new drive, and it will begin rebuilding the data on that drive without having to restore from backup.
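The difference between those RAID types can be summed up in a small lookup. This is only a sketch that covers the four levels mentioned here and a single failed drive; the `after_drive_replacement` function name is made up for the example.

```python
# Whether each RAID level mentioned above keeps running after ONE drive fails.
SURVIVES_SINGLE_FAILURE = {0: False, 1: True, 5: True, 10: True}


def after_drive_replacement(raid_level):
    """Return the recovery step after replacing one failed drive."""
    if raid_level not in SURVIVES_SINGLE_FAILURE:
        raise ValueError(f"RAID {raid_level} is not covered by this sketch")
    if SURVIVES_SINGLE_FAILURE[raid_level]:
        return "array rebuilds the new drive"
    return "restore from backup"


for level in (0, 1, 5, 10):
    print(f"RAID {level}: {after_drive_replacement(level)}")
```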
An error that can sometimes indicate a problem with a hard drive is a Windows Stop error, which is the blue screen of death, or the Apple spinning wait cursor. If these errors indicate that there is a storage device problem, then we want to be sure we have a good backup and perform some hard drive diagnostics on those storage devices.
Many drives will tell you when problems start to occur using a technology called SMART. SMART stands for Self-Monitoring, Analysis, and Reporting Technology. These SMART metrics are being calculated inside of every single drive that you have, and you can see there are a number of different attributes that are being gathered.
There are a number of different third-party utilities and utilities that are built into operating systems that allow you to get more information about these SMART errors. And you might be able to see spikes or changes in the operation of your hard drive, and be able to replace it before you lose any data.
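As a rough idea of what those utilities are doing, here’s a sketch that compares two snapshots of SMART data. The attribute readings are invented for the example, and both helper functions are hypothetical. It assumes the common convention that an attribute is considered failing once its normalized value drops to or below its threshold.

```python
def failing_attributes(attributes):
    """Names of attributes whose normalized value is at or below the threshold."""
    return [name for name, a in attributes.items()
            if a["value"] <= a["threshold"]]


def rising_raw_counts(previous, current):
    """Names of attributes whose raw count grew between two snapshots."""
    return [name for name, a in current.items()
            if name in previous and a["raw"] > previous[name]["raw"]]


# Invented readings for illustration, not taken from a real drive.
last_month = {
    "Reallocated_Sector_Ct": {"value": 100, "threshold": 36, "raw": 0},
    "Current_Pending_Sector": {"value": 100, "threshold": 0, "raw": 0},
}
today = {
    "Reallocated_Sector_Ct": {"value": 92, "threshold": 36, "raw": 48},
    "Current_Pending_Sector": {"value": 100, "threshold": 0, "raw": 0},
}

print("Failing now:", failing_attributes(today))
print("Getting worse:", rising_raw_counts(last_month, today))
```

In this made-up data nothing has crossed its threshold yet, but the reallocated sector count has jumped. That’s the kind of change that might prompt a backup and a drive replacement before any data is lost.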
If these drives are in a drive array, there are usually scheduled disk checks that occur every month, and there’s constant monitoring of these SMART statistics. This gives the system administrator advance notice when a drive may be going bad, so they can schedule a time to get the drive replaced and make sure that everything continues to run without any user downtime.