Stephen Harrison B.Sc. (Hons), M.Sc., MBCS, CITP
Are you looking for a hard disk drive failure fix? Then you've come to the right place. In this article, we look at the different tell-tale signs that your disk is going to fail, and the underlying causes of hard disk failure.
Next, we look at how to test your hard disks for any issues, then review the options you have for both fixing and protecting yourself from HDD failures.
Just like other mechanical devices and computer hardware components, hard drives are subject to some form of failure at some time in their lifecycle.
There are over 280 million HDDs sold each year, with an observed failure rate of around 2%. This means over 5.5 million hard disks fail each year!
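The arithmetic behind that figure can be checked in a couple of lines. This is only a sketch using the two numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the article; the variable names are illustrative only.
drives_sold_per_year = 280_000_000
observed_failure_rate = 0.02  # 2%

# Expected number of drive failures per year at that rate.
expected_failures = round(drives_sold_per_year * observed_failure_rate)

print(f"{expected_failures:,} drives expected to fail per year")
```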
Drives fail for many reasons, and we will look at these later in the article. Traditional HDDs are mechanical and have spinning magnetic disks (called platters) and something called an actuator arm that moves the read/write heads as you use your device. There are lots of moving parts, and each of these can fail.
Even solid-state drives, which store data on flash memory chips rather than spinning platters, are subject to failure. This is especially true when they are heavily used, because flash memory cells can only be written to a limited number of times.
All computing devices require non-volatile disk or memory space to host your operating system, applications and data files (assuming you are not using a cloud service).
Non-volatile simply means your information is retained when your device is powered down, and is made available again when powered back up. This is different to volatile Random Access Memory, which does not retain loaded information when power is lost.
In my experience, hard drives don't generally fail outright one morning. There are usually certain signs that a failure is coming, even when those signs appear to be random.
One sign is when your computer randomly freezes and you cannot even move your cursor. This is often accompanied by an HDD activity light that stays on permanently and an intermittent inability to open files or folders. Freezing can also be triggered by overheating, so it is worth performing the checks we look at later.
Another sign is the dreaded Blue Screen of Death (BSOD). I have seen this a few times with failing hard drives. The frequency steadily increases until your BIOS is unable to detect the drive's presence. Below is a typical BSOD event related to failing hard disk drives.
Another sign of a failing disk is when your device randomly shuts down for no apparent reason. Again, this could be down to overheating, but if your device is passing the Power On Self Test (POST) and the device fans are clear of dust and debris, a failing HDD is likely the reason.
In my experience, random shutdowns are also accompanied by prolonged boot times, pauses during the boot process, and shut down sequences that take so long you need to manually power down the machine.
In addition, there will come a time when the BIOS fails to detect the hard drive, even when you have not been inside your computer's chassis recently.
Different BIOSes display different messages, but they all refer to the HDD if that is the suspected cause.
Another 'soft' error that occurs as a result of a failing hard drive is random prompts to format the C:\ drive.
One of the most common signs of an impending failure is an audible clicking or clunking sound coming from the traditional HDD itself. This is known as the click of death, and is often caused by bad sectors, which we will look at in the next section.
The video below is an excellent example of the click of death.
Finally, it is worth mentioning that hard disks come with Self-Monitoring, Analysis and Reporting Technology (SMART). This is intelligence built into your disks to highlight issues and remedial action before your hard drive fails.
SMART can report many error codes, but one to take note of, for example, is Status BAD. This means the disk is at risk of impending failure, and you need to do something about it ASAP.
Below is a quote from an excellent academic paper researching how to improve the prediction of HDD failures using machine learning. I have included it in this article to highlight that, although S.M.A.R.T. is "smart", it is not perfect.
"Techniques [such] as the Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) support the monitoring of the hardware. However, those techniques often lack algorithms for intelligent data analytics. Especially, the integration of machine learning to identify potential failures in advance seems to be promising to reduce administration overhead."
Citation
Züfle M., Krupitzer C., Erhard F., Grohmann J., Kounev S. (2020) To Fail or Not to Fail: Predicting Hard Disk Drive Failure Time Windows. In: Hermanns H. (eds) Measurement, Modelling and Evaluation of Computing Systems. MMB 2020. Lecture Notes in Computer Science, vol 12040. Springer, Cham.
Overheating is a common cause of hard disk activity disruption, triggering system freezes, performance issues and unexpected shutdowns. Look at system ventilation and other underlying causes of heat increase, such as overclocked CPUs.
Bad clusters or bad sectors are a sign that there is physical damage to the platters inside the hard drive. Data can often be recovered and moved to different areas, or clusters, of the disk.
However, it would be sensible at this stage to back up all of your data and replace the hard drive.
In addition, logical corruptions can also trigger hard drive issues, including boot problems.
For example, the bootrec /fixmbr command in Windows 10 can help repair the Master Boot Record (MBR) if the issue is related to the logical management of disk sectors or partitions.
The click of death is a mechanical failure where the read/write head becomes faulty, or even stuck to the disk platters, which is known as "stiction".
The drive's firmware will make several attempts to resolve the 'ticking' noise by resetting the read/write head.
However, if it is unable to do so before the reset time is exceeded, your device will not boot up. The short video below is effectively a simulation of this process.
Another mechanical issue that triggers a buzzing noise is a seized motor or bearing that prevents the disk platters from spinning correctly, or at all. A beeping noise can also be an indication of a motor issue.
Electrical surges can, albeit rarely, trigger burn out in the disk's Printed Circuit Board (PCB). If you smell a burning odour or see any smoke, switch off immediately and remove the hard drive.
A phenomenon called "adaptive drift" is where the manufacturer's calibration of the hard drive becomes out of sync with the hard drive's baseline operating conditions.
Manufacturers include an "adaptive list" in the hard drive's firmware. This list accounts for changes in the disk's baseline conditions over time to ensure the disk continues to function.
These changes can include wear and tear on the head stack, or small magnetic flux changes to the drive's neodymium magnets (the magnets that drive the actuator arm and so position the read/write heads).
This issue can be difficult to spot because to us, the hard drive is absolutely fine with none of the issues identified above present, but is simply not working properly any more.
This can be put down to simple wear and tear as time passes and the HDD is used.
Occasionally, poor quality components or manufacturing errors can cause failures. I have seen brand new disks fail after 24 hours of use because of manufacturer or shipping issues, such as insufficient padding to survive the transit from the warehouse.
Human error is a frequent cause of disk failures. I have seen first hand how accidentally dropping a portable hard drive can damage it and trigger the click of death.
Another cause is formatting HDDs too often, especially when the entire hard drive capacity is being overwritten with 0s and 1s. This can increase wear and tear on the internal components.
Hard drives also do not like magnets. Keep them away!
Finally, it is worth mentioning malware as a source of HDD corruption. Viruses can affect the Master Boot Record and corrupt your operating system. Although this is not a physical failure, it can still prevent access to files and folders.
In addition to the inbuilt SMART technology, there are tools and commands available to help you diagnose and resolve issues yourself.
There are several commands you can run in Microsoft Windows. For example, the "wmic diskdrive get status" command gives you an instant view on whether there are any known issues with the disk or not.
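If you want to script this check, the output of the full command, "wmic diskdrive get status", is simple to parse. The Python sketch below is one way to do it, not an official method. The function names are my own, and it assumes the wmic utility is present, which may not be the case on newer Windows installations where it has been deprecated.

```python
import subprocess


def parse_wmic_status(output):
    """Return one status string per drive from `wmic diskdrive get status` output."""
    # wmic pads its columns and emits blank lines, so strip and drop empties.
    lines = [line.strip() for line in output.splitlines()]
    lines = [line for line in lines if line]
    # The first non-empty line is the column header ("Status").
    if lines and lines[0].lower() == "status":
        lines = lines[1:]
    return lines


def drives_needing_attention(statuses):
    """Return (index, status) pairs for every drive not reporting OK."""
    return [(i, s) for i, s in enumerate(statuses) if s.upper() != "OK"]


def query_drive_statuses():
    """Run the command on Windows (assumes wmic is installed and on the PATH)."""
    result = subprocess.run(
        ["wmic", "diskdrive", "get", "status"],
        capture_output=True, text=True, check=True,
    )
    return parse_wmic_status(result.stdout)


# Usage on a Windows machine:
#   for index, status in drives_needing_attention(query_drive_statuses()):
#       print(f"Drive {index} reports: {status}")
```

Any status other than OK, such as Pred Fail, is a prompt to back up your data and investigate further.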
Tools such as the popular HDD Scan are excellent for inspecting SMART reports instantly, performing read/write tests, and monitoring elements such as the hard drive's temperature.
HDD Scan can also perform various stress tests on your drive, such as read or read/write tests to identify block or sector issues.
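When reading a SMART report, a few attributes relate directly to the bad sector issues described earlier. The sketch below shows one way to flag them in Python. It assumes you have copied the raw values from a report by hand, the sample numbers are made up, and the "any non-zero value is a warning" rule is a simplification of mine rather than a manufacturer threshold. Attribute names can also vary slightly between vendors and tools.

```python
# SMART attribute IDs commonly associated with sector problems.
# This watch-list is an assumption for illustration, not a complete set.
WATCHED_ATTRIBUTES = {
    5: "Reallocated Sectors Count",
    197: "Current Pending Sector Count",
    198: "Offline Uncorrectable Sector Count",
}


def sector_warnings(raw_values):
    """Return a warning string for each watched attribute with a non-zero raw value.

    raw_values maps SMART attribute ID -> raw value taken from a SMART report.
    """
    warnings = []
    for attr_id, name in WATCHED_ATTRIBUTES.items():
        raw = raw_values.get(attr_id, 0)
        if raw > 0:
            warnings.append(f"{name} (ID {attr_id}) = {raw}")
    return warnings


# Made-up sample values; ID 194 (temperature) is ignored by this check.
report = {5: 8, 194: 37, 197: 0, 198: 2}
for warning in sector_warnings(report):
    print("Warning:", warning)
```

If any of these counts is above zero and rising between checks, treat it as a strong hint to back up and plan a replacement.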
Although it is reassuring when such tests return no issues, I would recommend you only perform such tests if you suspect an issue. Why put your hard drive through a series of tests unnecessarily, and exacerbate the wear and tear factor?
If you do suspect an issue, then HDD Scan is as good a tool as any to use to identify or confirm your suspicions.
The bad news here is that if your hard drive issues are due to a mechanical or other physical issue, it is likely you are at the stage where a new HDD is required.
Sure, you could send your drive off to a specialist company to fix it and/or retrieve your data. However, the costs of hard disks are falling, and if you adopt the good practice of regular backups and even cloud hosting for your personal information, I think you are better off simply replacing the drive with a new one.
If you have a traditional desktop computer, you can check to ensure the SATA cabling or equivalent has not worked itself loose or failed. Don't forget to reseat both ends.
What you can do, however, is take preventative measures to prolong the life of your hard drive. Ensure your machine's ventilation is not reduced by dust or other blockages, as this will help prevent overheating and the issues that ensue.
If you experience overheating, try shutting down your computer, removing any blockages, then switching back on. Perform a temperature check to see if your efforts have made any difference.
Also, ensure your devices are protected against electrical surges and static electricity.
Microsoft Windows utilities such as Scan Disk, or commands such as CHKDSK, can identify bad sectors and move data away from them. If these tools report bad sectors, replace your disk promptly rather than reformatting it. Reformatting can appear to resolve bad sectors but, in my experience, they tend to return.
For other logical issues such as corrupted MBRs, try running repair commands such as bootrec /fixmbr in Windows Repair Mode.
Most hard drives fail within 10 years under normal use. How old are yours and how frequently have they been used?
I can't emphasize enough how important it is to back up your data and to consider using an offsite storage solution such as cloud. The last thing you need is to lose your data when you encounter a hard drive fault.
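If you prefer to script your own backups to a second drive, the Python sketch below shows the general idea: copy a folder, then compare checksums to confirm the copy is readable and identical. It is a minimal illustration rather than a replacement for proper backup software, and the function names and example paths are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_folder(source, destination):
    """Copy source into destination, then verify every file.

    Returns the relative paths of files whose copy is missing or differs.
    An empty list means the backup verified cleanly.
    """
    source, destination = Path(source), Path(destination)
    # dirs_exist_ok lets you refresh an existing backup folder (Python 3.8+).
    shutil.copytree(source, destination, dirs_exist_ok=True)

    mismatched = []
    for src_file in source.rglob("*"):
        if src_file.is_file():
            relative = src_file.relative_to(source)
            copy = destination / relative
            if not copy.is_file() or file_digest(src_file) != file_digest(copy):
                mismatched.append(relative)
    return mismatched


# Usage with hypothetical paths:
#   problems = backup_folder(r"C:\Users\me\Documents", r"E:\Backups\Documents")
#   print("Backup verified" if not problems else f"Check these files: {problems}")
```

Keep the destination on a different physical drive to the source, otherwise a single drive failure takes both copies with it.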
If you have a traditional hard drive that is in need of replacement, consider a solid-state drive instead. They can speed up your device as they are faster than mechanical drives. See the SSD article for more details.
Finally, take a look at the Paessler site, which looks at different HDD error messages and potential fixes.