Two years ago I wrote a post on S.M.A.R.T., an acronym for “Self-Monitoring, Analysis, and Reporting Technology”. In plain English, I’m talking about technology inside a hard drive or SSD that monitors and reports on the health of the drive. While S.M.A.R.T. can give us clues as to the state of the drive, it’s not an exact science. Some indicators warrant an immediate drive replacement; others only point to a higher probability of imminent failure.
SSD or Solid State Drives
My previous post on Monitoring Hard Drives Using Smartmontools focused on monitoring mechanical hard drives and showed some examples of how to identify drives that are likely to fail soon. But S.M.A.R.T. isn’t limited to mechanical hard drives; it works with SSDs too.
Sylvain Leroux over at linuxhandbook.com has written an article on Monitoring and Testing the Health of SSD in Linux that explains how to read SSD drive statistics and evaluate the drive attributes. Not related to Linux but still worth looking at is Predicting SSD Failures: Specific S.M.A.R.T. Values.
While Solid State Drives are much less prone to mechanical failure (I can only think of excessive heat), they aren’t invincible. Some procedures used on mechanical drives, like disk defragmentation under Microsoft Windows, will actually reduce the life span of an SSD without improving performance. Windows 10 will at least recognize an SSD and prevent you from defragmenting it. Thankfully on Linux it’s a non-issue, as defragmentation generally isn’t needed even on mechanical drives.
But there are other factors that reduce the life of an SSD, aside from the underlying technology. For example, frequently writing large amounts of data brings the SSD to its rated maximum number of writes sooner.
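To get a feel for how write volume translates into life span, here is a rough back-of-the-envelope sketch. It assumes the manufacturer states endurance as a "terabytes written" (TBW) rating; the function name and all figures below are hypothetical and only illustrate the arithmetic.

```python
def years_until_rated_writes(tbw_rating_tb: float,
                             written_so_far_tb: float,
                             daily_writes_gb: float) -> float:
    """Estimate the years left before a drive reaches its rated TBW."""
    if daily_writes_gb <= 0:
        raise ValueError("daily_writes_gb must be positive")
    # Remaining write budget, converted from TB to GB (1 TB = 1000 GB).
    remaining_gb = max(tbw_rating_tb - written_so_far_tb, 0.0) * 1000
    return remaining_gb / daily_writes_gb / 365


# Hypothetical drive: rated for 150 TBW, 20 TB already written,
# and about 30 GB written per day.
print(round(years_until_rated_writes(150, 20, 30), 1))  # prints 11.9
```

Reaching the rated figure does not mean the drive dies on that day; treat the result as a planning estimate only.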
In Linux, you can reduce the writes to disk by adding the noatime option to the entry in the /etc/fstab file:
UUID=whatever... / ext4 noatime,errors=remount-ro 0 1
The noatime option prevents Linux from updating the access time stamp (atime) of a file every time the file is read, even though it has not been modified.
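If you have several partitions, a small script can list the fstab entries that don't yet use noatime. This is only a sketch: the function name is made up, the sample text and its UUIDs are placeholders, and it only looks at ext4 entries by default.

```python
def entries_missing_noatime(fstab_text: str, fs_types=("ext4",)) -> list[str]:
    """Return the mount points of matching entries that lack 'noatime'."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in fs_types and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


# Placeholder sample; on a real system you would read /etc/fstab instead.
sample = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=aaaa / ext4 noatime,errors=remount-ro 0 1
UUID=bbbb /home ext4 defaults 0 2
UUID=cccc none swap sw 0 0
"""
print(entries_missing_noatime(sample))  # prints ['/home']
```

To check your own machine, pass the contents of /etc/fstab, for example `entries_missing_noatime(open("/etc/fstab").read())`.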
When you modify and save a file back to SSD, the data is written to a new storage area and the old copy of the file is marked for deletion. This and other “wear leveling” and “garbage collection” processes on an SSD require extra disk space to work (efficiently). So if you discover that your SSD is performing much slower than it used to, it may be close to full. Try to leave some unused (non-partitioned) storage space on your SSD (I use my SSD up to a capacity of 80-85%). For more details on how SSDs work, see How Do SSDs Work?.
No matter what you do, an SSD may eventually fail. And when an SSD fails, it often happens suddenly without any warning. As reported in the article by Sylvain, 38% of the drives failed without showing the specified symptoms (S.M.A.R.T. values).
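A quick way to keep an eye on that 80-85% mark is a small check like the sketch below. Note the assumptions: it measures how full a mounted file system is (via Python's `shutil.disk_usage`), not how much space is left unpartitioned, and the helper names and the 85% default are my own choices.

```python
import shutil


def usage_percent(total_bytes: int, used_bytes: int) -> float:
    """Return the used share of a file system in percent."""
    return 100.0 * used_bytes / total_bytes


def is_too_full(total_bytes: int, used_bytes: int,
                limit_percent: float = 85.0) -> bool:
    """True if usage exceeds the chosen limit (85% by default)."""
    return usage_percent(total_bytes, used_bytes) > limit_percent


if __name__ == "__main__":
    # Check the root file system; change the path for other mount points.
    usage = shutil.disk_usage("/")
    percent = usage_percent(usage.total, usage.used)
    status = "too full" if is_too_full(usage.total, usage.used) else "ok"
    print(f"/ is {percent:.1f}% used ({status})")
```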
It gets worse: recovering data from a failing SSD is more challenging than recovering data from a hard drive. In some cases it’s just not possible.
The assumption that SSDs don’t fail because they have no moving parts is not exactly correct. It’s true they don’t have moving parts, but they do contain sensitive electronics which can fail at any time. For example, one potential source of trouble is the power supply of your computer (because of incorrect or unstable voltages, ripples, etc.). A power failure at the wrong moment might also render an SSD unusable.
Here are some suggestions on how to prevent or minimize data loss on SSDs:
- Make sure you back up all your valuable data regularly. Never rely on an SSD alone to keep your files safe.
- If uptime is important to you, create an image file of your boot disk, or better, have a copy of the system disk on another SSD.
- Be SMART and regularly check on the drive status. It might help you spot a failing SSD (or HDD). See Monitoring Hard Drives Using Smartmontools for how to install smartmontools on Linux (Ubuntu and derivatives).
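To make the regular check less of a chore, the attribute table printed by `smartctl -A` can be parsed with a few lines of Python. The sketch below assumes the usual ATA attribute table layout (ID, name, flag, value, worst, threshold, ...); the sample text is made up for illustration and is not from a real drive, and the function names are my own. An attribute counts as failed when its normalized value has dropped to or below a non-zero threshold.

```python
def parse_smart_attributes(smartctl_output: str) -> list[dict]:
    """Parse the attribute table of 'smartctl -A' into a list of dicts."""
    attributes = []
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attributes.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": " ".join(fields[9:]),
        })
    return attributes


def failing_attributes(attributes: list[dict]) -> list[str]:
    """Names of attributes whose value is at or below a non-zero threshold."""
    return [a["name"] for a in attributes
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


# Made-up sample output, for illustration only.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   005   005   010    Pre-fail  Always   FAILING_NOW 1920
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4321
177 Wear_Leveling_Count     0x0013   095   095   000    Pre-fail  Always       -       52
"""
print(failing_attributes(parse_smart_attributes(sample)))
# prints ['Reallocated_Sector_Ct']
```

On a real system you would feed it the output of something like `sudo smartctl -A /dev/sda` (the device name is an assumption; yours may differ). Keep in mind that a clean result is no guarantee, since drives can fail without any S.M.A.R.T. warning.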