Sudden Death on SSD Drives

In the section covering technical information about SSDs and SSD recovery, we try to answer the question of how long an SSD lasts, and we discuss the two main drawbacks of SSDs: cell degradation and limited data retention. Those problems usually appear in the long term, after the SSD has been used for a long time or very intensively. But what happens when an SSD dies? How does an SSD die, and why do people talk about sudden death in SSDs? Here we explain how to tell whether your SSD is dead and what the options are for SSD recovery.

SSD Problems

Because of the way data is written to SSD memory chips, each cell wears a little with every write. After a certain number of writes, the cell can no longer be written (that is, have its state changed, or be “programmed”), which makes it unusable.

SSD endurance is rated in terabytes written (TBW): the total amount of data that can be written to the drive over its life. This figure is provided by the manufacturer and should never be taken as a guarantee that the drive will not fail before that amount of data has been written to it.
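To put a TBW rating in perspective, it can be converted into an approximate number of years at a given daily write volume. The sketch below is only arithmetic on the manufacturer's figure; the function names and the example drive (1 TB, 600 TBW, 5-year warranty) are hypothetical, and the result is an estimate, not a promise that the drive will last that long.

```python
def years_until_tbw(tbw_rating_tb, host_writes_gb_per_day):
    """Years of use before host writes reach the rated TBW (1 TB = 1000 GB)."""
    if host_writes_gb_per_day <= 0:
        raise ValueError("daily writes must be positive")
    days = (tbw_rating_tb * 1000) / host_writes_gb_per_day
    return days / 365


def drive_writes_per_day(tbw_rating_tb, capacity_gb, warranty_years):
    """Full-drive writes per day that would use up the TBW within the warranty."""
    total_days = warranty_years * 365
    return (tbw_rating_tb * 1000) / (capacity_gb * total_days)


# Hypothetical drive: 1 TB capacity, 600 TBW, 5-year warranty, 50 GB written per day.
print(round(years_until_tbw(600, 50), 1))             # 32.9
print(round(drive_writes_per_day(600, 1000, 5), 2))   # 0.33
```

The numbers show why most home users never reach the rated TBW, and why a drive that dies early has usually failed for another reason.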

On the other hand, NAND memory cannot hold data indefinitely if the drive is not powered from time to time, since the small electrical charges that represent our data leak away over time. Very slowly, yes, but the cells are discharging. The more bits we store in the same cell (MLC, TLC, QLC …), the smaller the difference between the charge levels that encode them, so the data is lost even more quickly.
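The effect of packing more bits into a cell can be illustrated with a little arithmetic: n bits per cell need 2^n distinguishable charge levels. The sketch below assumes an idealized cell whose levels are evenly spaced across one fixed voltage window, which real NAND does not strictly follow; the function names are hypothetical.

```python
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}  # bits stored per cell


def charge_levels(bits_per_cell):
    """Number of distinct charge levels needed to encode the given bits."""
    return 2 ** bits_per_cell


def relative_margin(bits_per_cell):
    """Gap between adjacent levels as a fraction of the full voltage window.

    Idealized assumption: levels are evenly spaced across the window.
    """
    return 1 / (charge_levels(bits_per_cell) - 1)


for name, bits in CELL_TYPES.items():
    print(name, charge_levels(bits), round(relative_margin(bits), 3))
```

Under this simplification a QLC cell has 16 levels separated by about 1/15 of the window, so a much smaller charge leak is enough to turn one value into another.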

Why does an SSD Die?

To understand why an SSD dies, we first have to explain how the drive works. The SSD controller uses a portion of the SSD memory to store data that it needs in order to function. This includes, for example, the firmware (effectively the drive’s operating system, which controls everything that happens on it), SMART data, the address translation tables (FTL), disk encryption keys… This system data occupies between 4 and 12 GB.

We already explained that the controller moves data from one cell to another on the SSD so that the same cells do not always take the wear. This is called wear leveling. The problem is that the system area the controller needs in order to function cannot be moved to other cells; it always lives in the same ones. The cells that hold the system data therefore wear out faster than the rest of the cells on the drive. This is especially important in DRAM-less SSDs, which have no DRAM in which to cache the translation tables.
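The difference between leveled and pinned cells can be seen in a toy simulation. This is a minimal sketch, not how any real controller is implemented: it assumes a tiny drive of a few blocks, one erase per write, and a policy of always writing to the least-worn block. The function name `simulate` is hypothetical.

```python
from typing import List


def simulate(writes: int, num_blocks: int, wear_leveling: bool) -> List[int]:
    """Return per-block erase counts after rewriting the same data `writes` times."""
    erase_counts = [0] * num_blocks
    for _ in range(writes):
        if wear_leveling:
            # Pick the least-worn block, spreading the wear across all blocks.
            target = erase_counts.index(min(erase_counts))
        else:
            # Data pinned to one location: the same block takes every write.
            target = 0
        erase_counts[target] += 1
    return erase_counts


print(simulate(1000, 8, wear_leveling=True))   # wear spread evenly
print(simulate(1000, 8, wear_leveling=False))  # one block takes all the wear
```

With leveling, 1000 writes over 8 blocks cost each block 125 erases; without it, a single block absorbs all 1000, which is the situation described for the system area.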

So when any of the cells holding the data the controller needs to operate the drive goes bad after too many write or erase operations, that data becomes corrupted. The controller can then no longer make the drive work, and the drive becomes unusable, taking all the data stored in its memory chips with it.

Excess Writes to an SSD Memory Cell

From the previous paragraphs we can deduce that when a memory cell fails after too many write or erase operations, the entire drive can become unusable. This is the case if the cell holds the system data the controller needs to function since, as we have said, those blocks cannot be moved elsewhere. The blocks that hold user data, however, can be moved to other cells (and in fact they are, constantly, because of wear leveling).

SSDs include extra memory space, made up of several million cells, that is not visible to the user. This extra space is called over-provisioning, and it serves as a reserve to replace user-space memory cells that go bad. When the controller detects that a cell is close to failing, or has already failed, it moves the data from that cell to a new cell in the extra space. It then updates the FTL table so that the data can be found at its new location in the future, and marks the failed cell as bad so that it is never written to again.

This extends the life of the SSD, since several million cells can fail and the drive will keep working. The problem is that, however many there are, the extra cells are finite: if enough normal cells fail and the extra cells run out, the drive dies. Although it is not common, some drives switch to read-only mode when this happens, so that you can still read the data and copy it elsewhere but cannot write anything new to the drive.
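The remapping process and what happens when the spare area runs out can be sketched with a toy translation table. Everything here is an illustrative assumption: the class `TinyFTL`, its method `retire`, and the block counts are hypothetical, and a real controller works on pages and blocks with far more bookkeeping.

```python
class ReadOnlyError(Exception):
    """Raised when no spare blocks are left to replace a failed one."""


class TinyFTL:
    def __init__(self, user_blocks, spare_blocks):
        # Logical block -> physical block; initially a one-to-one mapping.
        self.ftl = {lba: lba for lba in range(user_blocks)}
        # Over-provisioned blocks, hidden from the user.
        self.spares = list(range(user_blocks, user_blocks + spare_blocks))
        self.bad = set()
        self.read_only = False

    def retire(self, lba):
        """Mark the physical block behind `lba` as bad and remap to a spare."""
        self.bad.add(self.ftl[lba])
        if not self.spares:
            # Spare area exhausted: this sketch falls back to read-only mode.
            self.read_only = True
            raise ReadOnlyError("no spare blocks left")
        self.ftl[lba] = self.spares.pop(0)
        return self.ftl[lba]


drive = TinyFTL(user_blocks=4, spare_blocks=2)
print(drive.retire(1))  # remapped to spare block 4
print(drive.retire(3))  # remapped to spare block 5
try:
    drive.retire(0)     # third failure: no spares remain
except ReadOnlyError:
    print("read-only:", drive.read_only)
```

The logical address the user sees never changes; only the table entry does, which is why losing the table itself is so damaging.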

The Computer does not Detect the SSD, is it Dead?

One day we turn on the computer and the operating system does not load. Windows crashes during startup and we cannot use the computer. Has the SSD failed? How can I tell if my SSD is dead?

If the SSD fails (dies), our data will be lost and the PC will not boot. If this happens, there are several things we can do to confirm that it really is the SSD that has failed:

  • Access the BIOS of the computer and see if it detects our drive
  • Check the data and power cables of the disk and verify that they are all well-connected
  • Replace the disk’s SATA data cable with another
  • Try a different power cable for the drive
  • Connect the SATA data cable from the disk to another SATA port on the board
  • Remove the disk from the computer and connect it to another PC
  • Connect the disk using an external USB enclosure to another computer
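When the drive is connected to a second computer running Linux, a quick way to see whether the operating system detects it is to list the block devices the kernel has registered. The sketch below reads the sysfs attribute `/sys/block/<device>/queue/rotational`, where `0` indicates a non-rotational device such as an SSD. The function name `list_block_devices` is hypothetical, the script is Linux-only, and some USB enclosures report the flag incorrectly, so treat the output as a hint rather than proof.

```python
import os


def list_block_devices(sys_block="/sys/block"):
    """Return {device_name: is_non_rotational} for devices the kernel sees."""
    devices = {}
    if not os.path.isdir(sys_block):
        return devices  # not Linux, or sysfs is not mounted
    for name in sorted(os.listdir(sys_block)):
        flag_path = os.path.join(sys_block, name, "queue", "rotational")
        try:
            with open(flag_path) as f:
                devices[name] = f.read().strip() == "0"
        except OSError:
            continue  # entry without a readable rotational flag
    return devices


if __name__ == "__main__":
    for name, non_rotational in list_block_devices().items():
        kind = "non-rotational (likely SSD)" if non_rotational else "rotational"
        print(name, kind)
```

If the SSD does not appear in this list at all, the kernel has not detected it, which matches the case where the BIOS does not see the drive either.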

If none of these things work, your SSD is more than likely dead and you need to buy a new one. You can visit the recommended SSD drives section to see a comparison table of the best drives at the moment.

To prevent sudden SSD death from causing problems, the only solution is to always have automated backups of all your important data.