Rescuing data from a dying SSD

A while ago I replaced my unsafe RAID 0 setup with a (faster) 1TB SSD. Because the RAID 0 setup was unsafe, I made backups every now and then, and I kept making them after switching to the safer SSD. I try to adhere to the 3-2-1 backup strategy: 3 (actually more than 3) copies of my data on 2 different media, with 1 of them kept off-site. For this I still use my old script, which relies on rsync to copy all the data to whatever destination (external drive, NFS share, ...) I can mount.

During the last run of the script I started to notice failures during the rsync copy. After some investigation I booted the PC with a rescue Linux, and errors started popping up while running fsck.ext4 -c. This option causes e2fsck to use the badblocks program to do a read-only scan of the device in order to find any bad blocks.
The results of smartctl did not look good either :(

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-5.10.197-1-MANJARO] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   064   064   010    Pre-fail  Always       -       407
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       9464
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       47
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       7
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   064   064   010    Pre-fail  Always       -       407
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   064   064   010    Pre-fail  Always       -       407
187 Uncorrectable_Error_Cnt 0x0032   094   094   000    Old_age   Always       -       51708
190 Airflow_Temperature_Cel 0x0032   076   054   000    Old_age   Always       -       24
195 ECC_Error_Rate          0x001a   199   199   000    Old_age   Always       -       51708
199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       1
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       3
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       4777709067
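
To make the worrying rows explicit, here is a minimal Python sketch that parses a few rows copied from the table above and flags error-related attributes with a non-zero raw value. The WATCHED_IDS set is my own assumption about which attributes matter, not something smartctl defines, and raw values are vendor-specific, so treat the result as a hint only.

```python
# Rows copied verbatim from the smartctl attribute table above.
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   064   064   010    Pre-fail  Always       -       407
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   064   064   010    Pre-fail  Always       -       407
187 Uncorrectable_Error_Cnt 0x0032   094   094   000    Old_age   Always       -       51708
190 Airflow_Temperature_Cel 0x0032   076   054   000    Old_age   Always       -       24
"""

# Assumption: attribute IDs whose raw value should ideally stay at zero.
WATCHED_IDS = {5, 181, 182, 183, 187}


def parse_rows(text):
    """Turn attribute rows into dicts; skip anything that is not a data row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "thresh": int(fields[5]),
            "raw": fields[9],  # first token of RAW_VALUE
        })
    return rows


def suspicious(rows):
    """Return watched attributes whose raw value is a number above zero."""
    return [r for r in rows
            if r["id"] in WATCHED_IDS and r["raw"].isdigit() and int(r["raw"]) > 0]


if __name__ == "__main__":
    for row in suspicious(parse_rows(SMART_ROWS)):
        print(row["id"], row["name"], row["raw"])
```

Note that the overall self-assessment still said PASSED, since none of the normalized values had dropped below its threshold yet.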

So the disk was/is dying.

Luckily, this month there was a very extensive article in my favourite computer magazine describing all the necessary steps to recover from this.

Step 1: copy all the data to a new disk

After booting the system with the aforementioned rescue Linux, issue the following command: ddrescue -Of /dev/sda /dev/sdi /mnt/disk.log (replace /dev/sda with the name of the disk that is failing and /dev/sdi with the new disk). The -f option allows ddrescue to overwrite the output device and -O reopens the input after every read error, so double-check that source and destination are not swapped. disk.log will contain some cryptic data that describes, for example, where read errors occurred. We will need this file in the next step, so you'll want to save it somewhere. In my case: yet another USB stick.

This took the larger part of a day, but of course your mileage may vary. In fact, I read posts on the Internet describing ddrescue running for days, even a week. The good news was I had managed to rescue 99.99% of the data, which is not so bad, I think.
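
The log file is less cryptic than it looks. As a rough sketch, assuming the usual GNU ddrescue mapfile layout (comment lines starting with #, one status line, then lines of position, size and status, where + means rescued and - means bad sector), the rescued percentage can be computed in Python. The example content below is made up for illustration and is not my actual disk.log.

```python
# Made-up mapfile content in GNU ddrescue's layout; NOT the real disk.log.
EXAMPLE_MAPFILE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00100000     +               1
#      pos        size  status
0x00000000  0x000FF000  +
0x000FF000  0x00000200  -
0x000FF200  0x00000E00  +
"""


def summarize_mapfile(text):
    """Return the total number of bytes per status character."""
    totals = {}
    seen_status_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not seen_status_line:
            # The first non-comment line is the status line, not a data block.
            seen_status_line = True
            continue
        _pos, size, status = line.split()[:3]
        # Assumes ddrescue wrote the numbers in hex with a 0x prefix.
        totals[status] = totals.get(status, 0) + int(size, 0)
    return totals


def rescued_percent(totals):
    """Share of all bytes that carry the '+' (rescued) status."""
    total = sum(totals.values())
    return 100.0 * totals.get("+", 0) / total if total else 0.0


if __name__ == "__main__":
    totals = summarize_mapfile(EXAMPLE_MAPFILE)
    print(totals, round(rescued_percent(totals), 2))
```
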
So how do you find the data, or at least the names of the files, damaged beyond repair?

Step 2: inspect the log file created by ddrescue with ddru_findbad

Now that everything was safely copied (after a few tries and passes) to the new disk, it was time to reboot the system with the failing disk and inspect the log created by ddrescue. You'll need ddrutility for this. Once it is installed, execute the following command: ddru_findbad /dev/sdi /mnt/disk.log. Again, replace /dev/sdi with the new, good disk.

This will also take some time to run and will generate a results_list.txt file containing entries like the ones below:

Partition /dev/sdi3 Type Linux DeviceSector 682041040 PartitionSector 476190416 Block 59523802 Allocated yes Inode 14680670 File /dude/Photos/2011/09/24/IMG_4621.JPG
Partition /dev/sdi3 Type Linux DeviceSector 682047032 PartitionSector 476196408 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682047033 PartitionSector 476196409 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682047034 PartitionSector 476196410 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682047035 PartitionSector 476196411 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682047036 PartitionSector 476196412 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682047039 PartitionSector 476196415 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682053496 PartitionSector 476202872 Block 59525359 Allocated yes Inode 14680672 File /dude/Photos/2011/09/24/IMG_4559.JPG
Partition /dev/sdi3 Type Linux DeviceSector 682053503 PartitionSector 476202879 Block 59525359 Allocated yes Inode 14680672 File /dude/Photos/2011/09/24/IMG_4559.JPG
Partition /dev/sdi3 Type Linux DeviceSector 682055432 PartitionSector 476204808 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055433 PartitionSector 476204809 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055434 PartitionSector 476204810 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055435 PartitionSector 476204811 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055436 PartitionSector 476204812 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055437 PartitionSector 476204813 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055438 PartitionSector 476204814 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055439 PartitionSector 476204815 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 736174952 PartitionSector 530324328 Block 66290541 Allocated yes Inode 16384451 File /dude/Photos/2014/10/07/IMG_2780.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736174953 PartitionSector 530324329 Block 66290541 Allocated yes Inode 16384451 File /dude/Photos/2014/10/07/IMG_2780.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736174954 PartitionSector 530324330 Block 66290541 Allocated yes Inode 16384451 File /dude/Photos/2014/10/07/IMG_2780.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736174955 PartitionSector 530324331 Block 66290541 Allocated yes Inode 16384451 File /dude/Photos/2014/10/07/IMG_2780.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736174959 PartitionSector 530324335 Block 66290541 Allocated yes Inode 16384451 File /dude/Photos/2014/10/07/IMG_2780.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180968 PartitionSector 530330344 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180969 PartitionSector 530330345 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180970 PartitionSector 530330346 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180971 PartitionSector 530330347 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180972 PartitionSector 530330348 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180973 PartitionSector 530330349 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180974 PartitionSector 530330350 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180975 PartitionSector 530330351 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736181224 PartitionSector 530330600 Block 66291325 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG

I then asked ChatGPT to generate a nicer list from this file and used the following Python script to parse the results:

results_path = 'results_list.txt'  # Replace this with the actual path to your file

with open(results_path, 'r') as file:
    lines = file.read().splitlines()

# Keep only entries for allocated blocks that belong to a named file;
# metadata entries end in 'File none' and have no path to restore
file_lines = [line for line in lines
              if 'Allocated yes' in line
              and ' File ' in line
              and not line.endswith('File none')]

# Extract unique file paths (split only once, in case a path contains ' File ')
unique_file_paths = {line.split(' File ', 1)[1] for line in file_lines}

# Print the unique file paths in sorted order
for path in sorted(unique_file_paths):
    print(path)

The result is a nice list of files we will need to restore from backup:

/dude/Music/relaxmuziek/cd2.16.mp3
/dude/Photos/2011/09/24/IMG_4559.JPG
/dude/Photos/2011/09/24/IMG_4621.JPG
/dude/Photos/2012/11/03/IMG_6449.JPG
/dude/Photos/2013/09/29/IMG_7232.JPG
/dude/Photos/2013/09/29/IMG_7235.JPG
/dude/Photos/2013/09/29/IMG_7258.JPG
/dude/Photos/2013/09/29/IMG_7279.JPG
/dude/Photos/2013/09/29/IMG_7280.JPG
/dude/Photos/2013/09/29/IMG_7300.JPG
/dude/Photos/2013/10/10/IMG_8583.JPG
/dude/Photos/2013/10/10/IMG_8591.JPG
/dude/Photos/2013/10/10/IMG_8611.JPG
/dude/Photos/2013/10/10/IMG_8621.JPG
/dude/Photos/2014/10/07/IMG_2780.JPG
/dude/Photos/2014/10/07/IMG_2781.JPG
/dude/Photos/2014/10/07/IMG_2783.JPG
/dude/Photos/2014/10/07/IMG_2787.JPG
/dude/Photos/2014/10/07/IMG_2788.JPG
/dude/Videos/KERSTMIS 2012 019.MOV
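
The list of names hides two things: how badly each file was hit, and the entries marked Inode metadata, which belong to the filesystem itself rather than to a file. A small sketch that tallies both, assuming each line of results_list.txt stands for one bad sector (the consecutive DeviceSector numbers suggest so, but that is my reading of the output). The sample lines are copied from the listing above.

```python
import re
from collections import Counter

# Four entries copied from the results_list.txt excerpt above.
SAMPLE = """\
Partition /dev/sdi3 Type Linux DeviceSector 682041040 PartitionSector 476190416 Block 59523802 Allocated yes Inode 14680670 File /dude/Photos/2011/09/24/IMG_4621.JPG
Partition /dev/sdi3 Type Linux DeviceSector 682047032 PartitionSector 476196408 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682053496 PartitionSector 476202872 Block 59525359 Allocated yes Inode 14680672 File /dude/Photos/2011/09/24/IMG_4559.JPG
Partition /dev/sdi3 Type Linux DeviceSector 682053503 PartitionSector 476202879 Block 59525359 Allocated yes Inode 14680672 File /dude/Photos/2011/09/24/IMG_4559.JPG
"""

# Capture the device sector, the inode field and everything after 'File '.
ENTRY = re.compile(r"DeviceSector (\d+) .*? Inode (\S+) File (.+)$")


def tally(text):
    """Count bad sectors per file and, separately, metadata sectors."""
    per_file = Counter()
    metadata_sectors = 0
    for line in text.splitlines():
        match = ENTRY.search(line)
        if not match:
            continue
        _sector, inode, path = match.groups()
        if inode == "metadata" or path == "none":
            metadata_sectors += 1
        else:
            per_file[path] += 1
    return per_file, metadata_sectors


if __name__ == "__main__":
    per_file, metadata_sectors = tally(SAMPLE)
    for path, count in sorted(per_file.items()):
        print(count, path)
    print("metadata sectors:", metadata_sectors)
```

A non-zero metadata count is a reason to run a filesystem check on the new disk before trusting it.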

Step 3: restore corrupt data from backup

So now that we know which files got corrupted beyond repair, we can restore them from backup (you do have a backup, don't you?).
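
With only twenty files, copying by hand works fine, but the step can be sketched in Python as well. The function below is an illustration under assumptions: the backup mirrors the original directory layout, and the mount points in the usage comment (/mnt/backup and /mnt/newdisk) are hypothetical names, not paths from my setup.

```python
import shutil
from pathlib import Path


def restore_files(paths, backup_root, target_root):
    """Copy each listed file from the backup tree into the target tree.

    Returns two lists: the paths that were restored and the paths that
    could not be found in the backup.
    """
    restored, missing = [], []
    for original in paths:
        relative = original.lstrip("/")  # '/dude/x.jpg' -> 'dude/x.jpg'
        source = Path(backup_root) / relative
        destination = Path(target_root) / relative
        if source.is_file():
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, destination)  # copy2 also keeps timestamps
            restored.append(original)
        else:
            missing.append(original)
    return restored, missing


# Hypothetical usage; adjust both mount points to your own system:
# damaged = ["/dude/Photos/2011/09/24/IMG_4559.JPG"]
# restored, missing = restore_files(damaged, "/mnt/backup", "/mnt/newdisk")
```

Anything that ends up in the missing list was not in that backup and has to come from another copy.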

Step 4: replace the dying disk with the new disk

After all corrupt data was restored, I replaced the dying disk with the new disk in the desktop. I then used ShredOS to completely wipe the dying disk. Since it was still under warranty, I was hoping to return it for repair ... or get a replacement disk. The strange thing is that, after this operation, the disk showed up as healthy in GSmartControl (go figure). So I'm not sure what to do with it now. Anyway, the important thing is I didn't lose any data and my desktop is happy again.
