Rescuing data from a dying SSD
A while ago I replaced my unsafe RAID 0 setup with a (faster) 1 TB SSD. Since the RAID 0 setup was unsafe, I created backups every now and then. Even after I replaced it with the safer SSD, I kept creating backups. I try to adhere to the 3-2-1 backup strategy: I have 3 (actually more than 3) copies of my data on 2 different media, and 1 of them is kept off-site. I still use my old script for this, which relies on rsync to copy all the data to whatever destination (external drive, NFS share, ...) I can mount.
During the last run of the script I started to notice failures during the rsync copy. After some investigation and booting the PC with a rescue Linux, errors started popping up while running fsck.ext4 -c. This option causes e2fsck to use the badblocks program to do a read-only scan of the device in order to find any bad blocks.
The results of smartctl did not look good either :(
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-5.10.197-1-MANJARO] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   064   064   010    Pre-fail  Always       -       407
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       9464
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       47
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       7
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   064   064   010    Pre-fail  Always       -       407
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   064   064   010    Pre-fail  Always       -       407
187 Uncorrectable_Error_Cnt 0x0032   094   094   000    Old_age   Always       -       51708
190 Airflow_Temperature_Cel 0x0032   076   054   000    Old_age   Always       -       24
195 ECC_Error_Rate          0x001a   199   199   000    Old_age   Always       -       51708
199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       1
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       3
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       4777709067
So the disk was/is dying.
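Note that the overall self-assessment still says PASSED, so it pays to look at the individual attributes. As a rough sketch (not something from the magazine article), the attribute table above could be scanned with a few lines of Python. The set of "watched" attributes is my own assumption for illustration, not a vendor rule, and the function name is made up:

```python
import re

# Attributes whose non-zero raw value is treated as a warning sign here.
# This selection is an assumption for illustration, not a vendor specification.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Used_Rsvd_Blk_Cnt_Tot",
    "Runtime_Bad_Block",
    "Uncorrectable_Error_Cnt",
}

# One attribute row: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def suspicious_attributes(smartctl_text):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    found = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines, blank lines, banner text
        name, raw = match.group(2), int(match.group(6))
        if name in WATCHED and raw > 0:
            found[name] = raw
    return found
```

Fed with the text printed by smartctl -A for the failing disk, this would report the reallocated sectors, used reserve blocks, runtime bad blocks and uncorrectable errors shown above.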
Luckily, this month there was a very extensive article in my favourite computer magazine describing all the necessary steps to recover from this.
Step 1: copy all the data to a new disk
After booting the system with the aforementioned rescue Linux, issue the following command: ddrescue -Of /dev/sda /dev/sdi /mnt/disk.log (replace /dev/sda with the name of the disk that is failing and /dev/sdi with the new disk). disk.log will contain some cryptic data that describes where, for example, read errors occurred. We will need this file in the next step, so you'll want to save it somewhere; in my case, on yet another USB stick.
This took the larger part of a day, but of course your mileage may vary. In fact, I read posts on the Internet describing ddrescue running for days, even a week. The good news was I had managed to rescue 99.99% of the data, which is not so bad, I think.
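ddrescue prints the rescued percentage itself, but the same number can be recomputed from the log afterwards. Here is a minimal sketch, assuming the mapfile layout described in the GNU ddrescue manual: comment lines start with #, the first non-comment line is a status line, and every following line is "position size status", with + marking rescued blocks. The function name is made up for this example:

```python
def rescued_fraction(mapfile_text):
    """Return the fraction of bytes marked as rescued ('+') in a ddrescue mapfile."""
    rescued = total = 0
    status_line_seen = False
    for line in mapfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not status_line_seen:
            # The first non-comment line holds the current position and status,
            # not a data block, so it is skipped.
            status_line_seen = True
            continue
        fields = line.split()
        size, status = int(fields[1], 0), fields[2]  # sizes are written as 0x... hex
        total += size
        if status == "+":
            rescued += size
    return rescued / total if total else 0.0
```

Calling it as rescued_fraction(open('/mnt/disk.log').read()) should give a value close to 0.9999 for a run like mine.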
So how do you find the damaged data, or at least the names of the files that are damaged beyond repair?
Step 2: inspect the log file created by ddrescue with ddru_findbad
Now that everything was safely copied (after a few tries and passes) to the new disk, it was time to reboot the system with the failing disk and inspect the log created by ddrescue. You'll need ddrutility for this. Once it is installed, execute the following command: ddru_findbad /dev/sdi /mnt/disk.log. Again, replace /dev/sdi with the new, good disk.
This will also take some time to run and will generate a results_list.txt file containing entries like the ones below:
Partition /dev/sdi3 Type Linux DeviceSector 682041040 PartitionSector 476190416 Block 59523802 Allocated yes Inode 14680670 File /dude/Photos/2011/09/24/IMG_4621.JPG
Partition /dev/sdi3 Type Linux DeviceSector 682047032 PartitionSector 476196408 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682047033 PartitionSector 476196409 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682047034 PartitionSector 476196410 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682047035 PartitionSector 476196411 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682047036 PartitionSector 476196412 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682047039 PartitionSector 476196415 Block 59524551 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682053496 PartitionSector 476202872 Block 59525359 Allocated yes Inode 14680672 File /dude/Photos/2011/09/24/IMG_4559.JPG
Partition /dev/sdi3 Type Linux DeviceSector 682053503 PartitionSector 476202879 Block 59525359 Allocated yes Inode 14680672 File /dude/Photos/2011/09/24/IMG_4559.JPG
Partition /dev/sdi3 Type Linux DeviceSector 682055432 PartitionSector 476204808 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055433 PartitionSector 476204809 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055434 PartitionSector 476204810 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055435 PartitionSector 476204811 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055436 PartitionSector 476204812 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055437 PartitionSector 476204813 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055438 PartitionSector 476204814 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 682055439 PartitionSector 476204815 Block 59525601 Allocated yes Inode metadata File none
Partition /dev/sdi3 Type Linux DeviceSector 736174952 PartitionSector 530324328 Block 66290541 Allocated yes Inode 16384451 File /dude/Photos/2014/10/07/IMG_2780.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736174953 PartitionSector 530324329 Block 66290541 Allocated yes Inode 16384451 File /dude/Photos/2014/10/07/IMG_2780.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736174954 PartitionSector 530324330 Block 66290541 Allocated yes Inode 16384451 File /dude/Photos/2014/10/07/IMG_2780.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736174955 PartitionSector 530324331 Block 66290541 Allocated yes Inode 16384451 File /dude/Photos/2014/10/07/IMG_2780.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736174959 PartitionSector 530324335 Block 66290541 Allocated yes Inode 16384451 File /dude/Photos/2014/10/07/IMG_2780.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180968 PartitionSector 530330344 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180969 PartitionSector 530330345 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180970 PartitionSector 530330346 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180971 PartitionSector 530330347 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180972 PartitionSector 530330348 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180973 PartitionSector 530330349 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180974 PartitionSector 530330350 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736180975 PartitionSector 530330351 Block 66291293 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
Partition /dev/sdi3 Type Linux DeviceSector 736181224 PartitionSector 530330600 Block 66291325 Allocated yes Inode 16384452 File /dude/Photos/2014/10/07/IMG_2781.JPG
I then asked ChatGPT to generate a nicer list from this file and used the following Python script to parse the results:
file_path = 'results_list.txt'  # Replace this with the actual path to your file

with open(file_path, 'r') as file:
    lines = file.read().strip().split('\n')

# Collect the unique file paths; a set drops the duplicates
unique_file_paths = set()
for line in lines:
    if 'File ' not in line:
        continue
    # Everything after "File " is the path of the damaged file
    path = line.split('File ', 1)[1].strip()
    # Metadata entries have "File none" and do not point to a file
    if path != 'none':
        unique_file_paths.add(path)

# Print the unique file paths in sorted order
for path in sorted(unique_file_paths):
    print(path)
The result is a nice list of files we will need to restore from backup:
/dude/Music/relaxmuziek/cd2.16.mp3
/dude/Photos/2011/09/24/IMG_4559.JPG
/dude/Photos/2011/09/24/IMG_4621.JPG
/dude/Photos/2012/11/03/IMG_6449.JPG
/dude/Photos/2013/09/29/IMG_7232.JPG
/dude/Photos/2013/09/29/IMG_7235.JPG
/dude/Photos/2013/09/29/IMG_7258.JPG
/dude/Photos/2013/09/29/IMG_7279.JPG
/dude/Photos/2013/09/29/IMG_7280.JPG
/dude/Photos/2013/09/29/IMG_7300.JPG
/dude/Photos/2013/10/10/IMG_8583.JPG
/dude/Photos/2013/10/10/IMG_8591.JPG
/dude/Photos/2013/10/10/IMG_8611.JPG
/dude/Photos/2013/10/10/IMG_8621.JPG
/dude/Photos/2014/10/07/IMG_2780.JPG
/dude/Photos/2014/10/07/IMG_2781.JPG
/dude/Photos/2014/10/07/IMG_2783.JPG
/dude/Photos/2014/10/07/IMG_2787.JPG
/dude/Photos/2014/10/07/IMG_2788.JPG
/dude/Videos/KERSTMIS 2012 019.MOV
Step 3: restore corrupt data from backup
So now that we know which files got corrupted beyond repair, we can restore them from backup (you do have a backup, don't you?).
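With only a handful of files this is easily done by hand, but the restore can also be scripted. The sketch below is one possible way to do it, not the way I actually did it: it assumes the backup mirrors the directory layout of the original disk, and restore_files, backup_root and target_root are names made up for this example.

```python
import shutil
from pathlib import Path


def restore_files(damaged_paths, backup_root, target_root):
    """Copy each damaged file from the backup tree over its copy on the new disk.

    Returns the list of paths that could not be found in the backup.
    """
    backup_root, target_root = Path(backup_root), Path(target_root)
    missing = []
    for path in damaged_paths:
        relative = path.lstrip("/")  # "/dude/Photos/x.JPG" -> "dude/Photos/x.JPG"
        source = backup_root / relative
        if not source.is_file():
            missing.append(path)
            continue
        destination = target_root / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, destination)  # copy2 also tries to keep timestamps
    return missing
```

For example, restore_files(paths, '/mnt/backup', '/mnt/newdisk') with hypothetical mount points, where paths is the list of damaged files from step 2. Anything it returns is missing from that backup and has to come from one of the other copies.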
Step 4: replace the dying disk with the new disk
After all corrupt data was restored, I replaced the dying disk with the new disk in the desktop. I then used ShredOS to completely wipe the dying disk. Since it was still under warranty, I was hoping to return it for repairs ... or get a replacement disk. The strange thing is that, after this operation, the disk was showing as healthy in GSmartControl (go figure). So I'm not sure what to do with it now. Anyway, the important thing is I didn't lose any data and my desktop is happy again.