
Two drives failed in my array. 

I ordered new ones and only did a zpool replace.

 

Now all the other drives show CKSUM errors.

 

1 pool, 2 vdevs of 12 drives, in raidz2.

 

Yes, both failed drives are in 1 vdev.

 

I want to kill myself right now. What's wrong? What did I do wrong?

 

Please help. 

 

Please don't comment if you are only going to say “yeah, you're screwed.”

 

Give me something to try first.

 

Please, please, please.

 

Restart? 

A different command?

 

A pact with a demon?

 

My mental state is fragile right now and this server was my world of zen. Please oh please. 

https://linustechtips.com/topic/1362539-help-zfs-suicide/

I'm guessing you don't have backups?

 

Can you show zpool status -v?

 

Can you fix those drives? You can often make an image of a failing drive with ddrescue.

 

What hardware are you using? Do you have enough hardware to make an image of all the drives?

 

How important is this data? If it's important, send it off to a data recovery service and don't touch anything.
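If the failing disks are still readable, the imaging step could look roughly like this. It is a minimal Python sketch that drives GNU ddrescue, assuming ddrescue is installed; `/dev/sdX` (the failing disk) and `/mnt/rescue` (a separate, healthy disk with enough free space) are hypothetical placeholders. By default it only prints the commands.

```python
import shlex
import subprocess


def ddrescue_passes(device: str, image: str, mapfile: str) -> list[list[str]]:
    """Build a two-pass GNU ddrescue plan for one failing disk.

    The mapfile records which sectors were recovered, so the second pass
    (or a later rerun) resumes where the previous one stopped.
    """
    return [
        # Pass 1: copy the easily readable areas first (-n skips the slow scraping phase).
        ["ddrescue", "-n", device, image, mapfile],
        # Pass 2: same image and mapfile; direct disc access (-d), retry bad areas 3 times (-r3).
        ["ddrescue", "-d", "-r3", device, image, mapfile],
    ]


def run(argv: list[str], dry_run: bool = True) -> str:
    """Print the command (dry run) or execute it, stopping on a non-zero exit."""
    line = shlex.join(argv)
    if dry_run:
        print(line)
    else:
        subprocess.run(argv, check=True)
    return line


if __name__ == "__main__":
    # Hypothetical placeholders: substitute the real failing disk and rescue location.
    for argv in ddrescue_passes("/dev/sdX", "/mnt/rescue/sdX.img", "/mnt/rescue/sdX.map"):
        run(argv)
```

Imaging each failing member before running any further `zpool` commands means later experiments can be repeated against copies rather than against the only originals.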


26 minutes ago, Electronics Wizardy said:

I'm guessing you don't have backups?

I know a guy who used to claim that the moral of every story ever told was “always make backups.”

 

I have committed ZFS suicide (or ZFS murder?), but it was long ago. It’s a complicated, feature-filled system, and I tried early adopting, to my chagrin: the drives I was using couldn’t handle it. That is less likely to be a problem now. It has some wacky requirements and behaviors, the details of which I have long forgotten, I’m afraid. You might need to give a lot of specific details about what was done. I remember a long row of checkboxes, but that was an earlier time. It’s a quite powerful system, and one of the difficulties of powerful systems is that it’s easier to screw up and break things.

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


39 minutes ago, Electronics Wizardy said:

Can you show zpool status -v?

 

This is the first step.

Can you also run dmesg?

 

 

As for hardware, what is this server? Does it have an HBA/backplane at all?

 

 

The checksum errors are probably just because checksums failed when the drives failed.

With raidz2, your data should still be available. Or has your share gone down?

The checksum errors can be cleared once the issue is resolved (using zpool clear, but this should only be done once everything is resolved, as it will initiate a scrub as well).
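A minimal sketch for gathering the information asked for above in one go, assuming a Linux host with OpenZFS; `tank` is a hypothetical pool name and the section headers are just an illustrative layout for pasting into a reply. `zpool clear` is deliberately left out, since the advice here is to run it only once everything is resolved.

```python
import subprocess
from typing import Callable

Runner = Callable[[list[str]], str]


def diagnostic_commands(pool: str) -> dict[str, list[str]]:
    """Commands whose output is usually asked for first in a thread like this."""
    return {
        # Per-device READ/WRITE/CKSUM counters, resilver progress and,
        # with -v, the list of files with permanent errors.
        "zpool status -v": ["zpool", "status", "-v", pool],
        # Kernel log: repeated resets or I/O errors on the same controller,
        # cable or backplane port can point at hardware rather than the disks.
        "dmesg": ["dmesg"],
    }


def capture(argv: list[str]) -> str:
    """Run a command and return stdout plus stderr without raising on failure."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def collect(pool: str, runner: Runner = capture) -> str:
    """Gather every diagnostic into one block of text."""
    sections = []
    for title, argv in diagnostic_commands(pool).items():
        sections.append(f"===== {title} =====\n{runner(argv)}")
    return "\n".join(sections)


if __name__ == "__main__":
    # "tank" is a hypothetical pool name; dmesg may need root on some distributions.
    print(collect("tank"))
```

The `runner` parameter is only there so the function can be exercised without a real pool.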


Desktop: Ryzen9 5950X | ASUS ROG Crosshair VIII Hero (Wifi) | EVGA RTX 3080Ti FTW3 | 32GB (2x16GB) Corsair Dominator Platinum RGB Pro 3600MHz | EKWB EK-AIO 360D-RGB | EKWB EK-Vardar RGB Fans | 1TB Samsung 980 Pro, 4TB Samsung 980 Pro | Corsair 5000D Airflow | Corsair HX850 Platinum PSU | Asus ROG 42" OLED PG42UQ + LG 32" 32GK850G Monitor | Roccat Vulcan TKL Pro Keyboard | Logitech G Pro X Superlight | MicroLab Solo 7C Speakers | Audio-Technica ATH-M50xBT2 LE Headphones | TC-Helicon GoXLR | Audio-Technica AT2035 | LTT Desk Mat | XBOX-X Controller | Windows 11 Pro

 


Server: Fractal Design Define R6 | Ryzen 3950x | ASRock X570 Taichi | Asus RTX 4060 Dual OC | 64GB (4x16GB) Corsair Vengeance LPX 3000MHz | Corsair RM850v2 PSU | Fractal S36 Triple AIO + 4 Additional Venturi 120mm Fans | 8 x 20TB Seagate Exos X22 | 4 x 16TB Seagate Exos X18 | 3 x 2TB Samsung 970 Evo Plus NVMe | LSI 9211-8i HBA

 


NAS: Innovision 4U 24-bay chassis (12GB MiniHD SGIO Backplane) | Intel Core i9-10980xe | EVGA X299 FTW-K | EVGA RTX 2080Ti Super FTW3 | 128GB (8x16GB) Corsair Vengeance LPX 3200MHz | DEEPCOOL PN1000M PSU | Noctua NH-D12L Chromax Black | 16 x 16TB Seagate Exos X18 | 2 x 2TB Samsung 990 Pro | 2 x 2TB Intel U.2 P4510 | LSI 9305-24i HBA

 


On 8/5/2021 at 3:31 PM, Jarsky said:

With raidz2 your data should still be available, or has your share gone down?

So 1 drive failed to the point of being listed as unavailable.
Another drive degraded due to lots of read errors.
A third drive is degraded with some read errors.

I freaked and replaced 2 drives at once.

During resilvering, it gets to about 25%, then all drives in the vdev list CKSUM errors and the resilver restarts. (The other vdev is unaffected.)

Can I stop one of the replacements, put the degraded drive back, and do just the one replacement?

Yes, I know RAID is redundancy, not backup. I am looking for a backup solution: maybe a tape drive server at my parents’ house or something.

Can you stop a disk replacement?

FYI, all files were accessible with the 1 failed drive and the other degraded one. Once I tried the double replacement, files got dodgy.
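A hedged sketch of the “one replacement at a time” idea. It assumes OpenZFS 2.0 or newer (for `zpool wait`), that the degraded original disk has been reconnected and is seen by the pool again (`zpool online` exists for bringing a device back if it is not), and that detaching the *new* disk from a `replacing-N` vdev cancels that replacement, which is the usual approach. ZFS refuses a detach that would leave no valid copy of the data, so a refusal here is a safety check, not something to force. Pool and device names are hypothetical, and the code only builds and prints the commands.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Step:
    argv: list[str]
    note: str


def one_at_a_time_plan(pool: str, cancel_new: str, old: str, new: str) -> list[Step]:
    """Cancel one of two in-flight replacements, then redo it after the other finishes.

    All names are hypothetical placeholders; take the real ones from `zpool status`.
    """
    return [
        Step(["zpool", "status", pool],
             "Check which new disk sits under which replacing-N vdev."),
        Step(["zpool", "detach", pool, cancel_new],
             "Detach the new disk from one replacing-N vdev to cancel that replacement."),
        Step(["zpool", "wait", "-t", "resilver", pool],
             "Block until the remaining resilver has finished (OpenZFS 2.0+)."),
        Step(["zpool", "replace", pool, old, new],
             "Only now start the second replacement; -f may be needed if the disk kept old labels."),
    ]


if __name__ == "__main__":
    # "tank", "sdB" (the readable old disk) and "sdY" (its new replacement) are made up.
    for step in one_at_a_time_plan("tank", "sdY", "sdB", "sdY"):
        print(f"# {step.note}")
        print(shlex.join(step.argv))
```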
 


On 8/5/2021 at 3:13 PM, Bombastinator said:

I know a guy who used to claim that the moral of every story ever told was “always make backups.”

I have one drive that is still readable. If I copy its image to the new drive and replace it that way, would the pool be able to tell the difference?
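As far as I understand, ZFS recognizes pool members by the labels written on the disk (which carry the pool and vdev GUIDs) rather than by serial number, so a full sector-for-sector copy should be picked up as the original member. Sectors ddrescue could not read would then surface as checksum errors that raidz2 parity can hopefully repair. The original and the clone should not be connected at the same time, since both carry identical labels. A minimal sketch, assuming GNU ddrescue; the device names and map file path are hypothetical:

```python
import shlex


def clone_command(source: str, target: str, mapfile: str) -> list[str]:
    """GNU ddrescue copy of the whole old disk straight onto the new disk.

    -f (--force) is needed because the output is a device rather than a
    regular file. It overwrites everything on `target`, so check names twice.
    """
    return ["ddrescue", "-f", source, target, mapfile]


def check_sizes(source_bytes: int, target_bytes: int) -> None:
    """Refuse the clone if the new disk is smaller than the old one.

    The byte counts could come from `blockdev --getsize64 /dev/sdX`.
    """
    if target_bytes < source_bytes:
        raise ValueError(f"target is {source_bytes - target_bytes} bytes too small")


if __name__ == "__main__":
    # Hypothetical names; keep the map file off both disks being copied.
    # Call check_sizes() with the real sizes before running the printed command.
    print(shlex.join(clone_command("/dev/sdOLD", "/dev/sdNEW", "/root/clone.map")))
```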


1 hour ago, Bmoney said:

I have one drive that is still readable. If I copy its image to the new drive and replace it that way, would the pool be able to tell the difference?

I am the wrong person to ask. This whole multiple-failing-drives thing is scary. Back in the day, HDDs failed fairly slowly, but the failure accelerated quickly. The move was to replace a drive that even started to show problems, because those problems wouldn’t go away and would get worse fast. I once left a bad drive for a month and was only saved by the freezer trick, which was an unreliable last-ditch move to save data before a drive died. I very much doubt the freezer trick is even a thing anymore. After that I was quite conservative about it, and thus never wound up in the state you seem to be in. My move would be to not even try to save whole drives: get the critical, unrebuildable data off, and accept that stuff like system files is just a loss. There seems to be something in the setup of the drives that is causing problems, and trying to save whole drives may actually be compounding and perpetuating the issue, because the thing that is killing the drives is being saved too.


On 8/5/2021 at 2:42 PM, Electronics Wizardy said:

Can you show zpool status -v?

So a resilver finished.
About 500 corrupted files.
Luckily, most of them were from my torrent working folder, so I'm only losing something that wasn't there to start with.

But it says it is still replacing 2 disks.

I deleted all the problem files.

How do I get the old disks that have been removed from the machine to go offline?


2 hours ago, Bmoney said:

How do I get the old disks that are removed from the machine to go offline?

Can you show zpool status?

 

You should be able to use zpool detach.
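ZFS normally drops the old disk on its own once a replacement resilvers cleanly. If the old disk is still listed under a `replacing-N` vdev (for example after a resilver that hit errors), detaching it by the identifier shown in `zpool status` is a common next step. A disk that is physically gone is usually shown by its numeric vdev GUID, which `zpool detach` accepts in place of a device name. A minimal sketch that only builds the commands; the pool name and GUID are made-up placeholders:

```python
import shlex


def detach_missing(pool: str, guids: list[str]) -> list[list[str]]:
    """One `zpool detach` per old disk still listed under a replacing-N vdev.

    A disk that is no longer present is usually shown in `zpool status` by
    its numeric vdev GUID (often with "was /dev/...") instead of a device name.
    """
    commands = []
    for guid in guids:
        if not guid.isdigit():
            raise ValueError(f"{guid!r} does not look like a numeric vdev GUID")
        commands.append(["zpool", "detach", pool, guid])
    return commands


if __name__ == "__main__":
    # "tank" and the GUID are hypothetical; copy the real ones from zpool status.
    for argv in detach_missing("tank", ["1234567890123456789"]):
        print(shlex.join(argv))
```

Running `zpool status` again afterwards should show whether the `replacing` entries are gone.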
