
rhradec

Member
  • Posts

    7

Reputation Activity

  1. Like
    rhradec got a reaction from dogwitch in Our data is GONE... Again   
    DON'T REPLACE ALL THE FAILING DRIVES AT THE SAME TIME!!!

    Put the drives you guys took out back in, and see if the zpools come up again!

    The errors reported by "zpool status" don't matter at this point. You only got them because you replaced all the faulted drives at the same time, especially on the machine where the drives only had errors but were still online in the pool!

    Put everything back the way it was, with the faulted drives in their original places, and see if the pools come up.
    If they don't, try to import them again with "zpool import -a" and see.
    If they do come back up, reboot again just to make sure they come back up on their own. (Don't worry about errors for now.) Then replace one drive at a time, and let the zpool resilver after each one!
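The "put it back and import" step above can be sketched roughly like this. It is only a sketch: the `run` wrapper and the `recover_pools` function are names made up for this example, and by default the script just prints the commands instead of running them, so you can read it before touching a real pool.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set DRY_RUN=0 to really run them (needs root and the ZFS tools).
# "run" and "recover_pools" are hypothetical helper names for this sketch.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_pools() {
    run zpool import       # list pools that are visible but not imported
    run zpool import -a    # try to import every pool that was found
    run zpool status       # expect errors here; just check the pools are up
}

recover_pools
```

If "zpool import -a" does not bring a pool back, the plain "zpool import" listing shows each pool's name and numeric ID, and either one can be passed to "zpool import" to import that pool by itself.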

    Once one drive has resilvered, reboot, check that all pools are online, and replace another... and so on.
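One way to script the "one drive, then wait" routine is sketched below. The pool name `tank` and the device names are placeholders, not anything from the original thread, and `run` and `replace_one` are helper names invented for this example. It prints the commands by default; with DRY_RUN=0 it polls "zpool status" until the resilver is no longer reported as in progress.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set DRY_RUN=0 to really run them (needs root and the ZFS tools).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# replace_one POOL OLD_DEVICE NEW_DEVICE
# Replaces exactly one device and waits for the resilver to finish.
replace_one() {
    pool=$1
    old=$2
    new=$3
    run zpool replace "$pool" "$old" "$new"
    # Only poll on a real run; skipped entirely in dry-run mode.
    while [ "${DRY_RUN:-1}" != "1" ] &&
          zpool status "$pool" | grep -q "resilver in progress"; do
        sleep 60
    done
    run zpool status "$pool"   # look at the result before the next drive
}

# Placeholder names: substitute your own pool and devices.
replace_one tank old-disk-1 new-disk-1
```

The function is deliberately limited to a single drive per call. The reboot and the "are all pools online" check between drives are left as manual steps, since those are the moments where you want a person looking at the output.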
    NEVER replace all the drives at the same time!!! Even with unavailable drives, replace one first and let it resilver. That way, when you replace the second, the pool has the data on the first new drive to work with, which puts less strain on all the old drives!

    You have to put the original unavailable/faulting drives back in the pools anyway if you want to dump the data to another storage system, or else you won't be able to access much of it.

    I had this EXACT problem before with ZFS, when people replaced all the faulting drives at the same time thinking that was the "safest" thing to do! By putting everything back the way it was and bringing the pools online again with "zpool import -a", or even by importing with the pool ID, I was able to fix the pool by replacing the drives one by one and letting them resilver in between. No data was lost, despite the billions of errors zpool status spat out!

    I even fixed a standard Linux RAID5 on a QNAP NAS with 3 disks faulting by doing the same. I put all 6 drives in a Linux machine and replaced them one by one, letting the array rebuild in between.

    I take care of servers and storage for 2 VFX studios overseas, and I've had my fair share of people getting anxious when storage starts to fail and deciding the best thing is to replace all the "broken" drives at once, instead of doing it in steps, carefully, waiting for the software to do its thing on one drive at a time.

    I understand it does seem like the safest thing is to take all the broken stuff out, but you have to go slowly. By the way, I live in Vancouver, if you guys want some help.