It's crisis-time again!

Arokan

God, I must be the 1000th person posting their panic in this forum, fearing they might've lost their data 😄


I'll try to make it quick:
I use this 4-bay enclosure with 4x 3TB HDDs in RAID5 with mdadm on a Raspberry Pi 4 running Debian Bullseye.


Behaviour: I was moving some files here and there and suddenly the transfer got stuck; ls returned an input/output error.
I thought a quick reboot might do, as in "have you tried turning it off and on again?", which usually does the trick.

I managed to mount it again briefly; ls returned the directory contents correctly, but with "input/output error" on half of the entries.

The next ls immediately after returns nothing and the mount point is empty, yet trying to remount it returns:
"mount: /mnt/nas: /dev/md0 already mounted or mount point busy."

Outputs of lsblk, blkid and mdadm -D are attached.

I have tried googling the issue, but I had no luck.

I really don't know what's going on here and I could really use some help.

Thank you all in advance!

blkid.txt lsblk.txt mdadm-D.txt


Work from the ground up (rough example commands after the list):

  1. Stop all services using the array
  2. Unmount the array
  3. Stop the array
  4. Use smartctl to run a conveyance test on each drive, verifying the drives individually are still healthy
  5. If at most one drive is unhealthy, start the array
  6. Verify the array itself is healthy with mdadm
  7. If it is healthy, verify the filesystem on top of it is healthy (which filesystem are you using?)
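
Roughly, that could look like the sketch below. It assumes the array is /dev/md0 mounted at /mnt/nas (as in your error message) and that the members are /dev/sd[f-i] (as in your attachments); check lsblk for the real names and whether the members are whole disks or partitions.

    sudo fuser -vm /mnt/nas                # see which processes still use the mount, then stop those services
    sudo umount /mnt/nas                   # unmount the filesystem
    sudo mdadm --stop /dev/md0             # stop the array
    for d in /dev/sd{f,g,h,i}; do
        sudo smartctl -t conveyance "$d"   # start a conveyance self-test on each member drive
    done
    sudo smartctl -l selftest /dev/sdf     # once the tests have finished, read the result (repeat per drive)
    sudo mdadm --assemble --scan           # re-assemble the array from the superblocks / mdadm.conf
    sudo mdadm -D /dev/md0                 # verify the array state: all members present and "clean"
    sudo fsck -n /dev/md0                  # read-only filesystem check, reports problems without repairing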

"Turning it off and on again" is only useful if you don't care about diagnosing the problem. With storage you ALWAYS care about diagnosing the problem.

mdadm manages the kernel's md (software RAID) subsystem. It's possible there's pertinent information in your kernel messages (dmesg), which get cleared on boot. I hope you have your system set up to capture and persist those on disk. Look under /var/log/kern.log.


Also have a look at /var/log/messages or similar (journalctl?) for clues on your error.
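
For example (assuming Debian bullseye defaults; journalctl -b -1 only works if the journal is set to persistent storage):

    dmesg -T | grep -iE 'md0|ata|usb|i/o error'           # kernel messages from the current boot
    sudo journalctl -k -b -1 | grep -iE 'md0|i/o error'   # kernel messages from the boot before your reboot, if persisted
    zgrep -i 'i/o error' /var/log/kern.log* /var/log/syslog*   # whatever rsyslog wrote to disk, if it is installed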


Chances are the RAID is fine but the filesystem is borked due to a comms error with the drives during writes. You can usually recover from this with minimal data loss if you work methodically.
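
If the filesystem on md0 turns out to be ext4, a non-destructive first look could be something like this (-n answers "no" to every repair prompt, so nothing is changed on disk):

    sudo fsck.ext4 -fn /dev/md0   # forced read-only check, only reports problems
    sudo dumpe2fs -h /dev/md0     # superblock header: filesystem state, error count, last mount/check times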

When it comes to data management, ALWAYS think and understand before taking intrusive actions like rebuilds or reboots.

Make backups. For the love of god, make backups.


Thank you for your reply!
This looks like at least one drive (sdh) is indeed failing, right? I don't quite know how to interpret this.

I also found:

sudo mdadm -E /dev/md0
mdadm: No md superblock detected on /dev/md0.

Don't know if that helps.

sdf.txt sdg.txt sdh.txt sdi.txt


So first off: `mdadm -E` should be done on individual drives, not the logical array device. See the manual:

       -E, --examine
              Print  contents  of  the metadata stored on the named device(s).  Note the contrast between --examine and --detail.
              --examine applies to devices which are components of an array, while --detail applies to a  whole  array  which  is
              currently active.


So, you're kinda screwed. Both sdg and sdh are reporting errors. sdh has the most recent errors, so those are probably from when you first experienced the problems. However, sdg reported an error 4000 hours ago. That's roughly half a year ago if they're running 24/7!
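
You can see that yourself by comparing the drive's current power-on hours with the hour stamps in its SMART error log (sdg/sdh are the names from your attachments; add -d sat if the USB bridge needs it):

    sudo smartctl -A /dev/sdg          # attribute table: Power_On_Hours, Reallocated_Sector_Ct, Current_Pending_Sector, ...
    sudo smartctl -l error /dev/sdg    # error log; every entry is stamped with the power-on hour at which it occurred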


Did you have any monitoring on this whatsoever?
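
For next time: both mdadm and smartd can mail you long before it gets this far. A minimal sketch, assuming the stock Debian config files from the mdadm and smartmontools packages (you@example.com is a placeholder):

    # /etc/mdadm/mdadm.conf -- the mdadm monitor daemon mails this address when an array degrades
    MAILADDR you@example.com

    # /etc/smartd.conf -- watch all SMART attributes, run a short self-test nightly at 02:00, mail on trouble
    DEVICESCAN -a -m you@example.com -s (S/../.././02)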

I'm going to go ahead and say you're quite probably in big trouble. Chances are minimal (almost zero) that you'll be able to recover from this if there really are two corrupt drives here.


To make sure that's the case, please run `sudo mdadm -E <drive>` for each of your 4 individual drives. Also, please do the conveyance tests on each drive. Your SMART data shows no tests have been run for sd{g,h,i}; only sdf has had tests run previously.
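
Something like the following (a sketch; it assumes the md members are the whole disks /dev/sd[f-i] rather than partitions on them, so double-check against your lsblk output first):

    for d in /dev/sd{f,g,h,i}; do
        echo "=== $d ==="
        sudo mdadm -E "$d"                 # examine the md superblock on this component
        sudo smartctl -t conveyance "$d"   # start the conveyance self-test
    done
    # a conveyance test usually only takes a few minutes; afterwards:
    for d in /dev/sd{f,g,h,i}; do sudo smartctl -l selftest "$d"; done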

