Hi,

 

I seem to have the shittiest luck with hardware recently, and I might be slightly in over my head on this one. Maybe someone here can help me. This might get a bit long-winded, so I'll put a TL;DR at the end.

 

First, a quick recap. I have two 3TB drives in a RAID1 array, and one of them failed recently. I immediately ordered two replacement drives, since both original drives were bought at the same time and if one failed due to age, the other might be just around the corner. The new drives arrived, and the plan was to plug one in, let Intel RST rebuild the array with the new drive, then shut down and replace the second drive. Sounds simple, right? BUT when I went to turn my PC off so I could replace the first one, I noticed that some of my desktop shortcuts were blank white icons, and only those for software installed on the array. A quick check in Explorer showed that the entire volume had vanished, and Intel RST confirmed that the second drive had also failed, but not entirely: it just had a failed-state warning that I could click Ignore on, and that made the volume available again.

 

After quickly checking that the volume was accessible and had all of my files, I turned the PC off and replaced the first faulty drive (I had already confirmed which drive was the first to fail, so there was no risk of accidentally replacing the wrong one).

 

Intel RST rebuilt the array fine and everything seemed OK, but when I went to check the files on the volume, a few folders reported that the path was inaccessible, so there clearly was some corruption. I checked Intel RST, saw it was running the verification process, and just let it be for a few hours.

 

Then come the current troubles.

When I came back to the PC I just barely saw the verification at 5% before it suddenly rebooted and started a CHKDSK on the volume. At first I thought that was fine and planned to let it finish, but then I did some googling on my phone about CHKDSK and RAID and found out that CHKDSK could seriously damage the array if the array was already compromised. So I quickly did a hard reset and skipped the CHKDSK prompt on reboot. Everything booted fine and Intel RST returned to the verification process.

 

I then did a dirty query with fsutil and found that my RAID volume had the dirty bit set. Apparently the only way to clear the bit is with CHKDSK, which I don't want to run, at least not until the volume is verified to be healthy. But since the dirty bit is set, CHKDSK will try to scan and repair the drive on every reboot, right?
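
For anyone wanting to repeat the dirty-bit check without typing it by hand each time, here is a minimal Python sketch. It assumes an English-language Windows install where `fsutil dirty query` prints something like "Volume - D: is Dirty" or "Volume - D: is NOT Dirty"; that wording, the drive letter `D:`, and the function names are assumptions for illustration, not something confirmed in this thread. The query only reads the flag and does not modify the volume.

```python
import subprocess


def parse_dirty_query(output: str) -> bool:
    """Return True if the fsutil output reports the volume as dirty.

    Assumes the English wording "... is Dirty" / "... is NOT Dirty".
    """
    text = output.upper()
    if "NOT DIRTY" in text:
        return False
    if "DIRTY" in text:
        return True
    raise ValueError(f"unrecognised fsutil output: {output!r}")


def is_volume_dirty(drive: str) -> bool:
    """Run `fsutil dirty query <drive>` and parse the result.

    Windows only; needs an elevated (administrator) prompt.
    """
    result = subprocess.run(
        ["fsutil", "dirty", "query", drive],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_dirty_query(result.stdout)


if __name__ == "__main__":
    # "D:" is a placeholder; substitute the RAID volume's drive letter.
    print("dirty" if is_volume_dirty("D:") else "not dirty")
```

If the only goal is to stop the boot-time check from launching, `chkntfs /x D:` is documented to exclude a volume from it; as far as I know, the dirty bit itself stays set.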

 

My current plan is to leave the PC on and let it finish the verification process (which, btw, takes ages). Once it completes, I'll turn the PC off, boot with only one drive, run CHKDSK to clear the dirty bit, put the second new drive in, and let Intel RST rebuild the volume again.

 

Thoughts and/or suggestions regarding that plan? I found that it is possible to manually clear the dirty bit with a hex editor, but information on this is inconsistent at best: sources say the bit's location differs between OSes and give it for Win7 and 8, but nothing for Win10, and I'm not confident enough in my hex-editor skills to be messing around with entire logical volumes. I currently have 6h left of my shift at work before getting home to the PC.

 

Any input on the matter would be welcome, even if it's just to call me an idiot because I sure feel like one now... I should just have turned the PC off when the first drive failed and left it untouched until the new drives arrived. Oh and if it wasn't obvious already, this is my first and only RAID array, no prior experience with them, kind of outside my comfort zone with this one.

 

System specs:

Asus Z170M-Plus

Intel i5 6600K

16GB DDR4

GTX 1070 Ti

500GB SSD

2x 3TB in RAID1 using Intel RST

Win 10 Pro x64

 

TL;DR

RAID1 reports one failed drive. Two new drives ordered; before installing them, the second drive in the array also fails but seems salvageable. After possibly saving the volume, Windows sets the dirty bit, forcing a CHKDSK check that might further damage the already compromised array. Plan is to get the array up and running with only one of the new drives, then run CHKDSK to clear the bit, then rebuild the array with the second new drive.

 

 

https://linustechtips.com/topic/1315409-raid-array-troubles/

Forget about the dirty bit, just avoid rebooting and skip chkdsk if it reboots anyway.

 

But TBH the chances of saving the volume are low right now; best is to just restore from backup, which I hope you do have since "RAID is not a backup".

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2

https://linustechtips.com/topic/1315409-raid-array-troubles/#findComment-14561634

15 minutes ago, Kilrah said:

Forget about the dirty bit, just avoid rebooting and skip chkdsk if it reboots anyway.

 

But TBH the chances of saving the volume are low right now; best is to just restore from backup, which I hope you do have since "RAID is not a backup".

I have a backup of the important stuff, but there's a lot of content on the volume in the "not vital but would be nice to have" category. I guess I'll see where the verification process is at once I get home from work. If it's complete and the volume works fine then I'll just go ahead and replace the second drive with my fingers crossed and go from there.

 

The dirty bit currently is not that big of an issue; as you said, I can just skip the check. But if I manage to salvage the volume to a healthy state, I'd like it cleared so I don't need to worry about skipping CHKDSK on every reboot. That said, I'm still worried about clearing the bit using CHKDSK even with a 100% intact volume. One option would be to back up everything on the array to an external 3TB drive and reformat the array, which I assume would clear the bit.

https://linustechtips.com/topic/1315409-raid-array-troubles/#findComment-14561655

1 hour ago, Notn4 said:

I'm still worried about clearing the bit using CHKDSK even with a 100% intact volume

Why? 

CHKDSK is no problem if the underlying volume is fine. The issue is that if the volume below it is corrupted, CHKDSK might see "filesystem errors" that are not filesystem errors at all but errors at the volume level below, and if it then "corrects" them at the filesystem level, it ends up corrupting a filesystem that was actually good.

 

1 hour ago, Notn4 said:

if it's complete and the volume works fine then I'll just go ahead and replace the second drive with my fingers crossed and go from there.

If it completes and the volume works I'd copy the data to another drive without even replacing the 2nd drive first, then destroy/recreate the array completely. 

https://linustechtips.com/topic/1315409-raid-array-troubles/#findComment-14561725

7 hours ago, Kilrah said:

If it completes and the volume works I'd copy the data to another drive without even replacing the 2nd drive first, then destroy/recreate the array completely. 

This is what I ended up doing, or what I'm currently doing. The array seems fine for now, I can access all my files and they are currently being transferred over to an external drive. After I have everything backed up I'll just nuke the entire array and start fresh with the two new drives.
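
Since the old array gets nuked once the copy is done, it may be worth checking the copy first. A minimal sketch in Python, assuming the backup is a plain file-by-file copy with the same folder layout as the source: it hashes every file on both sides and lists anything missing, different, or unreadable. The function names and the `D:\` and `E:\backup` paths are hypothetical placeholders.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source: Path, backup: Path) -> list[str]:
    """List files under source that are missing, different, or unreadable in backup."""
    problems = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        try:
            if not dst.is_file():
                problems.append(f"missing: {rel.as_posix()}")
            elif file_digest(src) != file_digest(dst):
                problems.append(f"differs: {rel.as_posix()}")
        except OSError as exc:
            # A read error on a failing drive is reported rather than aborting the run.
            problems.append(f"unreadable: {rel.as_posix()} ({exc})")
    return problems


if __name__ == "__main__":
    # Placeholder paths: the RAID volume and the external backup folder.
    for line in compare_trees(Path("D:/"), Path("E:/backup")):
        print(line)
```

Note that this reads every file on the old array a second time, which puts extra load on a drive that is already failing, so it is a trade-off rather than an obvious win.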

 

New drives + new array, and then just restoring the backup from the external drive (just a snapshot, not a clone image). Maybe then I'll finally have some peace of mind; I've been pulling my hair out with this one. Still can't believe I got 2 drive failures almost simultaneously... Or is this a common thing with RAID1? The drives see identical activity, right? But are they manufactured so identically that if one fails after X amount of usage, its brother will die almost immediately after, given an identical life?

https://linustechtips.com/topic/1315409-raid-array-troubles/#findComment-14562618

Correct, it's very common. When you run RAID you should try to get drives from different suppliers, in the hope of getting different batches that don't share the same failure behavior.

 

And it's usually not worth running RAID unless you need the availability, i.e. avoiding downtime when a drive fails. It's better to use the 2nd drive as an independent backup that's updated daily or so.
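
As a rough illustration of the "independent backup updated daily" idea, here is a minimal one-way sync sketch in Python. It is an assumption-laden example, not a recommendation of a specific tool: the function names and paths are hypothetical, it decides what to copy from file size and modification time only, and it deliberately never deletes anything from the backup, so an accidental deletion on the main drive is not mirrored.

```python
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """True if dst is missing, has a different size, or is older than src."""
    if not dst.exists():
        return True
    src_stat, dst_stat = src.stat(), dst.stat()
    return (
        src_stat.st_size != dst_stat.st_size
        or int(src_stat.st_mtime) > int(dst_stat.st_mtime)
    )


def sync_to_backup(source: Path, backup: Path) -> int:
    """Copy new or changed files from source to backup; return how many were copied."""
    copied = 0
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = backup / src.relative_to(source)
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the modification time
            copied += 1
    return copied


if __name__ == "__main__":
    # Placeholder paths: the data drive and the second, independent drive.
    print(sync_to_backup(Path("D:/data"), Path("E:/backup")), "files copied")
```

Run from a daily scheduled task, this gives a copy that lags by up to a day, which is exactly what protects against corruption being mirrored instantly the way RAID1 does.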

https://linustechtips.com/topic/1315409-raid-array-troubles/#findComment-14562662
