
My NAS just stopped working suddenly

airborne spoon

I have a Plex server running on TrueNAS 13.1. It was working fine, then yesterday it started boot looping, so I hooked up a monitor and it shows the error in the first pic.

So I pulled RAM until I had 1 stick, then tried each of the 4 sticks solo in each of the 4 slots, and I got the second pic every time it rebooted.

 

So I disconnected and reconnected all the SATA/SAS and power cables, and no change. Then I saw the BIOS was like 18+ months old, so I updated it, and now it won't even boot to the TrueNAS loader. It reboots with the last error before it even gets to that screen.

 

How did it just stop working suddenly and how can I fix it?

 

[Attached: three screenshots of the error screens, taken 2024-02-27 and 2024-02-28]


22 minutes ago, OhYou_ said:

unplug all drives except the os boot drive and see if it boots.
quick google is showing a couple of exactly the same issues  https://github.com/openzfs/zfs/issues/13483
 

Interesting. I'm out and about right now, but just for lols I'll grab a new SSD to use as a boot drive in case that's the problem, since it's like 30 min to town.


To me it looks like your RAID is good, since it's the first thing it syncs. But your boot disk might have had an error, and when rebuilding it corrupted your boot kernel or something like that. That's how SSDs self-repair...

 


5 hours ago, OhYou_ said:

unplug all drives except the os boot drive and see if it boots.
quick google is showing a couple of exactly the same issues  https://github.com/openzfs/zfs/issues/13483
 

 

1 hour ago, Robchil said:

to me it looks like your raid is good as it's the first it syncs.. but your boot disk might have had an error and when rebuilding it corrupted your boot kernel or something like that.  that's how ssd's self repair... 

 

I just did a fresh install on a brand new 500GB SSD and it doesn't work. I then unplugged the power to all the drives except the boot drive and did another fresh install, and it still doesn't boot. It gets to this screen and reboots stupid quick. If I go into the BIOS and hit boot override (it's the only drive anyway), it goes to this screen and then back to the BIOS in like 2 seconds.

 

[Attached: screenshot of the screen shown just before the reboot, taken 2024-02-28]


Well, nothing works… however, I put the original boot drive (M.2) in my gaming rig and it booted into TrueNAS just like it's supposed to.

 

Sooooo I'm gonna Frankenstein it around to find what is broken. My gaming rig is hard-line water cooled, so I'm gonna start with the PSU, and hopefully that's what the prob is 🤷‍♂️😂


47 minutes ago, airborne spoon said:

Well nothing works… however, I put the original boot drive (M.2) in my gaming rig and it booted just like it's supposed to, to truenas.

 

Sooooo I'm gonna Frankenstein it around to find what is broken. My gaming rig is hard line water so I'm gonna start with the PSU and hopefully that's what the prob is 🤷‍♂️😂

When I worked with VMware, I usually used small USB sticks, about the size of the Logitech Bluetooth dongles that come with mice and keyboards, with 16GB on them. It was easy to boot from them or swap them, since it doesn't matter what disk the OS is on 😄 Once it's in memory it will run until it crashes 😄

 


19 minutes ago, Robchil said:

when i worked with vmware.. i usually used small USB sticks size of a logitech BT sticks that comes with mouse and keyboards. with 16GB on it.. was easy to boot and or change, since it's unimportant what disk the OS is on 😄 when it's in memory it will run until it crashes 😄

 

Yeah I tried a different drive for the boot and it's not working.

 

I found another CPU in my parts stash I forgot about and it didn't change anything.

 

So that means it's prob the MB then.

 

I'll test for sure when I get some paste to swap the CPUs, because I don't wanna pull apart my main rig without being able to put it right back together afterwards. But since I've swapped out the PSU and tried a likely good CPU, the only thing left would be the MB, since the boot drive worked fine in my gaming rig.


10 minutes ago, airborne spoon said:

Yeah I tried a different drive for the boot and it's not working.

 

I found another CPU in my parts stash I forgot about and it didn't change anything.

 

So that means it's prob the MB then.

 

I'll test for sure when I get some paste to swap the CPU's because I don't wanna pull apart my main rig without being able to put it right back together afterwards. But since I've swapped out the PSU and a likely good CPU the only thing left would be the MB since the boot drive worked fine in my gaming rig.

Yeah, sounds like the last thing to test. But I would try disabling things in the BIOS before ripping it out. I see it loads config trusted and fails, then crashes.

Have there been any updates that could cause that?

And check that the time is set correctly.

 

 


45 minutes ago, Robchil said:

yeah sounds like the last thing to test..  but i would try to disable things in bios before ripping it out.. i see it loads config trusted and fails then crashes.. 

has there been any updates that could cause that? 

and check that time is set correctly.. 

 

 

I tried doing a BIOS update and no dice. There shouldn't be any reason to change anything in the BIOS, since it just said fuck it and stopped working suddenly; it's not like I was messing with it. But I've gone in there and messed with it anyway, just to try and rule out random settings.


Ok threw the CPU in my gaming rig with the boot drive and it fired right up like it's supposed to.

 

So it's the MB that crapped out on me, and it's 1 year past the warranty expiration.

 

Guess I'll be looking for a new MB. I've always had good luck with MSI; this is my first MB failure of my life. But I need lots of PCIe lanes, so gotta find a good one.

 

SAS card: PCIe x8

10Gb NIC: PCIe x8

GTX 1060 6GB: PCIe x16

Although I can probably get away with an x1 to x16 riser card for any one of those.
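
For anyone sizing a replacement board, a quick tally of the lanes those three cards would want at full width may help. This is only a sketch: the helper name `total_lanes` is made up for illustration, and the widths are the ones listed in the post, not measured link widths.

```shell
# Sum the PCIe lane counts passed as arguments.
# total_lanes is a hypothetical helper name, not part of any tool.
total_lanes() {
  total=0
  for lanes in "$@"; do
    total=$((total + lanes))
  done
  echo "$total"
}

# Widths from the post: SAS card x8, 10Gb NIC x8, GTX 1060 x16.
total_lanes 8 8 16   # prints 32
```

PCIe cards generally negotiate down to whatever width the slot provides, which is why an x1 riser can work for one of them, at the cost of bandwidth on that card.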


1 hour ago, airborne spoon said:

Ok threw the CPU in my gaming rig with the boot drive and it fired right up like it's supposed to.

 

So it's the MB that crapped out on me, and it's 1 year past the warranty expiration.

 

Guess I'll be looking for a new MB, I've always had good luck with MSI this is my first MB failure of my life. But I need lots of PCI lanes so gotta find a good one.

 

SAS card PCI x8

10GB NIC PCI x8

1060 6gb PCI x16

Although I can probably get away with a 1x to 16x riser card for any one of those.

Well, if speed isn't essential for you, a Threadripper 1920X should have enough PCIe lanes... around 60.

 

Mainboard + CPU should be around $1200 from Amazon, but that's just an example. If you can get X399 boards cheaper, I would get that. I only found 1 with a price on it.


1 hour ago, Robchil said:

well.. if speed isn't essential for you.. a threadripper 1920X should have enough PCI lanes... around 60.

 

mainboard  cpu should be around 1200$...  from amazon.. but just as example.. if you get x399 boards cheaper i would get that. i only found 1 with price on it. 

So I plugged my SAS/SATA cables into it one at a time, into both ports, and all 4 times it fired up correctly. So I'm 99% sure my SAS to SATA HBA card has taken a huge shit on me and caused all my problems for the last few days.

I don't have a spare card, so I'll have to buy a new-to-me one, but it's the only thing that makes sense.


5 hours ago, airborne spoon said:

So I plugged my SAS/SATA cables into it one at a time into both ports all 4 times it fired up correctly. So I'm 99% sure my SAS to SATA HBA card has taken a huge shit on me and caused all my problems for the last few days.

I don't have a spare card so I'll have to buy a new to me one but it's the only thing that makes sense

I find that weird, since it finds 10 disks and syncs all at boot before panicking. 

are you seeing this in bios? 

 

 

 


6 hours ago, airborne spoon said:

So I plugged my SAS/SATA cables into it one at a time into both ports all 4 times it fired up correctly. So I'm 99% sure my SAS to SATA HBA card has taken a huge shit on me and caused all my problems for the last few days.

I don't have a spare card so I'll have to buy a new to me one but it's the only thing that makes sense

Avoid buying MSI products.


Well, I got a new HBA card in and it's made no change; it still won't boot with all the drives connected. I can connect up to 5 drives on the HBA card using 2x 4-to-1 cables. It doesn't matter which drives I connect or which cables I use: as soon as I try to connect a 6th, 7th or 8th drive to the SAS card, it won't boot.

 

Attached are the shots of it trying to boot; it reboots after the last one.

 

I've tried a different MB, different CPU, different PSU, different SAS HBA card, and different cables, and also tried swapping the HBA card to a different PCIe slot, with no change either. I honestly can't figure out WTF is wrong with this thing.

 

[Attached: three screenshots of the boot attempt, taken 2024-03-04]


Update again: if I boot with 5 drives connected and then connect the other 3, I can go into the shell on the TrueNAS page and type zpool import -o readonly=on tank

It then shows my pool and everything as online but I can't access it from the network drive on Windows to see the files.
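
For reference, the read-only import described above could be sketched like this. OpenZFS spells the property `readonly` (no hyphen). The `run` wrapper and the `RUN` variable are made up for illustration: by default the script only prints the commands, and it executes them only when `RUN=1` is set. The pool name `tank` comes from the post.

```shell
POOL="${POOL:-tank}"   # pool name from the post

# Hypothetical dry-run wrapper: print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

import_readonly() {
  run zpool import                                    # list pools available for import
  run zpool import -o readonly=on "$POOL"             # import without allowing writes
  run zpool status -v "$POOL"                         # check vdev state and any errors
  run zfs list -r -o name,mounted,mountpoint "$POOL"  # see where datasets are mounted
}

import_readonly
```

One possible reason the Windows share still shows nothing (an assumption, not something confirmed in this thread) is that a pool imported by hand from the shell may be mounted at a different path than the one the TrueNAS shares point to, so the last command is worth checking against the share paths.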


Update: I got it mounted in an Ubuntu live session, and I have a 15TB drive coming in tomorrow. I'll transfer all the files, then destroy the pool and start over from scratch and see what happens.


Update number whatever this is now.

 

I was able to copy the files in Ubuntu using rsync and got all but 28 files, which, considering there was 11TB or about 180,000 files, I'd say is definitely acceptable. I also have an itemized list of the files that didn't copy, so I can sail the high seas and find them again.
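
The exact command used for the copy isn't given, so the following is only a sketch of one way to do it with rsync while keeping a log that can be searched for failed files. The paths `/mnt/tank/` and `/mnt/backup/`, the log name, and the `run` dry-run wrapper are all assumptions for illustration.

```shell
SRC="${SRC:-/mnt/tank/}"            # hypothetical mount point of the read-only pool
DST="${DST:-/mnt/backup/}"          # hypothetical mount point of the new drive
LOG="${LOG:-rsync-errors.log}"      # hypothetical log file name

# Hypothetical dry-run wrapper: print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

copy_pool() {
  # -a preserves permissions and timestamps; --log-file records what rsync did,
  # including errors for files it could not read.
  run rsync -a --info=progress2 --log-file="$LOG" "$SRC" "$DST"
}

copy_pool
```

When run for real, rsync exits with code 23 if some files could not be transferred, which matches the "all but 28 files" outcome described above.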

 

I'm currently running dd on all 8 of the drives… it's progressing about 500GB every 30 min, sooooo in like 10 hours or so the drives will be back to good as new from the factory. Then I can make a fresh new pool, transfer all the files back to the server, and hopefully everything will just work.
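
The 10-hour guess can be sanity-checked with a little arithmetic. In this sketch the helper name `wipe_hours` is made up, the 500GB per 30 min rate is the one from the post, and the 10000GB figure is a hypothetical amount left to write, chosen only because it matches the 10-hour estimate; the thread doesn't state the drive sizes.

```shell
# hours = total_gb / gb_per_interval * interval_minutes / 60
# wipe_hours is a hypothetical helper name.
wipe_hours() {
  awk -v total="$1" -v chunk="$2" -v mins="$3" \
    'BEGIN { printf "%.1f\n", total / chunk * mins / 60 }'
}

# Rate from the post: 500 GB every 30 minutes.
# 10000 GB is an assumed amount of data left to write.
wipe_hours 10000 500 30   # prints 10.0
```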

