VMs going into read-only

I am really not sure where to turn here, so I hope some folks here can lend me a hand…

 

I have a homelab that has been stable for a few years, but I am now running into the strangest of issues. I run ESXi with ~4 Ubuntu 18.04 LTS VMs and FreeNAS with bare-metal HDD access, along with a Win 10 LTSC install for Veeam backups of the Ubuntu VMs.

 

A few weeks ago, I SSH'd into an Ubuntu VM and ZSH warned me that the file system was read-only. I shrugged it off as a weird one-time thing and restored from Veeam, but then it started happening more often, and to more VMs. Last night, sometime between when I went to sleep and when I woke up, it happened to all 4 VMs…

 

Initially I thought maybe it was a Veeam backup running during an Ubuntu automatic security update (I do have that enabled), but Veeam did not run last night. I am restoring one of the smaller VMs right now to an earlier state just to try and collect some info, but I am not even sure what to collect.

 

When I google the issue, the main answers seem to be that a mounted file system has errors, or that fstab is the issue. I am not sure how that would apply here, as one of the VMs is not mounting any of my FreeNAS storage at all… It has no network links at all. I just got that VM restored (it just runs pihole… a pretty simple VM, literally nothing on it except pihole), and this is its fstab config:

UUID=2d945db3-1ff7-4c22-8022-0479207bb427 / ext4 defaults 0 0
/swap.img none swap sw 0 0
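
The fstab above only shows how the root file system is configured to mount, not its live state. A minimal sketch for checking which ext4 mounts are currently read-only, assuming a Linux guest where `/proc/mounts` is readable. The function names `parse_mount_line` and `read_only_mounts` are made up for this illustration:

```python
from pathlib import Path


def parse_mount_line(line):
    """Split one fstab or /proc/mounts line into its first four fields.

    Returns (device, mountpoint, fstype, [options]) or None for
    comments, blank lines, and lines with too few fields.
    """
    fields = line.split()
    if len(fields) < 4 or fields[0].startswith("#"):
        return None
    device, mountpoint, fstype, options = fields[:4]
    return device, mountpoint, fstype, options.split(",")


def read_only_mounts(lines, fstypes=("ext4",)):
    """Return (device, mountpoint) pairs whose option list contains 'ro'."""
    found = []
    for line in lines:
        parsed = parse_mount_line(line)
        if parsed is None:
            continue
        device, mountpoint, fstype, options = parsed
        if fstype in fstypes and "ro" in options:
            found.append((device, mountpoint))
    return found


if __name__ == "__main__":
    proc_mounts = Path("/proc/mounts")
    if proc_mounts.exists():
        live = proc_mounts.read_text().splitlines()
        for device, mountpoint in read_only_mounts(live):
            print(f"{device} on {mountpoint} is mounted read-only")
```

If the root file system shows up here while fstab says `defaults`, the remount most likely happened at runtime rather than through the mount configuration. That would fit the kernel reacting to an error, though this check alone does not prove it.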

 

I run the VMs (and ESXi) on a consumer SSD. I upgraded to it maybe 5 months ago, so it's not exactly old, and it's not like my VMs hit it hard at all. It's a very low-use homelab: it's on 24/7, but realistically it's doing almost nothing 100% of the time. Could the VMs be doing some sort of SMART check, seeing an issue with the SSD, and going read-only as some sort of protection? How would I determine this? What else could be happening?
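
On the "how would I determine this" question: a guest normally sees only a virtual disk, so it is unlikely to be reading SMART data from the physical SSD. A more common trigger is the guest kernel hitting I/O or ext4 errors and remounting the file system read-only. Below is a rough sketch for pulling those messages out of a saved kernel log. The pattern list is an assumption based on typical ext4 and block-layer messages and is not exhaustive. `find_storage_errors` and the script name in the usage note are hypothetical.

```python
import re
import sys

# Typical kernel messages around an ext4 read-only remount.
# Assumed, non-exhaustive list; extend it for other file systems.
STORAGE_ERROR = re.compile(
    r"EXT4-fs error|Remounting filesystem read-only|I/O error|Aborting journal",
    re.IGNORECASE,
)


def find_storage_errors(lines):
    """Return the log lines that match any of the storage error patterns."""
    return [line for line in lines if STORAGE_ERROR.search(line)]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Expects a path to saved kernel log output, e.g. from `dmesg`.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for hit in find_storage_errors(handle.read().splitlines()):
            print(hit)
```

Usage might look like `dmesg > dmesg.txt` followed by `python3 scan_log.py dmesg.txt`, run on an affected VM before rebooting it. If several VMs log I/O errors at about the same time, that would point toward the shared datastore (SSD, cable, or controller) rather than any one guest.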

Rig: i7 13700k - - Asus Z790-P Wifi - - RTX 4080 - - 4x16GB 6000MHz - - Samsung 990 Pro 2TB NVMe Boot + Main Programs - - Assorted SATA SSD's for Photo Work - - Corsair RM850x - - Sound BlasterX EA-5 - - Corsair XC8 JTC Edition - - Corsair GPU Full Cover GPU Block - - XT45 X-Flow 420 + UT60 280 rads - - EK XRES RGB PWM - - Fractal Define S2 - - Acer Predator X34 -- Logitech G502 - - Logitech G710+ - - Logitech Z5500 - - LTT Deskpad

 

Headphones/amp/dac: Schiit Lyr 3 - - Fostex TR-X00 - - Sennheiser HD 6xx

 

Homelab/ Media Server: Proxmox VE host - - 512 NVMe Samsung 980 RAID Z1 for VM's/Proxmox boot - - Xeon e5 2660 V4- - Supermicro X10SRF-i - - 128 GB ECC 2133 - - 10x4 TB WD Red RAID Z2 - - Corsair 750D - - Corsair RM650i - - Dell H310 6Gbps SAS HBA - - Intel RES2SC240 SAS Expander - - TreuNAS + many other VM’s

 

iPhone 14 Pro - 2018 MacBook Air


Trying some things out, I tried this. So it looks like Ubuntu put itself into read-only because it found some corruption? Is this indicative of a failing boot SSD?

 

image.thumb.png.a3b6a218c3e4581b821a0f5cbc4d8937.png


With this information, and the fact that the VM seems to be “ok” now that it has rebooted, I am inclined to think these are the possible issues I am facing: a bad/loose SATA cable, a dying SSD, or bad RAM.

 

I am inclined to rule out RAM, as FreeNAS and ZFS would likely be throwing all sorts of errors if they were seeing checksum issues… FreeNAS has bare-metal access to my HBA, and I run ECC RAM. I would like to think that somewhere along this chain FreeNAS would have been the first thing to throw issues at me if it was in fact a RAM issue. This leads me to believe it's a bad SSD or a bad SATA cable to the SSD.

Some info on the homelab if it will help:


Homelab/ Media Server: ESXi 6.5 - - 250 GB SSD for VMs/ESXi boot - - FreeNAS 11.2-U5 - - HPE ProLiant ML10 Gen 9 backbone - - i3 6100 - - 28 GB ECC - - 10x4 TB WD Red RAID Z2
