PVE stops responding, hardware is stable, nothing gets logged. IO related issues?

Levent
Solved by leadeater.

I am having issues with my home server. I made sure the RAM and CPU are as stable as they get (I even tested them for 72 hours non-stop), but I am still getting hard lockups, plus "CPU stuck" errors if I was logged in to the server via SSH at the time.

Nothing ever shows up in journalctl, and nothing abnormal is logged in the VMs. Everything simply stops, and I can't SSH into PVE or any of the VMs afterwards. The only common factor is that the server is scheduled to do IO-heavy tasks around the time these lockups happen.

I previously posted about this briefly as a status update.

I changed the disk options in all VMs to use virtio-scsi-single, and all VM disks have iothread=1 enabled.

[Attached image: image.png]

Any thoughts? The only hardware change to the server in the last month is that I added another NVMe drive using an M.2 to PCIe x1 adapter.

mY sYsTeM iS Not pErfoRmInG aS gOOd As I sAW oN yOuTuBe. WhA t IS a GoOd FaN CuRVe??!!? wHat aRe tEh GoOd OvERclok SeTTinGS FoR My CaRd??  HoW CaN I foRcE my GpU to uSe 1o0%? BuT WiLL i HaVE Bo0tllEnEcKs? RyZEN dOeS NoT peRfORm BetTer wItH HiGhER sPEED RaM!!dId i WiN teH SiLiCON LotTerrYyOu ShoUlD dEsHrOuD uR GPUmy SYstEm iS UNDerPerforMiNg iN WarzONEcan mY Pc Run WiNdOwS 11 ?woUld BaKInG MY GRaPHics card fIX it? MultimETeR TeSTiNG!! aMd'S GpU DrIvErS aRe as goOD aS NviDia's YOU SHoUlD oVERCloCk yOUR ramS To 5000C18

 

Just now, leadeater said:

Are all your NVMe SSDs connected directly to the CPU, or through the chipset?

I have one plugged into the x4 slot that is wired to the CPU. As for the other, I have no clue, but considering it's rated for PCIe 2.0 x1, I would assume it goes through the chipset. I actually hadn't thought of that before.
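One way to narrow this down is to compare what each NVMe controller is capable of with what it actually negotiated. The sketch below parses the `LnkCap` and `LnkSta` lines that `lspci -vv` prints for each device. The function names (`parse_links`, `nvme_links`, `main`) are made up for illustration, and the parsing assumes the usual `Speed ...GT/s, Width xN` wording, which may differ between pciutils versions. The link status alone does not prove whether a slot hangs off the CPU or the chipset; the tree view from `lspci -tv` shows which bridge a device sits behind.

```python
import re
import subprocess

# A device header starts at column 0, e.g. "01:00.0 Non-Volatile memory controller: ..."
DEVICE_RE = re.compile(r"^([0-9a-fA-F:.]+)\s+(.*)$")
# Matches "LnkCap:" / "LnkSta:" lines only (not LnkCap2/LnkSta2/LnkCtl).
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+GT/s).*?Width\s+(x\d+)")


def parse_links(lspci_text):
    """Map PCI address -> {"name": ..., "LnkCap": (speed, width), "LnkSta": (speed, width)}."""
    devices = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            match = DEVICE_RE.match(line)
            current = match.group(1) if match else None
            if current:
                devices[current] = {"name": match.group(2)}
            continue
        match = LINK_RE.search(line)
        if match and current:
            devices[current][match.group(1)] = (match.group(2), match.group(3))
    return devices


def nvme_links(lspci_text):
    """Keep only the devices that lspci classifies as NVMe controllers."""
    return {
        addr: info
        for addr, info in parse_links(lspci_text).items()
        if "Non-Volatile memory controller" in info["name"]
    }


def main():
    # Assumption: run as root, otherwise lspci hides the capability lines.
    out = subprocess.run(
        ["lspci", "-vv"], capture_output=True, text=True, check=True
    ).stdout
    for addr, info in nvme_links(out).items():
        print(addr, info["name"])
        print("  capable:", info.get("LnkCap"), "negotiated:", info.get("LnkSta"))
```

Calling `main()` as root on the host prints one entry per NVMe controller. A drive that reports a lower negotiated speed or width than it is capable of is sitting in a slower slot or behind a narrower link.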

4 minutes ago, Levent said:

I have one plugged into the x4 slot that is wired to the CPU. As for the other, I have no clue, but considering it's rated for PCIe 2.0 x1, I would assume it goes through the chipset. I actually hadn't thought of that before.

CPU and chipset?

 

Either one could be bugging out for some reason and you might have to switch to CPU only or chipset only. Also try setting PCIe version down to like 2.0 in the BIOS. Personally I think you are getting errors on the PCIe bus and it's locking up.
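If the bus is throwing errors, the kernel sometimes reports them before the machine locks up, although a hard lock can also prevent anything from reaching disk. A rough sketch for scanning kernel log lines for PCIe and NVMe trouble follows. The patterns are assumptions about typical kernel wording (AER reports, "PCIe Bus Error" lines, NVMe timeouts) and the function names are hypothetical; adjust them to whatever your kernel actually prints.

```python
import re
from collections import Counter

# Checked in order; the first matching pattern labels the line.
# These patterns are assumptions about common kernel message wording.
PATTERNS = {
    "pcie_bus_error": re.compile(r"PCIe Bus Error: severity="),
    "aer": re.compile(r"\bAER:"),
    "nvme_timeout": re.compile(r"\bnvme\d*\b.*\btimeout\b", re.IGNORECASE),
}


def scan_kernel_log(lines):
    """Return (label, line) pairs for lines that look like PCIe or NVMe errors."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.rstrip("\n")))
                break
    return hits


def summarize(hits):
    """Count how many lines matched each label."""
    return Counter(label for label, _ in hits)
```

To feed it the kernel messages from the boot that crashed, something like `journalctl -k -b -1 > previous_boot.log` followed by `scan_kernel_log(open("previous_boot.log"))` should work, assuming the journal is persistent. An empty result does not rule out bus errors; it only means nothing matching these patterns was written before the lockup.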

5 minutes ago, leadeater said:

CPU and chipset?

 

Either one could be bugging out for some reason and you might have to switch to CPU only or chipset only. Also try setting PCIe version down to like 2.0 in the BIOS. Personally I think you are getting errors on the PCIe bus and it's locking up.

B450M S2H V2, R5 3600, Samsung PM981a (plugged into the M.2 socket), SK hynix HFM512GD3HX015N (connected via the PCIe to M.2 adapter). I also have a GTX 970 in there. It does make sense; I'll remove the graphics card, plug the PCIe x1 card in there, and test again.

I should also mention that I have 4x HDDs in this system, and I assume these crashes happen when all the HDDs and both NVMe drives are running at full tilt.

Thanks!

38 minutes ago, leadeater said:

CPU and chipset?

 

Either one could be bugging out for some reason and you might have to switch to CPU only or chipset only. Also try setting PCIe version down to like 2.0 in the BIOS. Personally I think you are getting errors on the PCIe bus and it's locking up.

I like this idea. @Levent see if this has any impact. 

What PCIe to NVMe adapter card are you using? 

Rig: i7 13700k - - Asus Z790-P Wifi - - RTX 4080 - - 4x16GB 6000MHz - - Samsung 990 Pro 2TB NVMe Boot + Main Programs - - Assorted SATA SSD's for Photo Work - - Corsair RM850x - - Sound BlasterX EA-5 - - Corsair XC8 JTC Edition - - Corsair GPU Full Cover GPU Block - - XT45 X-Flow 420 + UT60 280 rads - - EK XRES RGB PWM - - Fractal Define S2 - - Acer Predator X34 -- Logitech G502 - - Logitech G710+ - - Logitech Z5500 - - LTT Deskpad

 

Headphones/amp/dac: Schiit Lyr 3 - - Fostex TR-X00 - - Sennheiser HD 6xx

 

Homelab/ Media Server: Proxmox VE host - - 512 NVMe Samsung 980 RAID Z1 for VM's/Proxmox boot - - Xeon e5 2660 V4- - Supermicro X10SRF-i - - 128 GB ECC 2133 - - 10x4 TB WD Red RAID Z2 - - Corsair 750D - - Corsair RM650i - - Dell H310 6Gbps SAS HBA - - Intel RES2SC240 SAS Expander - - TreuNAS + many other VM’s

 

iPhone 14 Pro - 2018 MacBook Air

6 minutes ago, LIGISTX said:

I like this idea. @Levent see if this has any impact. 

What PCIe to NVMe adapter card are you using? 

I honestly didn't even see any text on the thing.

(it looks like this)

[Image: NVMe PCIe M.2 NGFF SSD PCIe x1 Adapter Card, PCIe x1 to M.2 Card Bracket, PCI-E M.2 Adapter, 2230 2240 2260 2280 SSD M2]

15 minutes ago, Levent said:

I honestly didn't even see any text on the thing.

(it looks like this)

[Image: NVMe PCIe M.2 NGFF SSD PCIe x1 Adapter Card, PCIe x1 to M.2 Card Bracket, PCI-E M.2 Adapter, 2230 2240 2260 2280 SSD M2]

It’s probably™, but this is why I use Supermicro PCIe to nvme cards in my server… less chance of issues with server gear. But I also run an older Supermicro mobo, so it was sort of required.

 

I'd try moving the drive to the top slot as suggested. If that doesn't work, maybe remove it altogether for a bit and see if anything changes? I assume it holds some of your VMs, though? If so, do you have a SATA SSD lying around that you could use in the meantime for testing?

26 minutes ago, LIGISTX said:

It’s probably™, but this is why I use Supermicro PCIe to nvme cards in my server… less chance of issues with server gear. But I also run an older Supermicro mobo, so it was sort of required.

 

I'd try moving the drive to the top slot as suggested. If that doesn't work, maybe remove it altogether for a bit and see if anything changes? I assume it holds some of your VMs, though? If so, do you have a SATA SSD lying around that you could use in the meantime for testing?

I do have a SATA SSD, but I have no SATA ports free. My weekly pipelines are almost finished. Once they are done, I'll remove the GPU, move the PCIe M.2 adapter to the x16 slot, and trigger a disk scrub in the TrueNAS VM and my database caching worker, along with a couple of benchmarks on the hypervisor itself. I'll report back.

Quick update: I moved the PCIe NVMe adapter to the top slot. It wouldn't work in the x16 slot no matter what I did; lspci would show the controller but not the disk. I plugged the GPU back in and moved the adapter to the top PCIe x1 slot. I ran my cache worker non-stop overnight, and I also ran a couple of terribly long disk benchmarks while scrubbing the HDDs. It didn't crash at all.

I don't want to say anything just yet and jinx myself.👀
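For the case where lspci lists the controller but no disk appears, it can help to check whether the kernel's NVMe driver ever registered that controller. This is a minimal sketch that assumes the usual sysfs layout, where each entry under `/sys/class/nvme` exposes `model`, `state`, and `address` files (with `address` holding the PCI address for PCIe drives). The function names are hypothetical and the layout is an assumption; not every kernel necessarily exposes all three attributes.

```python
from pathlib import Path


def read_attr(path):
    """Read one sysfs attribute, returning None if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_nvme_controllers(sysfs_root="/sys/class/nvme"):
    """List the NVMe controllers the kernel has registered (assumed sysfs layout)."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    controllers = []
    for ctrl in sorted(root.iterdir()):
        controllers.append({
            "name": ctrl.name,
            "model": read_attr(ctrl / "model"),
            "state": read_attr(ctrl / "state"),
            "address": read_attr(ctrl / "address"),
        })
    return controllers


def missing_from_kernel(pci_addresses, controllers):
    """PCI addresses seen on the bus that no registered NVMe controller claims."""
    bound = {c["address"] for c in controllers if c["address"]}
    return sorted(set(pci_addresses) - bound)
```

Pass `missing_from_kernel` the NVMe addresses taken from `lspci -D` (which prints the full domain form, such as `0000:01:00.0`) together with the result of `list_nvme_controllers()`. Any address it returns is a controller that is visible on the bus but was not picked up by the driver.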

3 hours ago, Levent said:

moved the adapter to the top PCIe x1 slot. I ran my cache worker non-stop overnight, and I also ran a couple of terribly long disk benchmarks while scrubbing the HDDs. It didn't crash at all.

Top one 100% is off the CPU btw and the bottom one is off the chipset. Sounds like it wasn't happy running off the chipset. Personally quite hopeful that it has fixed the issue.
