Should I redo my server setup?

Solved by m9x3mos.
Hi LTT Forum.
I've come to seek the wisdom of this community again.
I recently built a home server out of curiosity and wanted to experiment with different tech.
 
Here are my hardware specs:
CPU: AMD 8700G
GPU: Gigabyte RTX 4060 Ti (The only one I found that can fit in the case, also powerful enough for my use case)
Motherboard: MSI MPG B650I Edge WiFi
Memory: G.Skill Flare X5 Series DDR5, 2 × 32 GB
PSU: Cooler Master V850 SFX Gold
Boot drive: Samsung 980 Pro 1TB
Hard Drive: Seagate IronWolf Pro 12 TB
Case: JONSBO N3
 
The software I am running or planning to run:
Proxmox as the hypervisor OS
TrueNAS (Thinking about switching to Unraid)
Ubuntu 22.04 Server
Windows 11 Pro
 
Here are my use cases:
TrueNAS (or Unraid) for storage. I am running RAIDZ2 right now and don't like how limited TrueNAS's expansion options are.
Ubuntu to run Pi-hole, a VPN service, website hosting, and probably some others.
Windows 11 Pro (with the GPU passed through to it) runs Plex and games, streamed to my Steam Deck via Steam Link or displayed directly on the TV.
 
Here are the problems I am currently facing:
  1. TrueNAS cannot perform SMART scans because some instructions are missing.
  2. TrueNAS gives about 20 checksum errors on all drives each day.
 
Here are my thoughts on what may cause errors:
  1. I am not running ECC memory. I've read many posts stressing the importance of ECC for ZFS, but there are also a lot of posts saying it's not an issue.
  2. I turned AMD EXPO on, and the memory overclock may be causing bit errors.
 
I want to fix these issues and maybe switch to Unraid, since it lets me expand my storage drive by drive without TrueNAS's pool and dataset complications. Any idea if ECC is actually necessary? I don't mind redoing my build, but I want to avoid it if AMD EXPO is the primary issue. I also suspect the problem could be that I am booting in legacy BIOS mode instead of UEFI. Will that be an issue?
9 hours ago, MikeZ18 said:
  • TrueNAS cannot perform SMART scans because some instructions are missing.
  • TrueNAS gives about 20 checksum errors on all drives each day.

Although TrueNAS ships with the smartmontools package out of the box, the smartctl executable oddly could not be found on my install, so I could not run it manually. (It may simply not be on a non-root user's PATH, so it could be worth trying with sudo.) Alternatively, you can dig through the system journal by running sudo journalctl | grep "smartd" | grep "Any_SMART_Attribute"; a report is recorded in that log each time a SMART attribute changes. I checked my server's logs by running journalctl | grep "smartd" | grep "CRC" and found no reports; this means no CRC errors have been reported since TrueNAS first booted (about 2 months ago), even without ECC. (Specs: Pentium G4600 with 8 GB of non-ECC RAM)
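If you would rather script the journal check than chain greps by hand, here is a minimal Python sketch of the same filter. The function names smartd_reports and read_journal are made up for illustration, and it assumes journalctl is available and that you have permission to read the journal (you may need sudo).

```python
import subprocess


def smartd_reports(journal_lines, keyword):
    """Return the journal lines that mention both "smartd" and `keyword`.

    Equivalent in spirit to: journalctl | grep "smartd" | grep "<keyword>"
    """
    return [
        line for line in journal_lines
        if "smartd" in line and keyword in line
    ]


def read_journal():
    """Read the system journal as a list of lines (may need root)."""
    result = subprocess.run(
        ["journalctl", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling smartd_reports(read_journal(), "CRC") on the server should return the same lines as the second grep chain above. An empty list means smartd has not logged anything matching that keyword.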

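Also, DDR5 "non-ECC" kits actually have a form of on-die ECC, so the lack of ECC DIMMs is less likely to be the issue here. Keep in mind that on-die ECC only covers errors inside the memory chips; it does not protect data in transit to the CPU, so it does not rule out an unstable memory overclock.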

 

Your findings may therefore point to another hardware issue, most likely bad cables. 🤔

TrueNAS 100% does support SMART… I am not sure why you think it doesn’t?

 

If you are getting checksum errors, you have a failing drive, bad RAM, or some other hardware issue somewhere. You shouldn’t switch away from TrueNAS because it’s telling you that you have issues; you should investigate the problem and fix it. TrueNAS is keeping your data healthy… don’t ignore data corruption issues; they indicate a hardware problem you need to solve.
 

ZFS doesn’t require ECC. It’s nice to have, but not needed.
 

In a year or two (hopefully) you will be able to add drives to RAIDZ vdevs to expand their storage size, but this is not in TrueNAS yet… it is in upstream builds of ZFS if I remember correctly, but it’ll likely be a little while until TrueNAS incorporates it. Thankfully, it does finally look like it’s actually on the horizon.
 

But by far the most important thing is that you are getting checksum errors. You need to figure out what is causing this and fix it. Either you have bad drives, a bad controller, bad cables, or bad RAM. ZFS doesn’t throw errors for fun; it is incredibly stable and is used in enterprise… if it’s reporting issues, do not ignore them.
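While you swap cables or change memory settings, it can help to track which devices are still accumulating checksum errors. Below is a rough Python sketch that reads the CKSUM column out of zpool status text. The function names are made up for illustration, and it assumes the usual NAME / STATE / READ / WRITE / CKSUM table layout, so treat it as a starting point rather than a robust parser.

```python
import subprocess


def devices_with_checksum_errors(status_text):
    """Map device name -> CKSUM value for every row with a non-zero count.

    Counts are kept as strings because zpool may abbreviate large values.
    """
    errors = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True  # header row of a pool's config table
            continue
        if not in_table:
            continue
        if not fields:
            in_table = False  # a blank line ends the table
            continue
        if len(fields) < 5:
            continue  # section labels or rows without counters
        name, cksum = fields[0], fields[4]
        if cksum != "0":
            errors[name] = cksum
    return errors


def read_zpool_status():
    """Return the text printed by `zpool status` (may need root)."""
    result = subprocess.run(
        ["zpool", "status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Running devices_with_checksum_errors(read_zpool_status()) before and after each hardware change shows whether the counts keep growing. If every drive keeps collecting errors at once, that would point toward something they share (RAM, controller, cabling, or the way the disks are passed to the VM) rather than one bad disk.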

I agree, it does do SMART. If it isn't working, it might have something to do with your setup, particularly if it is virtualized.

ECC is also just a nice-to-have. I've been running without it at home for over 6 years and haven't had any problems.

 

Do not overclock when using a storage solution. I had issues in testing when PBO was enabled. After turning that off, it has been perfect.

Disable overclocks and check your hardware and cables.

