
Which kind of RAID should I use to avoid losing data: 1 or 5? Maybe 10?

Mngm

Hello, I recently had some problems with data loss, and I'm considering setting up a RAID to protect myself from any further corruption or loss of really important files, where losing data is a complete no-no, basically a dead-or-alive situation. I don't care much about speed. After some research into what RAID is and the possible configurations, here is what I've come up with.

 

I need a minimum of 3TB of storage at the moment, so I was thinking of buying 2x 3TB HDDs and setting them up in RAID1, so I don't have to copy everything to the second drive myself; it mirrors itself, and call it a day. (Normally I make backups to external HDDs, which is why I said "copy directly", but those can fail over time too.)

 

Alternatively, using RAID5: buying 3x 2TB HDDs striped together, and knowing that the "parity" process will check whether data has gone bad and replace it with a good copy, right? Also, the parity is spread across all 3 drives, so they all check each other?

 

I would have preferred RAID1, since the data is stored on 2 separate drives, but I see that RAID5 is just superior? How are the parity checksums going to tell me they had to fix something? Would I have to check and solve problems manually to avoid a second drive failure and data loss? And what does "potential total RAID group data loss if a second drive fails during rebuild" even mean?

 

RAID6 is more expensive still; my wallet was already looking at me with disgrace after considering RAID5. And RAID10 can take 2 failures as long as they're not in the same mirror pair, but it doesn't have the parity feature to "heal" itself?

 

I really wish there were a RAID "X" so OP it made the others obsolete, like "RAID10 + parity" or something like that.

 

Thanks for reading and helping, and sorry for my bad English.


I would recommend RAID6 at minimum. In RAID6 you can have 2 drives fail, while in RAID5 only 1 can. It's not uncommon for a second drive to fail while rebuilding the array.

 

Personally I would go for an OpenZFS RAIDZ2 (RAID6-equivalent) with notifications set up; of course, you would need a spare machine. TrueNAS should be able to get you there: https://www.truenas.com/docs/core/system/email/


RAID is NOT a backup! What you need is a backup plan, not necessarily a RAID. So, you need the following:

 

  • 3 copies of your data, in
  • 2 different locations, where at least
  • 1 is on a different storage medium than the others

As for your RAID options: you complain about RAID6 being expensive but want to use RAID10 instead. Both use 4 drives, but RAID6 allows ANY pair of the 4 drives to fail before data loss occurs, while RAID10, as you already stated, depends on which drives actually fail. IMO, any RAID level containing a 0 is bad for your data. Ask Linus: his first Whonnock failure had his RAID6 arrays striped together in a RAID0 :old-eyeroll: As for the actual drives, use 4TB models instead, and make sure at least 2 manufacturers are represented in the array (my RAID6 has drives from 4 different manufacturers).

 

If you want RAID, use RAID6. As a speed upgrade, add an SSD cache on top of that. A 1TB NVMe drive (at least PCIe3) is recommended, as these are the price/capacity sweet spot ATM.

"You don't need eyes to see, you need vision"

 

(Faithless, 'Reverence' from the 1996 Reverence album)


I think the first thing that may need to be clarified is that RAID systems become the singular "disk" that you use; the drives within the RAID are no longer individually accessible. Any error correction will be handled by the [soft/hard]ware managing the RAID array and file system. You also can't pull one disk from a RAID1 array to use somewhere else without resynchronizing the whole disk once it's back. The array becomes one group from the perspective of the storage system.

 

Also, there is a distinction between "parity" data and "checksum" data. Parity is used as part of the redundancy (ensure data is available by having copies); checksum is part of the integrity (ensure data is accurate by making sure certain math functions return the same as a stored value). Parity is usually handled on the disk level (RAID); checksum is handled on the file system level (what bytes are written to the array). A hybrid filesystem like ZFS handles both of these at the same time.

 

When you hear of "scrubbing" data looking for bitrot, etc, it's looking at the checksums and parity. Parity confirms the hardware data is accurate between disks; checksums confirm the file hasn't had errant information written at the same location across the whole array (perhaps by bad code or a bitflip in RAM, something parity can't catch).
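The checksum half of that can be sketched in a few lines of Python. This is a hash-based integrity check in the spirit of what a scrub does per block, not ZFS's actual implementation:

```python
import hashlib

def checksum(block: bytes) -> str:
    """SHA-256 digest, recorded when the block is written."""
    return hashlib.sha256(block).hexdigest()

original = b"important document contents"
stored = checksum(original)  # saved alongside the data

def scrub_ok(block: bytes, expected: str) -> bool:
    """A 'scrub' recomputes the digest and compares it to the stored one."""
    return checksum(block) == expected

assert scrub_ok(original, stored)                             # intact data passes
assert not scrub_ok(b"important documEnt contents", stored)   # bitrot detected
```

If the recomputed digest doesn't match, the filesystem knows this copy is bad and can pull a good copy from the redundancy (parity or mirror) instead.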

 

7 minutes ago, Mngm said:

Alternatively, using RAID5: buying 3x 2TB HDDs striped together, and knowing that the "parity" process will check whether data has gone bad and replace it with a good copy, right? Also, the parity is spread across all 3 drives, so they all check each other?

RAID5 will split any data it is to write, write half to disk(1) and half to disk(2), and then take disk(1)bit(n) XOR disk(2)bit(n) to get disk(3)bit(n), and so on. By virtue of how XOR works, if you lose any one disk, you can logically recover the missing bits from the data on the other two.
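A tiny Python sketch of that XOR relationship (illustrative only; real controllers do this per stripe in hardware or kernel code):

```python
def parity(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks; this is the RAID5 parity operation."""
    return bytes(x ^ y for x, y in zip(a, b))

# One stripe split across two data disks, plus the parity disk:
disk1 = b"DATA"
disk2 = b"MORE"
disk3 = parity(disk1, disk2)

# disk2 fails: XOR the survivors and the missing bits come back,
# because a ^ (a ^ b) == b.
rebuilt = parity(disk1, disk3)
assert rebuilt == disk2
```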

 

There is a small CPU performance penalty because the parity information needs to be calculated (and checked on read), but because the data is split across two disks, you also get a near-double read performance improvement. Writes will be improved, but not necessarily as much because of CPU load.

 

As an aside, a 3x 2TB RAID5 array would yield 4TB of usable space (more efficient than RAID1). Depending on what you're looking at, 4x 1TB in RAID5 would get you 3TB usable and a pretty fast array.
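The capacity math behind those numbers, as a quick sketch (usable space only; these are the textbook formulas and ignore filesystem overhead):

```python
def usable_tb(level: str, n_drives: int, drive_tb: float) -> float:
    """Rough usable capacity for common RAID levels."""
    if level == "RAID1":
        return drive_tb                    # one drive's worth, everything mirrored
    if level == "RAID5":
        return (n_drives - 1) * drive_tb   # one drive's worth lost to parity
    if level == "RAID6":
        return (n_drives - 2) * drive_tb   # two drives' worth lost to parity
    if level == "RAID10":
        return n_drives * drive_tb / 2     # half lost to mirroring
    raise ValueError(f"unknown level: {level}")

print(usable_tb("RAID5", 3, 2.0))   # → 4.0 (3x 2TB RAID5)
print(usable_tb("RAID5", 4, 1.0))   # → 3.0 (4x 1TB RAID5)
print(usable_tb("RAID1", 2, 3.0))   # → 3.0 (2x 3TB RAID1)
```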

 

13 minutes ago, Mngm said:

RAID6 is more expensive still; my wallet was already looking at me with disgrace after considering RAID5

RAID6 is just RAID5 with a second, independently calculated parity block per stripe (computed differently from the first, commonly via Reed–Solomon coding, so that any two failures can be recovered).

 

13 minutes ago, Mngm said:

RAID10 can take 2 failures as long as they're not in the same mirror pair, but it doesn't have the parity feature to "heal" itself?

RAID10 is stripes of mirrors (literally RAID1 + RAID0). You get the redundancy protection of RAID1 with the speed improvements of RAID0. It can heal itself within each mirror, but there's no data spanning the mirrors, so if both disks of one mirror die, the whole array is hosed.
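That difference can be made concrete by brute force. A sketch, assuming a hypothetical 4-drive layout with RAID10 as two mirrored pairs (A+B and C+D):

```python
from itertools import combinations

drives = ["A", "B", "C", "D"]
mirrors = [("A", "B"), ("C", "D")]  # RAID10: a stripe across two RAID1 pairs

def raid10_survives(failed):
    # The array lives as long as every mirror keeps at least one working disk.
    return all(not set(pair) <= set(failed) for pair in mirrors)

def raid6_survives(failed):
    # Dual parity tolerates any combination of two failures.
    return len(failed) <= 2

for pair in combinations(drives, 2):
    print(pair, "RAID10 survives:", raid10_survives(pair),
          "| RAID6 survives:", raid6_survives(pair))
# RAID10 dies only when both halves of one mirror fail: (A, B) or (C, D);
# RAID6 survives all six two-drive combinations.
```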

 

18 minutes ago, Mngm said:

I really wish there were a RAID "X" so OP it made the others obsolete, like "RAID10 + parity" or something like that.

May I present to you: RAID50! (stripes of parity arrays; fast, efficient, and redundant; also, complicated, poorly supported, and rare (by name)).

Each RAID level has its strengths and weaknesses, and a judicious user should do their research to find what suits their purposes the best, taking into account the performance of the devices controlling and accessing the data (no point in prioritizing read/write speed if you're limited by gigabit Ethernet, for example).

 

My server has a RAID6-equivalent array (I say equivalent because it's ZFS, which is slightly different, but conceptually the same), which I chose because it allows for any two drives to fail, unlike my previous RAID10 array in which two drives could fail, but they had to be in different stripes. I lose a little bit of read/write performance, but I gain peace of mind (since my server contains quite a bit of important things (that I have backed up elsewhere, of course)).

 

To get off into the weeds a little bit: ZFS lets you pool multiple arrays as stripes in a larger array, so you could in theory have any combination of RAID types (but there are a host of issues you can run into with this, including performance and redundancy bottlenecks).

 

46 minutes ago, Mngm said:

I need a minimum of 3TB of storage at the moment, so I was thinking of buying 2x 3TB HDDs and setting them up in RAID1, so I don't have to copy everything to the second drive myself; it mirrors itself, and call it a day. (Normally I make backups to external HDDs, which is why I said "copy directly", but those can fail over time too.)

For your purposes, I would recommend both RAID1 and manually copying a backup as often as is practical to a third disk. More copies are better, IMO, but having one copy slightly time-shifted means that if you accidentally delete or modify something, you can restore from the offline copy.

 

There are ways to have "journaled" files (keep previous snapshot versions for a set period of time), but that's a file system feature that's a little outside the scope of the original question (not that most of what I've written is 😉 ).

Main System (Byarlant): Ryzen 7 5800X | Asus B550-Creator ProArt | EK 240mm Basic AIO | 16GB G.Skill DDR4 3200MT/s CAS-14 | XFX Speedster SWFT 210 RX 6600 | Samsung 990 PRO 2TB / Samsung 960 PRO 512GB / 4× Crucial MX500 2TB (RAID-0) | Corsair RM750X | a 10G NIC (pending) | Inateck USB 3.0 Card | Hyte Y60 Case | Dell U3415W Monitor | Keychron K4 Brown (white backlight)

 

Laptop (Narrative): Lenovo Flex 5 81X20005US | Ryzen 5 4500U | 16GB RAM (soldered) | Vega 6 Graphics | SKHynix P31 1TB NVMe SSD | Intel AX200 Wifi (all-around awesome machine)

 

Proxmox Server (Veda): Ryzen 7 3800XT | AsRock Rack X470D4U | Corsair H80i v2 | 64GB Micron DDR4 ECC 3200MT/s | 4x 10TB WD Whites / 4x 14TB Seagate Exos / 2× Samsung PM963a 960GB SSD | Seasonic Prime Fanless 500W | Intel X540-T2 10G NIC | LSI 9207-8i HBA | Fractal Design Node 804 Case (side panels swapped to show off drives) | VMs: TrueNAS Scale; Ubuntu Server (PiHole/PiVPN/NGINX?); Windows 10 Pro; Ubuntu Server (Apache/MySQL)


Media Center/Video Capture (Jesta Cannon): Ryzen 5 1600X | ASRock B450M Pro4 R2.0 | Noctua NH-L12S | 16GB Crucial DDR4 3200MT/s CAS-22 | EVGA GTX750Ti SC | UMIS NVMe SSD 256GB / TEAMGROUP MS30 1TB | Corsair CX450M | Viewcast Osprey 260e Video Capture | Mellanox ConnectX-2 10G NIC | LG UH12NS30 BD-ROM | Silverstone Sugo SG-11 Case | Sony XR65A80K

 

Camera: Sony ɑ7II w/ Meike Grip | Sony SEL24240 | Samyang 35mm ƒ/2.8 | Sony SEL50F18F | Sony SEL2870 (kit lens) | PNY Elite Perfomance 512GB SDXC card

 

Network:

Spoiler
                           ┌─────────────── Office/Rack ────────────────────────────────────────────────────────────────────────────┐
Google Fiber Webpass ────── UniFi Security Gateway ─── UniFi Switch 8-60W ─┬─ UniFi Switch Flex XG ═╦═ Veda (Proxmox Virtual Switch)
(500Mbps↑/500Mbps↓)                             UniFi CloudKey Gen2 (PoE) ─┴─ Veda (IPMI)           ╠═ Veda-NAS (HW Passthrough NIC)
╔═══════════════════════════════════════════════════════════════════════════════════════════════════╩═ Narrative (Asus USB 2.5G NIC)
║ ┌────── Closet ──────┐   ┌─────────────── Bedroom ──────────────────────────────────────────────────────┐
╚═ UniFi Switch Flex XG ═╤═ UniFi Switch Flex XG ═╦═ Byarlant
   (PoE)                 │                        ╠═ Narrative (Cable Matters USB-PD 2.5G Ethernet Dongle)
                         │                        ╚═ Jesta Cannon*
                         │ ┌─────────────── Media Center ──────────────────────────────────┐
Notes:                   └─ UniFi Switch 8 ─────────┬─ UniFi Access Point nanoHD (PoE)
═══ is Multi-Gigabit                                ├─ Sony Playstation 4 
─── is Gigabit                                      ├─ Pioneer VSX-S520
* = cable passed to Bedroom from Media Center       ├─ Sony XR65A80K (Google TV)
** = cable passed from Media Center to Bedroom      └─ Work Laptop** (Startech USB-PD Dock)

Retired/Other:

Spoiler

Laptop (Rozen-Zulu): Sony VAIO VPCF13WFX | Core i7-740QM | 8GB Patriot DDR3 | GT 425M | Samsung 850EVO 250GB SSD | Blu-ray Drive | Intel 7260 Wifi (lived a good life, retired with honor)

Testbed/Old Desktop (Kshatriya): Xeon X5470 @ 4.0GHz | ZALMAN CNPS9500 | Gigabyte EP45-UD3L | 8GB Nanya DDR2 400MHz | XFX HD6870 DD | OCZ Vertex 3 Max-IOPS 120GB | Corsair CX430M | HooToo USB 3.0 PCIe Card | Osprey 230 Video Capture | NZXT H230 Case

TrueNAS Server (La Vie en Rose): Xeon E3-1241v3 | Supermicro X10SLL-F | Corsair H60 | 32GB Micron DDR3L ECC 1600MHz | 1x Kingston 16GB SSD / Crucial MX500 500GB


19 minutes ago, Dutch_Master said:

If you want RAID, use RAID6. As a speed upgrade, add an SSD cache on top of that. A 1TB NVMe drive (at least PCIe3) is recommended, as these are the price/capacity sweet spot ATM.

We (as home users) need to stop recommending cache drives. The workloads that cache drives help with tend not to be the ones where RAID mass storage is being used. Unless you're repeatedly, randomly accessing the same multi-gigabyte dataset over and over, the raw disk speed of the array is generally enough to keep up with most home access patterns. Furthermore, home users are not latency-sensitive the way a web or database server is; the few extra dozen milliseconds to seek for data are seldom perceptible.

 

This is especially true of ZFS L2ARC; ZFS already has a RAM cache (the ARC). L2ARC only helps if the total size of the repeatedly, randomly accessed data is larger than the RAM cache, but home users don't generally run multi-gig database servers, so it's not usually an issue. It certainly doesn't help with media storage (I've tested it on my own server and the results were inconclusive, i.e. no difference). Oh, and it's volatile, so every time you restart, the cache has to be rebuilt from scratch.


12 minutes ago, AbydosOne said:

We (as home users) need to stop recommending cache drives.

Yes and no. I agree for home users it's not required, but they do see it on various channels so they know it's possible. And that makes them "want" it. So I included/mentioned it as an "upgrade" 😉


7 hours ago, Mngm said:

I need a minimum of 3TB of storage at the moment, so I was thinking of buying 2x 3TB HDDs and setting them up in RAID1

 

Get a single 3TB drive and use the second 3TB drive as a backup. USB external drives and enclosures are stupid cheap. Powerful backup software like Macrium Reflect is free. Cloud storage like Backblaze costs a couple bucks a month. This is not a technical question; you are going down the wrong path.

Get a decent smart UPS for your computer, since hard power faults, not hard drives, are a leading cause of data corruption.

 

If you have budget left over, then implement RAID1. Regardless, having proper backups trumps RAID talk, which is rapidly becoming as outdated as floppy drives and Token Ring.

 


8 hours ago, Dutch_Master said:

RAID is NOT a backup! What you need is a backup plan, not necessarily a RAID. […]

Thanks for the reply. I should have specified that I know a RAID is not a backup; I never said I wanted the RAID as a "backup", but as the active storage of the PC, where I can use the files when needed, unlike external HDDs sitting on a shelf (shelf as an example).

I also said I make external HDD backups, although I have to copy the new files to them manually. And that brings up the "3 copies in 2 locations..." rule; didn't Linus say something like that long ago?

But thanks for the reminder; some people do get confused about that.

 

Sorry if it sounded like I was complaining about RAID6 while wanting RAID10 instead. I should have specified that I just want the best option for the situation; wanting isn't the same as being able to. RAID1 is less expensive but unsuitable for my needs, so it's pointless to spend less on something useless, or more on something I don't need. If people had said RAID5 wasn't the right choice and I had to use RAID10, that would have been a reason to actually consider it despite the cost. Some manufacturers also claim their archival discs will last 10 years or more as backups, longer than regular HDDs, which is good, but again too expensive to actually do even if I wanted.

 

I'm using the Fractal Design Meshify 2, which has 3 HDD slots in the back; that's why I didn't consider 4 or more drives. For now I'd like to avoid external drive enclosures if possible and use the case directly. I should have specified that too, lol. SSDs aren't cheaper either, even if I wanted them. And yes, the case offers other drive locations, but I'd rather focus on the slots in the back and buy further external backup HDDs instead of adding more internal drives.

 

Not sure whether 4TB HDDs will be cheaper without some deals, but I'll check, and I will buy from different manufacturers.

 

I don't need a speed upgrade at the moment, but I'll consider it if things become too slow to work properly. Thanks for the help!

 

PS: just to be clear, I'm not mad or anything 😂


9 hours ago, AbydosOne said:

I think the first thing that may need to be clarified is that RAID systems become the singular "disk" that you use; the drives within the RAID are no longer individually accessible. […]

Thanks for the reply! I wouldn't try to pull an HDD out to use it somewhere else, even less if it had problems to begin with. RAID1 was looking good: cheaper than the others, it duplicates data automatically, and there's a second drive if anything happens to the first, but apparently it has some drawbacks.

 

I see, I had wrongly understood "parity checksums" as one process, like "the parity does a parity checksum..."; better late than never.

 

I definitely don't like the CPU penalty, but I can't have everything without giving something up. Thanks for the whole detailed explanation.

 

RAID50, eh? Definitely waiting for the RAID50 v2.0 release, lol. I was about to consider the ZFS "mode", but like you said, all of them have advantages and weaknesses.

 

I'll take all the help I've gotten and plan out whether it's better to go RAID1 or RAID5, and after some time probably move to RAID6 on a NAS instead of the case slots.

 

Even if the "journaled" files weren't on topic, I'll keep that tip in mind for other things too. Thanks for the help!

 

EDIT: I also forgot to say that, luckily, I do make some external backups just in case (after losing so many phone files, photos, etc., I learned my lesson), so the RAID will become the active storage while I still back up to external HDDs from time to time (even though those can fail too).


3 hours ago, wseaton said:

 

Get a single 3TB drive and use the second 3TB drive as a backup. USB external drives and enclosures are stupid cheap. […]

 

Thanks for the reply! I have some external HDDs where I make backups, but I'd like the RAID to be the active storage so I don't have to worry about data loss before I get a chance to back up new files.

 

I'll check out the software you mentioned, thanks!

 

Cloud storage isn't my thing; I really want to avoid having important files on someone else's storage, or the service being unavailable, or them losing the data. And in the long term, I'd spend less and still keep the storage space.

 

Luckily there haven't been any power faults, but we can never be 100% sure. I'll consider a UPS after some time, as decent ones aren't cheap, unfortunately.

 

If I had a ton of free money, I would just buy 2 NAS units, one RAID1 and the other RAID5/6, while also having an array in the PC case, but that's quite unnecessary; better to spend on more separate external storage options. Thanks for the help!


10 hours ago, Nayr438 said:

I would recommend RAID6 at minimum. In RAID6 you can have 2 drives fail, while in RAID5 only 1 can. […]

Thanks for the reply! It's a good thing notifications exist, but I don't have a spare machine for a NAS. In the future, if possible, I'll consider that plan too. Thanks for the help!


RAID 6 shouldn't even be in the discussion, because it requires 5 physical drives to justify itself over the 4 drives used in a simple RAID 1/10 mirror. RAID 10 is just a striped group of RAID 1 mirrors.

 

With 3 drive bays, RAID 5 is the optimal solution for getting the most storage while still having redundancy if a drive fails. However, I'll take a single solo drive and reliable external backups any day over any RAID config. Then a smart UPS (CyberPower is my go-to budget suggestion for smart shutdown), then RAID.


2 hours ago, wseaton said:

RAID 6 shouldn't even be in the discussion, because it requires 5 physical drives to justify itself over the 4 drives used in a simple RAID 1/10 mirror. […]

Alright, thanks! For only 3TB of data it's definitely not wise to have 10 drives in a RAID; just a good RAID5 and some external backups. If I ever need far more space (and have more money...), then I'd consider a NAS instead of just relying on the case and some external backups.

 

I'll check that UPS brand, thanks for the info!


6 hours ago, wseaton said:

RAID 6 shouldn't even be in the discussion because it requires 5 physical drives to justify itself over 4 drives used in a simple RAID 1/10 mirror.

Um, actually, no. RAID6 requires a minimum of 4 drives (not 5!) and, as explained above, uses bit-level striped parity logic to make recovery possible. There is no parity logic in RAID0 or RAID1, never mind RAID10. People only (want to) see the speed benefit of RAID10 over an equally expensive RAID6, but conveniently forget that any array level with a 0 in it is bound to lead to data loss.

 

RAID5 is an acceptable alternative if constraints (of any sort) limit the number of drives to 3. It does raise the risk of data loss, but not by a large amount over RAID6. I've seen a web page offering a risk calculator for the various RAID levels over a period of time, but I didn't bookmark it and now I can't find it anymore 😞

"You don't need eyes to see, you need vision"

 

(Faithless, 'Reverence' from the 1996 Reverence album)


7 hours ago, Dutch_Master said:

Um, actually, no. RAID6 requires a minimum of 4 drives (not 5!) and, as explained above, uses striped parity (XOR, plus a second Reed-Solomon-style parity) to make recovery possible. There is no parity logic in either RAID0 or RAID1, never mind RAID10. People only (want to) see the speed benefit of RAID10 over an equally expensive RAID6, but conveniently forget that any array level with a 0 in it is bound to lead to data loss.

 

RAID5 is an acceptable alternative if constraints (of any sort) limit the number of drives to 3. It does raise the risk of data loss, but not by a large amount over RAID6. I've seen a web page offering a risk calculator for the various RAID levels over a period of time, but I didn't bookmark it and now I can't find it anymore 😞

I didn't say "require"; I said "justify". 

 

RAID 6 needs 5 drives before it stops incurring a storage penalty over RAID 1/10. Use an online RAID calculator if you don't believe it. The additional double parity of RAID 6 incurs a storage cost that doesn't break even until you hit 5 drives. 5 drives plus a hot spare was the sweet spot for RAID 6... years ago. Not like I haven't set that up a zillion times. 
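The break-even arithmetic is easy to check yourself. A minimal sketch (usable capacity counted in whole drives' worth, for N equal drives; the `usable` helper is made up for illustration):

```python
# Usable capacity (in drives' worth) for N equal drives under the
# common RAID levels discussed here. Simplified illustration only.
def usable(n_drives, level):
    if level == "raid1":    # everything mirrored onto one logical drive
        return 1
    if level == "raid10":   # striped mirrors; needs an even N >= 4
        return n_drives // 2
    if level == "raid5":    # one drive's worth of parity; N >= 3
        return n_drives - 1
    if level == "raid6":    # two drives' worth of parity; N >= 4
        return n_drives - 2
    raise ValueError(level)

for n in (4, 5, 6):
    print(n, usable(n, "raid10"), usable(n, "raid6"))
# At 4 drives RAID10 and RAID6 both give 2; from 5 drives up, RAID6 pulls ahead.
```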

 

RAID 5 is pretty simple: with 3 drives you store the same data with one drive of redundancy, vs the 4 drives RAID 1/10 needs. That's the whole reason RAID parity was invented in the first place: back when 9GB SCSI drives cost $1500, saving 25% in storage costs over basic RAID 1 made sense. This is not 1998 and storage is cheap. Frankly, my IQ drops every time I have to discourage parity-based RAID unless you have a really large number of physical drives to spread the parity stripe over and a robust controller, or multiple heads with very good monitoring. I've had RAID 5 blow up on AS400s, Netware boxes, countless SANs (thank EMC and their sh^tty thermal control for being the worst SAN in the universe in this respect). I would rather castrate myself with blunt rocks than set up RAID 5 on a low-end BL240 or a 100/300-series Dell PERC. RAID 6, maybe. It's like having an airbag in a car crash vs RAID 5. I would rather avoid the car crash. RAID parity is a crude form of lossless data compression if you think about it, and if your controller fails or hiccups during a long write of the parity stripe you are boned. RAID 6 at least forces the controller to recalculate a faulted parity calculation, hence its increased robustness over RAID 5. I literally could not list the number of catastrophic RAID 5 failures I've dealt with, ranging from mom-and-pop businesses to multi-billion-dollar regional banks. I refuse to install it on hardware controllers.

 

The reason software-based RAID 5 is more reliable than controller-based RAID 5 is that the former uses your CPU and hefty amounts of data-integrity protection to ensure the parity calculation goes without fault. Hardware RAID controllers, on the other hand, often use battery packs that catch fire, cheap DRAM modules, poor thermal management, and other unreliable trash hardware that corrupts the parity calculation if the wind blows the wrong way.  

 

As for speed, I've IOPS-tested the death out of different RAID topologies on all kinds of controllers and SANs, and it's not that clear cut. Striping patterns, LUNs, etc. all have a huge impact, along with drive geometries and capacities. RAID 0+1 used to be the speed demon because it created a stripe first and then mirrored the stripe, but I haven't seen that in over a decade; it's not even an option on most controllers now. RAID 10, on the other hand, isn't consistent in how data is spread, so its performance varies greatly.

 

The OP wants data integrity, and that's what I'm telling him to do. I've never lost data for a client when I configure the storage, because simplicity wins and this isn't 1998. Get a single drive, get good backups, get a UPS to prevent hard power faults, then use the simplest RAID set once all the former conditions are met. Want to know why we see so many complaints about SSD errors here? Because people are using consumer SSDs with no power-fault protection. Basic data-center-grade SSDs have this, and that's why I've never lost one. Not responsible for advice not taken.


1 hour ago, wseaton said:

The additional double parity of RAID 6 incurs a storage cost that doesn't break even until you hit 5 drives.

[Screenshots: results from two online RAID calculators, showing the same usable capacity for RAID 10 and RAID 6]

 

Does it now? I've tried the top five RAID calculators from Google, and they all show RAID10 = RAID6 (~= RAIDZ2). Parity data does not take up usable space. Checksum data would, but that's not a standard RAID feature. And FWIW, that was my experience with RAIDZ2, so I'm not at all surprised by these calculations.

 

1 hour ago, wseaton said:

It's like having an airbag in a car crash vs RAID 5. I would rather avoid the car crash.

As I've learned regarding cars and hard drives in the last year: "It's not a matter of 'if' but 'when'."

 

1 hour ago, wseaton said:

OP wants data integrity

I do agree that backups are a better option overall, and maybe what OP's looking for would be better suited for a ZFS array with routine scrubs. It's not necessarily a bad thing to want RAID for uptime assurances, though.
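For what it's worth, routine scrubs on ZFS are a one-liner; a sketch assuming ZFS is installed and using a hypothetical pool named `tank`:

```shell
# Start a scrub of the pool, then check on its progress:
zpool scrub tank
zpool status tank

# Example crontab entry: scrub monthly, on the 1st at 03:00.
0 3 1 * * /sbin/zpool scrub tank
```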

 

1 hour ago, wseaton said:

Want to know why we see so many complaints about SSD errors here?

I've been here a while, and I can't recall any issue that was directly tied back to SSD power loss. Honestly, RAM issues are what I would expect to cause most of people's data corruption issues (as I've personally experienced) with this widespread desire to run RAM at the bleeding edge of speed/timings these days.

Main System (Byarlant): Ryzen 7 5800X | Asus B550-Creator ProArt | EK 240mm Basic AIO | 16GB G.Skill DDR4 3200MT/s CAS-14 | XFX Speedster SWFT 210 RX 6600 | Samsung 990 PRO 2TB / Samsung 960 PRO 512GB / 4× Crucial MX500 2TB (RAID-0) | Corsair RM750X | a 10G NIC (pending) | Inateck USB 3.0 Card | Hyte Y60 Case | Dell U3415W Monitor | Keychron K4 Brown (white backlight)

 

Laptop (Narrative): Lenovo Flex 5 81X20005US | Ryzen 5 4500U | 16GB RAM (soldered) | Vega 6 Graphics | SKHynix P31 1TB NVMe SSD | Intel AX200 Wifi (all-around awesome machine)

 

Proxmox Server (Veda): Ryzen 7 3800XT | AsRock Rack X470D4U | Corsair H80i v2 | 64GB Micron DDR4 ECC 3200MT/s | 4x 10TB WD Whites / 4x 14TB Seagate Exos / 2× Samsung PM963a 960GB SSD | Seasonic Prime Fanless 500W | Intel X540-T2 10G NIC | LSI 9207-8i HBA | Fractal Design Node 804 Case (side panels swapped to show off drives) | VMs: TrueNAS Scale; Ubuntu Server (PiHole/PiVPN/NGINX?); Windows 10 Pro; Ubuntu Server (Apache/MySQL)


Media Center/Video Capture (Jesta Cannon): Ryzen 5 1600X | ASRock B450M Pro4 R2.0 | Noctua NH-L12S | 16GB Crucial DDR4 3200MT/s CAS-22 | EVGA GTX750Ti SC | UMIS NVMe SSD 256GB / TEAMGROUP MS30 1TB | Corsair CX450M | Viewcast Osprey 260e Video Capture | Mellanox ConnectX-2 10G NIC | LG UH12NS30 BD-ROM | Silverstone Sugo SG-11 Case | Sony XR65A80K

 

Camera: Sony ɑ7II w/ Meike Grip | Sony SEL24240 | Samyang 35mm ƒ/2.8 | Sony SEL50F18F | Sony SEL2870 (kit lens) | PNY Elite Perfomance 512GB SDXC card

 

Network:

Spoiler
                           ┌─────────────── Office/Rack ────────────────────────────────────────────────────────────────────────────┐
Google Fiber Webpass ────── UniFi Security Gateway ─── UniFi Switch 8-60W ─┬─ UniFi Switch Flex XG ═╦═ Veda (Proxmox Virtual Switch)
(500Mbps↑/500Mbps↓)                             UniFi CloudKey Gen2 (PoE) ─┴─ Veda (IPMI)           ╠═ Veda-NAS (HW Passthrough NIC)
╔═══════════════════════════════════════════════════════════════════════════════════════════════════╩═ Narrative (Asus USB 2.5G NIC)
║ ┌────── Closet ──────┐   ┌─────────────── Bedroom ──────────────────────────────────────────────────────┐
╚═ UniFi Switch Flex XG ═╤═ UniFi Switch Flex XG ═╦═ Byarlant
   (PoE)                 │                        ╠═ Narrative (Cable Matters USB-PD 2.5G Ethernet Dongle)
                         │                        ╚═ Jesta Cannon*
                         │ ┌─────────────── Media Center ──────────────────────────────────┐
Notes:                   └─ UniFi Switch 8 ─────────┬─ UniFi Access Point nanoHD (PoE)
═══ is Multi-Gigabit                                ├─ Sony Playstation 4 
─── is Gigabit                                      ├─ Pioneer VSX-S520
* = cable passed to Bedroom from Media Center       ├─ Sony XR65A80K (Google TV)
** = cable passed from Media Center to Bedroom      └─ Work Laptop** (Startech USB-PD Dock)

Retired/Other:

Spoiler

Laptop (Rozen-Zulu): Sony VAIO VPCF13WFX | Core i7-740QM | 8GB Patriot DDR3 | GT 425M | Samsung 850EVO 250GB SSD | Blu-ray Drive | Intel 7260 Wifi (lived a good life, retired with honor)

Testbed/Old Desktop (Kshatriya): Xeon X5470 @ 4.0GHz | ZALMAN CNPS9500 | Gigabyte EP45-UD3L | 8GB Nanya DDR2 400MHz | XFX HD6870 DD | OCZ Vertex 3 Max-IOPS 120GB | Corsair CX430M | HooToo USB 3.0 PCIe Card | Osprey 230 Video Capture | NZXT H230 Case

TrueNAS Server (La Vie en Rose): Xeon E3-1241v3 | Supermicro X10SLL-F | Corsair H60 | 32GB Micron DDR3L ECC 1600MHz | 1x Kingston 16GB SSD / Crucial MX500 500GB

