
NAS and SAN, NOT for home users and small businesses...

Solved by leadeater:

Hmmm,

 

Judging by the many responses, I see a lot of interest in this topic. I consider that a good thing.

 

My reason for starting this thread is to encourage hardware and software people to work together to fix what is apparent (to me, at least): the very process of rebuilding a drive can cause another drive to fail, thus ending the rebuild.

 

There is more than one 'bad guy' in this flawed process. That is not merely my opinion; it is a fact, based on the stories I have heard.

 

One way to fix it is to avoid hard drives entirely and use only SSDs (or something yet to be invented). The problem? SSDs don't yet have the capacity, and they are expensive. And, the way life is, a new problem will surface. Technical progress tends to take care of problems, but if no one thinks there is a problem, no one will be working on it.

 

I am old enough to remember when the fastest storage device on a computer was a 'drum'. Its rotation time was about 11 milliseconds, which meant an access time of about 11 ms, far faster than the hard drives of that day.

 

Now I can have a faster, and much larger, drive in my PC.

 

RAID was created to deal with slow and unreliable hard drives. Adding more options to the RAID 'equation' did not fix the drives; progress in the hard drive arena did. RAID has its uses, and also its failures. ZFS, as far as I can tell, demands more of a hard drive than it can actually provide, and the resilver process is a recognition of that. Until a better solution comes along, that process should be gentle with the drives. After fixing one drive, it will have to fix another, and so on. Beating a drive to death in this process is foolish, so a better way is needed.

 

Best regards

 

On RAID cards and ZFS you can set the rebuild/resilver priority. ZFS has even better controls and limits on this process than hardware RAID does.

 

* Prioritize resilvering by setting the resilver delay
set zfs:zfs_resilver_delay

* Prioritize scrubs by setting the scrub delay
set zfs:zfs_scrub_delay

* Set the maximum number of in-flight I/Os to a reasonable value - this number will vary for your environment (not a rebuild property, but useful to know about)
set zfs:zfs_top_maxinflight

* Set the minimum number of milliseconds per TXG to spend resilvering
set zfs:zfs_resilver_min_time_ms
 
All these settings can be altered to make the process as fast/demanding or as slow/gentle as you wish. There is always a trade-off between rebuild speed, to restore the pool to a healthy state quickly, and the stress induced on the system to do so. Knowing what these settings do is extremely important, even if you do not intend to change them.
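As a rough sketch only (the exact knobs and values below are illustrative assumptions and vary by platform and ZFS version): on Solaris-derived systems those "set zfs:..." lines go in /etc/system, while on a FreeBSD/FreeNAS box of that era the equivalents were exposed as vfs.zfs.* sysctls, e.g.:

# Make a resilver gentler by having it yield to normal I/O more often
# (higher delay = slower/gentler, lower = faster/more demanding).
sysctl vfs.zfs.resilver_delay=5
sysctl vfs.zfs.scrub_delay=8

# Cap concurrent I/Os per top-level vdev and the minimum time spent
# resilvering per transaction group.
sysctl vfs.zfs.top_maxinflight=32
sysctl vfs.zfs.resilver_min_time_ms=3000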
 
This is definitely a case of finding good articles on performance-tuning ZFS systems and reading the manual. This particular question about resilvering has been asked many times, and although it has been answered, the answers generally aren't very digestible or written as an educational piece others can learn from. There are far too many getting-started and install guides for FreeNAS online, but no really good authoritative source for all the things you need to know and understand before going into a FreeNAS build. All the information is available and can be found, but it takes time.
 
Most long-time storage experts don't need this kind of material and generally can't be bothered to write it, as they have no use for it themselves.
 
I think the general issue some people have with your original post is the premise, coming into it, that ZFS etc. is flawed and cannot be fixed. If the topic had posed the question "These are the issues I have identified; does anyone know of anything that can fix or mitigate them right now, or of anything coming in the near future?", then this thread would have become constructive and educational much sooner, rather than the current ideological battle between two differing viewpoints which can each be both correct and incorrect. It is a debate doomed to circle continuously with no resolution.
 
"Keep it simple"
"It will always fail"
"If you think something is not possible, it probably is"
 
These three mantras apply to almost everything in life and also in IT.

Hello,

 

I have been 'exploring' the use of a SAN or NAS in my home for a while, and have come to the conclusion that no home user or small business should ever try it.

 

Why? Because the very nature of these implementations wears drives out too quickly. And exactly when you need them to do a rebuild, a second failure is more than just a possibility.

 

For the home user and small business, this means a very expensive maintenance process. Replacing two drives a year on a 7-drive system? I don't know the specific statistics, but the few times I have seen a report, it mentions this issue.

 

I have considered building an unRAID system, but I am not convinced that design is much better.

 

Before you all begin casting aspersions on my opinion and experience, consider: I began my career in 1970 and retired in 2007. I covered a lot of ground in that time...

 

One of my recent 'rants' was about defragging. In my office, I never once heard of someone defragging a mainframe hard drive or a server hard drive. Never!

 

I encountered it ONLY on home systems using PATA or IDE hard drives. When SATA came along, with its controller built into the drive, defrag became a thing of the past. Yet I see and hear of people running a daily defrag on SATA drives and wondering why they are having problems.

 

So I consider the designs for NAS and SAN to be flawed. It does not matter what kind of 'RAID' you choose to use; the design is such that it wears out hard drives quickly. The benefits can be quite real, but for a home user they are too expensive.

 

Opinions and facts are greatly appreciated.

 

Best regards, AraiBob


-snip-

You've come to the wrong conclusion. A NAS is the only solution for a family like mine, and they work great. I have two 3 TB drives in RAID 1. Sure, it's not the best, but it provides backup and extra storage for the 7 computers in our house. We haven't had a drive fail yet. I don't know where you got the notion that they fail all the time.


I have four 3TB WD Red drives to throw into a server in RAID 5 and use as network storage for my house with 5 PCs. I then back the array up to external archival disks and deposit them in the bank's safety deposit box. I really don't care if I go through a drive every 2 years; it's for good storage and reliability.

|Casual Rig| CPU: i5-6600k |MoBo: ROG Gene  |GPU: Asus 670 Direct CU2 |RAM: RipJaws 2400MHz 2x8GB DDR4 |Heatsink: H100i |Boot Drive: Samsung Evo SSD 240GB|Chassis:BitFenix Prodigy |Peripherals| Keyboard:DasKeyboard, Cherry MX Blue Switches,|Mouse: Corsair M40

|Server Specs| CPU: i7-3770k [OC'd @ 4.1GHz] |MoBo: Sabertooth Z77 |RAM: Corsair Vengeance 1600MHz 2x8GB |Boot Drive: Samsung 840 SSD 128GB|Storage Drive: 4 WD 3TB Red Drives Raid 5 |Chassis:Corsair 600t 


Yes, because your personal experiences = what everybody ever should do.

 

We use a 3TB NAS in our house, which is great for quickly grabbing files etc. We use offsite backups for our NAS, since it holds sensitive information. It takes ages to get files from the offsite backup, so it's only practical to use the NAS as our general storage.

I used to be quite active here.


-snip-

 

I am not the only one to point out hard drive failures in these drive systems, especially the ones using ZFS. Look into:

Reducing Single Points of Failure (SPoF) in Redundant Storage

where the issue of drive failures is noted. I worked 37 years in IT [with many out-of-the-box solutions], and drive issues were 'managed' to help the enterprise. But in a home or small business, drive failures can be fatal. And ZFS compounds those failures because of the way it uses and abuses hard drives.

 

I have heard of and looked into others, including unRAID. Using an HBA seems to make the situation better, but as long as 'load leveling' [aka defragging] happens, the drives get beaten to death. We need a better design, and better hardware, to support backups without overworking hard drives. I am still using USB JBOD for my backups: no excess writing or work on the drives. I have parts for a full PC to be put into my Antec 1100, which has sat idle since I got a better case.

 

One issue I have is that too many systems, like ZFS, require all the drives to be the same. In my case, as my drives get 'old' I switch them out and make them my backup drives, where they should work for years, as they are only used a couple of times a month.

 

If you can suggest a system I have not evaluated yet, please do so. My solutions work because I factor many different issues into a seamless whole. In this arena, I don't [yet] see a good solution.


All,

 

I should have said that I like a lot of the specs for ZFS. It can handle large data sets, as it was designed to do, and copy-on-write (COW) seems like the way to do things. It sounds pretty good. However, as I thought about it, the notion of drives being driven to failure became clear.

 

Unless you can restrict the number of copies the COW process keeps, you could end up with hundreds of copies of a single file, of which only the latest two are necessary.

 

And in the lectures and articles I have read on ZFS, they mention the sad situation in which a second drive fails during the rebuild (resilver) process. This is exactly the moment you need a system that will NOT fail. So they have some work to do in this arena.

 

Best regards, AraiBob


Here's how a server works: buy a bunch of drives (4+), then grab one extra for when they die, because they will. HDDs die; it's just what they do. They are mechanical, and all your storage needs are being offloaded from all your systems onto them. They are also very likely on ALL of the time, whereas they would more than likely spin down in a standard computer.

If you buy good drives, you shouldn't have to buy another drive within the warranty period. A drive dies? Slap the extra in and RMA the bad drive. Check the refurbished drive when it returns to see if it's error-free; if so, slap it in when the next drive goes. Once your drives have exceeded their warranty, expect them to die and start saving up for a whole new set. Buy the new set once yours are out of warranty and you have no spares left, as there is no point overpaying for a 3+ year old model just to have more go in short order.

It's also very likely that by the time you have to replace them it will be time for a storage upgrade anyway.


I have two NASes at home; one has been running for over 4 years and the other for over a year with the same hard drives, without a single drive failure (the cheapest consumer drives I could find, not even enterprise/NAS drives).

 

As for defragging, it's useful if you need to resize a partition, but other than that I've never had to do it to fix an issue or slowness, as is usually recommended.

 

Hard drive reliability has come a long way over the years. I work in a data center with tens of thousands of hard drives across thousands of servers and many SANs/NASes and I think we had to replace maybe a dozen hard drives last year across 2 of our data centers (not sure about the rest).

 

I definitely recommend a NAS at home for everybody I know who does more than just gaming on their computers; a NAS is excellent for remote storage, running 24x7 services, and backups. Even if you just buy a cheap $200 system, it's worth it.

 

A SAN, on the other hand, is way overkill for most users. I don't know why anybody would want a SAN in a home environment unless you have a lab, or you have a lot of people living in your house and don't want to deal with local storage for all of your devices (and even then you can run an ESXi server and run the VMs off the NAS for much cheaper).

-KuJoe


Hmmm,

 

I think one of the 'issues' is attitude. Some people 'expect' that things will always work. I am of the other school: I know things fail, and I wonder what will fail along with them, and how long recovery will take.

 

I appreciate hearing the success stories. Has there been a satisfaction survey of NAS systems? If so, point me to it.

 

As for RMA, here in the Philippines they give a very short warranty period, and if you actually try to use it, they simply refuse. Western Digital, as managed by those in Manila, has refused to replace two drives for me. So that is not a viable way for me to go. I need drives to work and work and work, and I need as conservative a process and set of tools as I can devise or find.


-snip-

 

I fully expect things to fail (which is why I have at least 6 backups of my important data). I also don't expect my systems to last for years, but they do, and as a result I have had more redundancy than I've needed. If you try to engineer redundancy into a single system then you've already failed; having multiple systems instead (in this case backups) is your best and cheapest solution.

 

I don't know about any surveys, but I personally don't need any; all of the Synology NASes I've ever seen have been 100% reliable, and that's good enough for me to trust them with my data.

 

As for RMAs, I've never had an issue with a single RMA since I started buying my own equipment, so I think that might be limited to your region or to outside the US.

-KuJoe


That's what RAID6 is for, mate.

I cannot be held responsible for any bad advice given.

I've no idea why the world is afraid of 3D-printed guns when clearly 3D-printed crossbows would be more practical for now.

My rig: The StealthRay. Plans for a newer, better version of its mufflers are already being made.


Moved to Storage Solutions.

"It pays to keep an open mind, but not so open your brain falls out." - Carl Sagan.

"I can explain it to you, but I can't understand it for you" - Edward I. Koch


That's okay, OP, have fun with your bare RAID array with no checksums and no ZFS filesystem; if dead drives don't kill your data, the bitrot surely will!

In the enterprise, the cost of replacement drives may be totally insignificant compared to the value of the data on the array. ZFS may not be suitable for all users, but if you're set on protecting the integrity of your data, it's the only reliable option. RAID by itself just doesn't have enough error correction built in.
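To make the checksum point concrete: a periodic scrub is what actually exercises those checksums. A minimal sketch, assuming a ZFS pool named tank (the pool name is made up for illustration):

# Read every allocated block and verify it against its checksum,
# repairing silent corruption from parity/mirror copies where possible.
zpool scrub tank

# See progress and whether any checksum errors were found or repaired.
zpool status -v tank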


-snip-

 

WD Red disks will never fail at a rate of 2 disks a year in such low-drive-count configurations. These disks are designed specifically for this use, and this is not guesswork either; I know this from using such disks at home, and equivalent or better at work, as do many, many others. There are also many cloud backup storage bloggers who publish exact failure numbers for the drive counts and models they have tried.

 

SATA disks do require defragmentation; this is a file system thing, not a disk connection type thing. Nothing changed between PATA and SATA other than moving from a parallel interface to a serial one; the actual command signals used are the same. The same also applies to SCSI and SAS. The reason some server administrators didn't defragment the file systems on file servers is that the usage pattern is always random I/O, so there is a reduced benefit in doing so.

 

RAID, ZFS or any other type of disk redundancy system does not put more stress on a single drive in the system than normal; in fact, less. The problems that used to be true were fixed before the turn of the century. Nobody in their right mind would use RAID 4, and I would challenge anyone to find a new RAID card that will even let you.

 

The reason ZFS or RAID will not stress a drive is that, even if we ignore the fact that they distribute the load across disks, both can implement a write-back cache, and RAID cards can also add a battery to it to prevent data loss in a power cut.

 

There are many ways to design a ZFS system to take hardware failure into account, multiple HBA cards for example, and it is really not that expensive. No matter how simple or complex your storage design, nothing is perfect and backups are always required.

 

A home NAS doesn't have to be fancy at all. It can be as simple as RAID 1 with 2 disks. This is far better than a single disk, and it should be obvious why.
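To show just how simple that can be, here is a minimal sketch of a 2-disk Linux software mirror (the device names and mount point are hypothetical, and this is only one of many ways to do it):

# Build a 2-disk mirror from two blank drives (all data on them is lost).
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# Format the mirror and mount it where the network share will live.
mkfs.ext4 /dev/md0
mkdir -p /srv/nas
mount /dev/md0 /srv/nas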

 

Sorry, but coming from a strong enterprise storage background I disagree with most of your assessments. I do not doubt that you have had to deal with many failed systems and bad designs, but since the advent of RAID 5 and write-back caches, disks can sit in an array for many years without failure, and things have only gotten better with RAID 6 and ZFS.

 

As for the warranty of the disk, they must honor it by law unless you void it. Using non-NAS/server-rated disks in such a setup can be an example of this; the fine print for the disk may say that doing so voids the warranty. If you have problems getting replacements during the warranty period, find a different supplier, contact Western Digital directly, or try @Captain_WD here on the forum.


-snip-

LeadEater,

 

Thank you for your considered reply. I have encountered 'resistance' to the notion of not defragging before. Perhaps the solution is to manage the files better, so they cannot become so fragmented. Yes, you cannot 'fix' this issue for OS files, and for that I put the OS onto an SSD (which you NEVER defrag). Problem solved. Defrag can be a useful tool, but I feel it is used too much, and recklessly.

 

In my case, I have a work drive (a VelociRaptor) onto which downloads and video conversions go. When those tasks are done, I move the files to my 'storage' drives, which means very little fragmentation; i.e. I manage the files to reduce fragmentation issues. My OS is on an SSD, so no defrag there either.

 

My issue with ZFS is that I have seen videos and reports on the web of failures exactly at the moment they are attempting to 'resilver' a replacement drive. That moment of crisis is NOT being managed well.

 

Perhaps the issue is really two issues: online file usage versus backups. I am really interested in a super-reliable backup that I might put online for the family to watch movies, listen to music, etc. In that case there will be few writes and a lot of reads. My main PC will deal with conversions, downloads, etc. This backup system would hold relatively whole files (almost no fragmentation), and I would never run a defrag, to ensure long life.

 

As a retired person, I do watch my pennies, and I buy parts that should last a long time. My power supplies have a 7-year guarantee, for example. I buy cases that work the way I want to work, which means they cost a lot. However, because I do 3 or 4 builds within one case, the actual price per build is small. Each build is for a specific purpose.

 

For example, on my main PC the last build was to allow DVD conversions to be done quickly, as they have blackouts here all too often. Now I can do an entire DVD in less than 40 minutes: 20 minutes to rip the DVD to a drive and 20 minutes to convert it into an MKV file. Yes, I do have a UPS, but even these 20 minutes of high CPU usage would be too much for it.

 

So you might say I have a peculiar and specific view of systems and how I expect them to work. If they fail not during their normal work but at the moment of crisis, then I stay away from them, or find a way to mitigate the issue. Something about the way resilvering is done in ZFS can kill drives, and does, far too often. They need to work on that issue, but I have not heard of such an effort. I am sure they are working on it, but I have not seen an update on this critical issue.

 

And I am used to being the one 'far out' on the edge of requiring things to work. I did some things on the job that were supposed to be impossible, simply by paying attention to what works versus what is problematic. Far enough 'out of the box' that one manager called me 'out of this planet'; not criticizing, but noting that my requirements and solutions were great, yet far from what others thought was possible. So I am used to criticism, but I persist because the end results are worth it.

 

So, please continue with your comments and suggestions.


-snip-

 

Well, the same old rule applies: keep it simple. If you don't need it, then don't use it. That's the best advice anyone can give, and I also don't like recommending that people spend money they don't actually need to. Also, nice to see another VelociRaptor user :) . I moved my 4 to a secondary system now that I almost exclusively use SSDs.

 

I was never in the 'never defrag' club for a file server; it just went against all real logic of how file systems work and manage not just used space but free space. I've also seen some systems grind slowly to their performance death because of it, and when that particular place tried to fix it the task was impossible: the ETA was literally months, and all the free space was so fragmented it may never have been possible without expanding the array. Couldn't help but laugh and say 'told you so' :P. Since you aren't deleting files often from your storage drive, then yes, I suspect running a weekly defrag is pointless; nothing could possibly have gotten fragmented.

 

The problem of disks failing during a rebuild or resilver is an interesting one and makes complete sense. Usually all the disks are purchased at the same time, so if one fails due to normal wear it should be expected that the other disks will too. Cascading failure is every IT worker's worst nightmare, and I've seen it happen to a pool of around 150 disks: all data gone. One way to mitigate the issue is to insist at purchase time that a reasonable number of the disks come from a different manufacturing batch, then later in the system's life add more new disks. In a RAID 5/single-parity system, have a hot spare but disable auto-rebuild; in a RAID 6/dual-parity system, have 1 or 2 hot spares but, if possible, set the system to abort any rebuild task if another disk fails during the rebuild. The reason for this is so you can ensure you have a good full backup of the system before doing a rebuild that could result in complete array failure.
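On the ZFS side, that kind of hot-spare-with-manual-rebuild arrangement might look roughly like this (the pool and device names are made up; this is a sketch of the idea, not a specific recommended configuration):

# Keep a spare attached to the pool so a replacement is on hand.
zpool add tank spare da6

# Leave automatic replacement off so a rebuild only starts when you
# choose to, e.g. after confirming your backups are good.
zpool set autoreplace=off tank

# When ready, kick off the resilver onto the spare by hand.
zpool replace tank da3 da6

# Watch resilver progress and overall pool health.
zpool status tank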

 

The only way I could see to improve what you already do, which is working, would be to mirror all your disks, which doubles the storage cost. Simply having backups would be cheaper.


-snip-

Yes, I agree with KISS. File management is a useful skill, but it takes time to develop.

 

I forgot to mention I have one other issue with ZFS: it requires all the disks to be the same size. This prevents gradual enlargement of the 'pool'. You might start with some 2TB drives and, after a while, decide you need larger drives. One by one you 'upgrade' each drive from 2TB to 3TB, using the resilver process to fill in the new drive; then later you replace them one by one with 4TB drives. This gradual process means less chance of a failure (ignoring the resilvering issue for the moment).

 

I have plenty of drives of varying sizes left over from upgrading my main PC. It now contains one SSD (OS), one 1TB VelociRaptor, and four 4TB data storage drives.

 

I have an Antec 1100 case sitting unused at the moment. I also have over a dozen SATA drives of varying sizes, used as my USB backups. I am considering putting what will fit into the Antec 1100, with a motherboard, an older Intel i7 CPU, and plenty of memory I have on hand, plus as many of the hard drives as will fit (and can connect to the motherboard), and a couple of SSDs for the OS and 'ZIL' drives. The SSDs would be a new purchase; however, I have two 300GB VelociRaptors sitting around that could work. As for connecting the drives to the motherboard, I have considered a couple of HBA controller cards, as they seem reliable too.

 

But I am held up by the two issues I have described. I don't know enough about this process to experiment, and I would prefer to have it all worked out first. Simplest would be to put Ubuntu Server on the drive, specifying ZFS as the file system. But I don't know all the ramifications of this setup. Does it care about varied disk sizes? Will the HBA also care about the sizes of the disks?

 

Perhaps another 'way' can be found, via advice from someone like yourself.


Sorry, I just had another 'thought'...

 

Perhaps I am overthinking this for myself. Perhaps all I actually need today is a server containing the files I want to back up and share. No NAS or ZFS, just an OS designed to run and run without errors, ready to share.

 

FreeNAS might be the way to go, or simply Ubuntu Server...


-snip-

 

The best advice I can give is to simply try it out before putting important data on it, if you already have the required hardware or enough for decent testing. Simulate failures, add more disks, wipe the OS and import the disk pool, etc. FreeBSD, Ubuntu etc. will work fine with pure ZFS and Samba; FreeNAS just makes the whole process simpler, with an easy-to-use interface, which is actually extremely valuable, as the command line and config files get tiresome, especially when you're in a lazy mood and just want stuff to work.
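One cheap way to practice all of that without risking real disks is a throwaway pool built on sparse files; a sketch only, with made-up paths and pool name:

# Create four 1 GiB sparse files to stand in for disks.
for i in 1 2 3 4; do truncate -s 1G /tmp/vdisk$i; done

# Build a RAID-Z pool from them, just as you would with real drives.
zpool create testpool raidz /tmp/vdisk1 /tmp/vdisk2 /tmp/vdisk3 /tmp/vdisk4

# Practice a failure and replacement: take one "disk" offline and
# resilver onto a fresh file.
truncate -s 1G /tmp/vdisk5
zpool offline testpool /tmp/vdisk3
zpool replace testpool /tmp/vdisk3 /tmp/vdisk5
zpool status testpool

# Tear it all down when finished.
zpool destroy testpool
rm /tmp/vdisk1 /tmp/vdisk2 /tmp/vdisk3 /tmp/vdisk4 /tmp/vdisk5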

 

Just make sure the disks you intend for real use are TLER/NAS rated; this is important for both RAID and ZFS. If a disk without TLER encounters a bad sector, it will continuously try to re-read the sector and all I/O will freeze; the disk may also get flagged as failed when it actually isn't. A disk with TLER generally isn't good to use as a standalone disk, because after the 7-second timeout it stops retrying the bad sector, and in that situation that means actual data loss.
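On drives that expose it, that error-recovery timeout (SCT ERC, which is what WD brands as TLER) can be inspected, and sometimes set, with smartmontools; a hedged sketch with a made-up device name:

# Show the current read/write error-recovery timeouts.
smartctl -l scterc /dev/sda

# On drives that allow it, set both timeouts to 7.0 seconds
# (values are in tenths of a second); many desktop drives refuse.
smartctl -l scterc,70,70 /dev/sda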

 

Try before you buy, basically :) Experience is often the only way to gain true understanding of something. Also, I am no FreeNAS expert, but there are plenty on this forum; I'm a Windows person myself and only use Unix-type systems when I have to as part of work, or when that type of OS is clearly better for the intended purpose.


I'm failing to understand what you're saying here: QNAP and Synology + dozens of other brands are NAS units designed for SOHO, SMB and home use.

 

SANs, on the other hand, are generally not a home-user tool.


~snip~

 

Hey there AraiBob, Happy New Year!
 
I'm sorry to hear about the bad opinion you have of NAS devices. As a Western Digital representative I can give you some info on this:
 
- Many people prefer NAS devices for home usage, backups, file sharing, media streaming and other purposes. Simple devices like the WD My Cloud products (as well as other brands' equivalents) are pretty popular due to their ease of management and setup, their simplicity of use, and being quite inexpensive for what they offer.
 
- On the reliability issue, I always advise people to use NAS/RAID-class drives for these types of usage, as these drives have additional features that handle such workloads and environments much better and have a significantly lower chance of corrupting data or dropping out of the RAID. WD Red is a great example of this, as many people use them in their home RAID, NAS and even server systems and are quite happy with them.
 
- Regarding the RMA process, it really depends on the specific case, but if you have handled, used and maintained the drive properly, there should be no problem getting a replacement if your drive fails the WD DLG tests and is still under warranty. Feel free to post details here or PM me if I can do anything to assist.
 
- Wear on the drives depends solely on the type of usage, the environment, and the amount of data written to and read from them. RAID does put more pressure on drives, but this is what NAS/RAID-class drives are designed for. With proper maintenance these drives can easily reach and surpass their warranty length. :)
 
Feel free to ask if you happen to have any questions that I might be able to help with!
 
Captain_WD.

If this helped you, like and choose it as best answer - you might help someone else with the same issue. ^_^
WDC Representative, http://www.wdc.com/ 

Link to comment
Share on other sites

Link to post
Share on other sites

-snip-

LeadEater,

 

I just watched Linus's video about the failure of his all-SSD backup server. What failed? The RAID card... My personal opinion is that when they work, things are great, but when they fail, total disaster is likely (not just possible). So I will stay away from hardware RAID, and I wonder about software RAID. And I consider the demand that all drives in the 'array' be the same size another ridiculous requirement.

There are serious design flaws in the NAS arena, in that the time of a failure is exactly when they need to be the most helpful. Instead, they add to the problems. I am happy to avoid such a mess.


 

-snip-

 

Captain_WD,

 

You have given the 'standard' responses to the issues I raised. I am NOT saying the problem is the drives. I AM saying the issue is that the systems using the drives are poorly designed. In particular, they seem expressly designed to cause drives to fail at exactly the moment they are needed to work.

For example, in the videos and reports I have looked at for ZFS, it has been noted that when one of the drives has failed, a replacement drive is put in its place, and the 'resilvering' process begins, another drive will fail, and that is the end of the data in that array.

This tells me the resilvering (the rebuild process for the new drive) has some serious issues. At the exact moment when its work must be the most reliable, it somehow causes another drive to fail. And the drives are blamed, not the software using them. WD and other drive manufacturers should be pushing back on this particular issue. Hard drives have come a long way in size, density, and reliability, but bad software can kill good hardware, and make it look easy.

 

This was one of the reasons I chose the topic title I did. An enterprise can 'almost' afford such failures, but a home or small business? No.

As a retired computer person who held many 'jobs' over the 37 years of my working life, I have seen this happen over and over again. The 'genius' guys doing the coding are very clever, too clever. If they understood the hardware side better, I believe they could code better solutions. I would prefer they stop showing off how clever they are and show us how smart they are. I hate clever; I appreciate smart.

Best regards


-snip-

 

While the primary failure was the RAID card, the more critical failure was the lack of backups. He admits he only had one copy of his critical data and did not have any backups (and even though he was in the process of backing it up, that doesn't help him). The more important your data is, the more backups you should have of it. If your data isn't critical (or it's something you can re-download off the internet) then having 0-1 backups is fine. If your data is mission critical (like the source files of an unfinished project) then multiple backups are essential.

 

When I recommend a NAS to people, I do so expecting they will treat it as a backup of the data they keep on their PC, so if the NAS dies you still have the original copy, and if your PC dies you still have the copy on your NAS. At the very least you should also have one off-site backup, which can be had extremely cheaply depending on the amount of data (a few TB can run you about $20/month if you want your own server, or you can get something like Carbonite or CrashPlan if you don't want to run your own infrastructure).

 

Hardware will fail; it's inevitable. Relying on a single device (be it a NAS, SAN, USB drive, storage server, or "the cloud") is a gamble no matter what, but you can reduce your risk by increasing the number of devices you gamble with.

 

My backup scheme involves 1 PC, 2 VPSes, 2 NASes, 1 "cloud", and 1 external hard drive, and for really important things (taxes, licenses, family pictures/movies, etc.) I'll throw them on either a 2nd external drive or DVDs once every few months, just so I have something that doesn't get updated automatically in the rare instance where my versioned backups get corrupted or are unavailable. This means that even if my internet gets cut I still have access to 4-5 copies of my backups, and with internet access I can grab 7-8 copies with different versions of my data going back over 2 years.

 

Now, of course, my setup is overkill for most people, but for an investment of well under $500 a person can easily have a NAS and an external USB drive for local backups, and for less than $10/month you can even have a decent off-site solution (or you can mail a DVD or external drive to a family member once every month or two for an even cheaper option).

 

Lastly, if you don't trust the hardware or the code, you can easily roll your own solution for cheap, and then you won't have to rely on the hundreds or thousands of hours engineers have put in to design a product that people trust their businesses with.
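A roll-your-own backup really can be as small as a nightly rsync job pushed to a second box or a cheap VPS; a minimal sketch, where the user, host and paths are all invented for illustration:

#!/bin/sh
# Mirror the local data directory to a remote machine over SSH.
# --archive preserves permissions/timestamps, --delete mirrors removals,
# and the dated log leaves a simple record of each run.
rsync --archive --delete --verbose \
  /srv/data/ backupuser@backupbox:/backups/data/ \
  >> "/var/log/backup-$(date +%F).log" 2>&1

Drop that in cron and you have an off-site copy that costs almost nothing.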

-KuJoe


-snip-

 

From what I saw, the root of the issue was the failing motherboard, which caused one of the RAID cards to corrupt itself. Personally I find that slightly dubious, as normally RAID cards and their firmware are more resilient than that, so it could simply have been two unrelated hardware faults that went unnoticed until total failure. Also, SSD arrays are very demanding on the hardware, so they would likely push hardware that is already predisposed to failure, from a manufacturing fault or physical damage (like during shipping), over the edge.

 

Normally, when a RAID card or motherboard fails, the part only needs replacing, and in the case of a RAID card it doesn't have to be the exact same model (it should be), just a similar card. Once the RAID card is replaced, just import the array and you're up and running. For a motherboard fault, just replace it; the RAID arrays should be undamaged. Silent faults that cause corruption rather than failing outright are the worst kind, and only backups can save you from that type of fault.

 

Personally I believe there may have been a chance of fixing the system without the data recovery service, but for critical business data, getting that type of service involved early is the correct thing to do. Once it was discovered that the motherboard was faulty, moving the OS disk and the two working RAID cards plus a third replacement to a new system might have worked. The failure to import the array on the replacement card could have been due to the faulty motherboard. If the offline RAID 5 array could have been recovered, the software RAID 0 would have come back online and all the data with it.

 

The decision to use what was essentially RAID 50, but with a mix of hardware and software RAID, brought with it both increased failure risk (the software RAID 0) and added complexity. RAID 50 is not unsafe to use, but because of the overlaid RAID 0 it must only ever be used with backups and careful monitoring of system health.

 

The requirement that disks be the same size in RAID and ZFS (and other systems too) is more of a best practice than anything else. Typically, in the case of differing disk sizes, the smallest disk dictates the usable size of every disk in the system. There are two ways to expand a disk system: one is to add more disks, and the other is to gradually replace the existing disks with larger ones until all have been done, then expand.
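For ZFS specifically, that gradual swap-and-grow approach might look roughly like the following (the pool and device names are hypothetical, and each replace triggers a resilver, so the earlier points about backups and rebuild stress still apply):

# Let the pool grow automatically once every member has been enlarged.
zpool set autoexpand=on tank

# Replace the old disks with larger ones, one at a time, waiting for
# each resilver to finish (check zpool status) before doing the next.
zpool replace tank da1 da5
zpool replace tank da2 da6
zpool replace tank da3 da7

# Once the last small disk is gone, the extra capacity becomes usable.
zpool list tank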

 

There is one software storage solution I have used that does not require disks to be of the same size, and that is Windows Storage Spaces. This is what Linus should have used for his SSD array, since his requirement was a Windows OS. A more resilient design could have been built with it, and it would have performed faster, with the added benefit of keeping TRIM support for the SSDs, reducing wear and performance loss.

 

Also, going back to off-the-shelf NAS solutions: at my previous job we had around 1200 schools all backing up to these types of systems. They were mostly QNAP and Netgear 4-bay to 12-bay rackmount units, using backup software such as Windows Backup, Veeam, Symantec, etc. There were very few failures of the NAS systems that caused data loss (which was only backup data), and the only one I know of was a failure of the NAS itself. Nobody attempted to fix that failed NAS as it wasn't needed; if the disks had been placed in a new unit, all the data would have been there. Of the two brands QNAP is better; that is a personal opinion, but I had fewer issues with them and the NAS software was better.

