A cautionary tale about how a person can become a SPoF.

Hi guys,

Recently, bad things have been happening to some servers I help manage. Each of them has a RAID 5 array, and each developed a faulty HDD (and things went sour from there).

Well, these 3 servers were managed by me and one other person. I was learning how to manage them after he had built each of them over time. Of the people who used them, only he and I could actually fix hardware or coordination issues (i.e. how one server interacted with another), because we were the only ones who knew how they were set up.

Unfortunately, a second HDD developed faults on the primary of those 3 servers. When we went to replace one of the faulty drives in it (which we should have done much earlier anyway), for whatever reason, the RAID array broke. It wouldn't rebuild, and bad times were had.

However, in all this, I believe the management problem with these servers was simply that only one person actively coordinated and fixed them (until I came along and learned some).

While he was keeping them all running, he was trying out Windows 2k8 features he hadn't used before, such as DFS Replication and Remote Profiles, among other things, although he never got them fully working. I personally believe DFS Replication wouldn't work because the arrays had failed redundancy (due to the faulty drives in each server). However, I can't be sure.

Well, because all these features were half-implemented, backups were not being done correctly between the 3 servers. They weren't replicating with one another.

So, when the primary one failed, the other two only had outdated files on them, which isn't a good thing, obviously. We had 4 different kinds of backups set up, but none of them could be relied on: one would fail while another didn't, so we had to keep moving files around to get things working.

Fortunately, our "last resort" backup, which was nothing but a batch file and a network connection to a JBOD server, actually had a fairly recent backup of our primary server, so we weren't completely boned.

Just remember, for those of you out there who manage multiple servers for whatever reason: never do things halfway, and make sure there are enough people to actively manage the servers and keep up maintenance on them. Otherwise it won't matter how much redundancy you have; it will all inevitably fail if no one maintains it.

I know. It should be common sense. But hey, sometimes when you are building big things, the tiny details get lost along the way. 

The more you know,
Vitalius

† Christian Member †

For my pertinent links to guides, reviews, and anything similar, go here, and look under the spoiler labeled such. A brief history of Unix and its relation to OS X by Builder.

 

 


Yikes, that's got to be a stressful situation; good thing you found a fairly recent, working backup.

 

These kinds of things are the main reason why I almost always opt not to host any vital data myself, but rather let Amazon deal with the redundancy, 'cause I frankly trust their staff more than myself.  :)

Cheers,

Linus


Yikes, that's got to be a stressful situation; good thing you found a fairly recent, working backup.

 

These kinds of things are the main reason why I almost always opt not to host any vital data myself, but rather let Amazon deal with the redundancy, 'cause I frankly trust their staff more than myself.  :)

Well, this data is in excess of 7TB (roughly). I can't imagine how much that would cost. (It would come to roughly $575 a month.)

Plus we use and access it daily, with roughly 50 users. (Not sure how many requests that would be, but I'm assuming another $500, so around $1,000 per month.)


Time to switch to a RAID 6 or RAID 10? Glad you got things going again.

 

Is an enterprise storage solution an option? The server could connect over SCSI or FC to products from HP, EMC, Dell, etc., and all the storage would be handled safely on the other end.

I do not feel obliged to believe that the same God who has endowed us with sense, reason and intellect has intended us to forgo their use, and by some other means to give us knowledge which we can attain by them. - Galileo Galilei
Build Logs: Tophat (in progress), DNAF | Useful Links: How To: Choosing Your Storage Devices and Configuration, Case Study: RAID Tolerance to Failure, Reducing Single Points of Failure in Redundant Storage, Why Choose an SSD?, ZFS From A to Z (Eric1024), Advanced RAID: Survival Rates, Flashing LSI RAID Cards (alpenwasser), SAN and Storage Networking


Well, it really boils down to having a tested DR (Disaster Recovery) plan. There are different types of data: the un-reproducible type (it is unique, can only be backed up, and can't be recreated from other data); data that depends on the previous kind but requires time, and of course money, to reproduce; and then data that others have, or that isn't important, which can be lost but would be nice to have just in case. We'll call them Tiers, Tier 1 being the most important, and so on; you can have as many tiers as you want.

Then the fileserving comes in: where do you store Tier 1 vs Tier 2, etc.? And then you back up Tier 1 so that if the Tier 1 fileserver goes belly up for any reason (not that we want it to, or will leave it vulnerable), you're covered. I usually think in super extreme conditions, like a monster came by and ate it, something you can't plan for nor do anything about. How do you get your data back, assuming you could even still buy a server identical to the one that held your data? Sure, all the fancy RAID and tech helps you out, but not having an offsite backup, or even a different-media backup, is basically not a good DR plan. Think 9/11: where did all the data on those servers in the basements go? Was it lost? Nope, they had an offsite backup, as you should if the data is that important.

 

Then you have to test the DR plan. Testing can also be expensive, since you need to keep your current servers running for their intended purposes, and sometimes you can't take them down just to simulate a failure, so buying redundant hardware helps with testing. I'm not saying you need to buy a fully functional, equal-capacity fileserver, but something you can simulate the DR plan on. VMs help sometimes, but they're nothing like the actual hardware, so whoever is simulating it and wasn't there on the day of the install can play with the buttons, so to speak.

 

Tape backup is a cheap way to back things up; sure, it's not "cool", but damn it works, with the right software that can handle your data size.

I roll with sigs off so I have no idea what you're advertising.

 

This is NOT the signature you are looking for.


Time to switch to a RAID 6 or RAID 10? Glad you got things going again.

 

Is an enterprise storage solution an option? The server could connect over SCSI or FC to products from HP, EMC, Dell, etc., and all the storage would be handled safely on the other end.

Actually, we didn't. :( Our last resort backup only got 50% of it; the thing filled up halfway through, due to emails. It's complicated to explain why.

We have a total of 8 drives in this server, Disks 0-7, where Disk 3 is the boot drive and not part of the RAID 5 array. Disks 0-2 & 4-7 make up the RAID 5 array.

Disks 4-7 are hooked up to a RAID card that isn't part of the RAID setup; it's being used for more SATA ports, that's it. In the card's BIOS, no RAID is set up. Disks 0-2 are connected to the motherboard. Unfortunately, this means programs like CrystalDiskInfo can't see the status of the drives connected to the RAID card (they appear as ??? or grey).

The RAID 5 array was made in Disk Management on Windows 2k8 Server, so it's software RAID. Disk 7 failed. In Disk Management, it says "Errors" for that specific drive (Disk 7) where it would normally say "Online" or "Offline". The entire array won't work; it fails when we try to reactivate the volume. After a reboot, it says "Failed Redundancy (At Risk)", which implies it should still work without the drive; "At Risk" means it isn't gone. However, if we mess with it, such as reactivating the volume (i.e. making the RAID run), it says "Failed".

Why? RAID 5 is supposed to be single-drive fault tolerant. Is 7 drives too many for RAID 5? Not enough parity space or whatever? They are all 1TB drives, all from WD, and most are the same model number. We tried doing a bit-for-bit clone of Disk 7 to a new drive, but Clonezilla reported so many errors, and it kept repeating from 0-100%. It was very weird. We tried plugging that clone into the same port as the broken Disk 7 as a replacement, but it wasn't registered as part of the RAID; just an unallocated disk.

It will not let us remove the bad HDD to replace it. And it won't activate (and I assume it needs to be active before it will let us remove the bad HDD).
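For reference, the diskpart equivalent of what we see in Disk Management looks roughly like this (a sketch; the volume number is hypothetical and will differ on your box):

C:\> diskpart
DISKPART> list disk
rem shows each disk and its status (Online, Errors, Missing, ...)
DISKPART> list volume
rem the RAID-5 volume's Status column shows Healthy, Failed Rd, or Failed
DISKPART> select volume 2
DISKPART> detail volume
rem lists the member disks behind the selected volume and their states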

FYI: This is the RAID Card. It's old, but it works.

We would, but we are a small company of 50 people who handle less than 1GB of data a day (not counting emails). The only thing that would warrant such a setup is the importance of that data (90% of it is important).

Honestly, if things had been set up 100% right, we would be far more secure than we'd ever need to be. It's from everything being half-done that these problems have arisen. So I think Enterprise level solutions would be a bit overdoing it for us. 

We are now planning separate servers for each major drive (the ones that take on most of the load/data/importance), set up with either simple RAID 1 of 4TB drives or RAID 10 of 4TB drives. We're deciding whether to balance performance against price (though I maintain that for our ghetto last-resort backups, speed is just as important as redundancy, because in that case they are essentially one and the same).

I'm honestly at a loss. We can get it to resync, but then it just sits there. And we've had it sit there for days (back when this happened before on an older server) with nothing really happening. We don't want that to happen again.

What gets me, I guess, is that only 1 drive failed. None of the ones I can see have any issues (aside from Disk 2 having a high number of reallocated sectors, but otherwise it's fine). And I can't see Disks 4-7.

What would you suggest we do? This is a very confusing situation because things aren't working how they should.

Edit: Oh, and when we try to reactivate the RAID, it says it is in RAW format and needs to be formatted. I assume that means if we did that, we'd lose all the data (as that's how normal formatting works). I don't understand why that is happening either.


I have a question specifically for you, if you don't mind, regarding RAID 5 and Windows 2k8 Server software RAID....

I don't know much about Windows 2k8 as far as software RAID goes; I mostly use it for VMs to generate I/O traffic to systems I'm testing. However, have you looked at these pages?

 

http://technet.microsoft.com/en-us/library/cc771775.aspx#BKMK_5

http://technet.microsoft.com/en-us/library/cc785259(v=ws.10).aspx

 

From what I can read, Disk Management doesn't actually provide RAID redundancy; it just spreads the volumes out over the disks in a way that reduces the likelihood of total data loss. Correct me if I'm wrong, though.

 

If that is the case, then you definitely don't want to remove the disk without being absolutely sure it is dead. Otherwise all volumes with data on that disk will be corrupted. If it is dead, then your only hope is a backup of the data. 

 

But please don't take my word for it, since I don't work with this service. Microsoft support would be the most knowledgeable about the next course of action.


As for your suggestion about Enterprise level storage solutions...

 

We would, but we are a small company of 50 people who handle less than 1GB of data a day (not counting emails). The only thing that would warrant such a setup is the importance of that data (90% of it is important).

Honestly, if things had been set up 100% right, we would be far more secure than we'd ever need to be. It's from everything being half-done that these problems have arisen. So I think Enterprise level solutions would be a bit overdoing it for us. 

We are now planning separate servers for each major drive (the ones that take on most of the load/data/importance), set up with either simple RAID 1 of 4TB drives or RAID 10 of 4TB drives. We're deciding whether to balance performance against price (though I maintain that for our ghetto last-resort backups, speed is just as important as redundancy, because in that case they are essentially one and the same).

I'm honestly at a loss. We can get it to resync, but then it just sits there. And we've had it sit there for days (back when this happened before on an older server) with nothing really happening. We don't want that to happen again.

What gets me, I guess, is that only 1 drive failed. None of the ones I can see have any issues (aside from Disk 2 having a high number of reallocated sectors, but otherwise it's fine). And I can't see Disks 4-7.

What would you suggest we do? This is a very confusing situation because things aren't working how they should.

 The cheapest would be to go with the same setup and just make sure that backups are scheduled appropriately. This is an opportunity for you to show your skills.

 

You could buy/build a ZFS box that serves stuff up over CIFS (Samba) shares, make sure that there is no single point of failure in the machine, and build another one for replication, if ZFS supports it (I think it does). If that is the case, you could set up a dedicated connection (10 gigabit probably recommended) between the two for replication purposes, and have separate NICs for connecting to client interfaces, plus one for management. This would probably be a cheap option and ZFS comes with all sorts of data protection benefits, but the downside is you have to go through some sort of learning curve.
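It does, via snapshots plus zfs send/receive. A minimal sketch of what a nightly replication job could look like, with made-up pool, dataset, and host names:

# snapshot the primary box's dataset
zfs snapshot tank/shares@nightly-2013-11-05

# first run: full copy to the replica over SSH
zfs send tank/shares@nightly-2013-11-05 | ssh replica zfs recv -F backup/shares

# later runs: send only the delta between the previous and current snapshots
zfs send -i tank/shares@nightly-2013-11-04 tank/shares@nightly-2013-11-05 | ssh replica zfs recv -F backup/shares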

 

I think you're right that an enterprise solution is a bad idea in the short term. However as the company grows (and the amount of data grows), that might change. I can testify to the ease-of-use, reliability and scalability of EqualLogic products, but that's about the extent of it. I can ask around at work for pricing information about our arrays (it's not public information) if you'd like.

 

EDIT: Found this, it's a few years out of date, but it's probably a ballpark estimate. Ignore the super-expensive ones, those are usually high-capacity or flash-based arrays.


I don't know much about Windows 2k8 as far as software RAID goes; I mostly use it for VMs to generate I/O traffic to systems I'm testing. However, have you looked at these pages?

 

http://technet.microsoft.com/en-us/library/cc771775.aspx#BKMK_5

http://technet.microsoft.com/en-us/library/cc785259(v=ws.10).aspx

 

From what I can read, Disk Management doesn't actually provide RAID redundancy; it just spreads the volumes out over the disks in a way that reduces the likelihood of total data loss. Correct me if I'm wrong, though.

 

If that is the case, then you definitely don't want to remove the disk without being absolutely sure it is dead. Otherwise all volumes with data on that disk will be corrupted. If it is dead, then your only hope is a backup of the data. 

 

But please don't take my word for it, since I don't work with this service. Microsoft support would be the most knowledgeable about the next course of action.

 

Hmm, those could be very useful. I tried using diskpart in a command prompt with admin privileges, but it was so slow to react that I felt anything it could do, I could do with the GUI (apparently not). Thanks a lot for that, man. Google was failing me.

... "doesn't actually provide a RAID redundancy"  *the volume is called RAID-5* ... If that's true, I don't even know what to think now. Because that is the most absurd thing I've ever heard. Calling something it's not on a Server OS. :|

We will call Microsoft support tomorrow to ask them about recovering the RAID array. 

 The cheapest would be to go with the same setup and just make sure that backups are scheduled appropriately. This is an opportunity for you to show your skills.

 

You could buy/build a ZFS box that serves stuff up over CIFS (Samba) shares, make sure that there is no single point of failure in the machine, and build another one for replication, if ZFS supports it (I think it does). If that is the case, you could set up a dedicated connection (10 gigabit probably recommended) between the two for replication purposes, and have separate NICs for connecting to client interfaces, plus one for management. This would probably be a cheap option and ZFS comes with all sorts of data protection benefits, but the downside is you have to go through some sort of learning curve.

 

I think you're right that an enterprise solution is a bad idea in the short term. However as the company grows (and the amount of data grows), that might change. I can testify to the ease-of-use, reliability and scalability of EqualLogic products, but that's about the extent of it. I can ask around at work for pricing information about our arrays (it's not public information) if you'd like.

Yep, I intend to, using your "Reducing Single Points of Failure in Redundant Storage" thread as an example of what I would do. Specifically, these two options:

[attached image: the first layout option from the "Reducing Single Points of Failure in Redundant Storage" thread]

This would be what I would prefer. We would get 4 servers set up just like this. We have 4 major drives (i.e. places we store frequently used and vital information). 2 would live on one server and the other 2 would live on another (splitting up vital things to limit what goes down in case of a full system failure). Or each server would host 1. Depends on how my co-workers want to do it.

The rest of the more minor drives (I think there are 6-8) would live on the other 2 servers, or be split evenly between all 4. Then all 4 of them would be backed up our ghetto way (via a batch file with the xcopy command) to our "last resort" server, as well as to our current servers (two of which have had drive failures, and which we would rebuild) regularly, which I imagine would be fine. Our largest server is roughly 16TB (although it's in "RAID 5", which apparently isn't RAID 5).
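For the curious, the ghetto method really is just a scheduled batch file along these lines (the paths here are made up; the real ones point at our shares and the JBOD box):

@echo off
rem last-resort backup: mirror the file shares to the JBOD server overnight
rem /E copy all subdirectories, including empty ones
rem /D copy only files newer than what's already at the destination
rem /C keep going even if individual files error out
rem /H include hidden and system files
rem /Y overwrite without prompting
xcopy "D:\Shares" "\\LASTRESORT\Backup\Shares" /E /D /C /H /Y >> C:\Logs\lastresort.log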

The reason we call it our "Last Resort" is that it's our "if a tornado is coming, grab that and run" box, so we don't lose everything. We have off-site backups as well, but those are limited to our databases for financial reasons (paying by the GB for a few TBs adds up fairly quickly).

I would prefer to use FreeNAS 9.1.1 or something similar, because I have experience with it and have already done on it what would be the basic requirements for one of our servers, i.e. set up CIFS shares and a good redundant backup (ZFS is sweeeeet).

If that seems too expensive (4 servers with four 4TB drives each is at minimum $2,000 just for the drives), I would go with this:

[attached image: the second, cheaper layout option from the same thread]

And basically just have 2 major folders and some minor folders on one server and the same on another, for 2 servers total. However, I feel like, after this fiasco, my coworkers are going to be all for the "throw money at the problem" option. As will management.

I equate it to the healthcare adage that prevention is always cheaper than the cure. In the same way, buying more now is better than buying less now and needing more later (in redundancy, larger drives, and the like).

Anyway, thanks a lot man. You helped a bunch. Seriously. 

 

EDIT: Found this, it's a few years out of date, but it's probably a ballpark estimate. Ignore the super-expensive ones, those are usually high-capacity or flash-based arrays.

...? They are all over $10,000. I mean, I get it's a few years out of date, but man. They're all expensive. 

I understand that there are obvious pros to going Enterprise, but again, we are a small company. The literal full extent of files we access is Excel documents and PDFs. If pictures, music, or video get on our drives, it's for company related things (Christmas parties and such) and is very rare.

The only thing that takes up huge amounts of space is emails. And that only happens when we back up someone's emails multiple times without realizing it (which is what happened to the Last Resort server this time). For example, one of our longest-time employees (9 years, we are a young company) had a Thunderbird mailbox of 21.8GB. That's the biggest thing we have to worry about. I honestly think 4TB is overkill, but it will be enough for our future.

Thanks though. Again, you've been really helpful. :D


... "doesn't actually provide a RAID redundancy"  *the volume is called RAID-5* ... If that's true, I don't even know what to think now. Because that is the most absurd thing I've ever heard. Calling something it's not on a Server OS. :|

I got that impression from the sections:

 

A dynamic volume's status is Healthy (At Risk). 

A dynamic disk's status is Offline or Missing.

 

They don't list the course of action to take if there are uncorrectable errors on a disk, but I'd imagine it'd be to restore from backup.

 

Let me know what they come back with.


I feel like, after this fiasco, my coworkers are going to be all for the "throw money at the problem" option. As will management.

I equate it to the healthcare adage that prevention is always cheaper than the cure. In the same way, buying more now is better than buying less now and needing more later (in redundancy, larger drives, and the like).

Anyway, thanks a lot man. You helped a bunch. Seriously. 

 No problem :)

 

If you do go with ZFS, RAID 10 is definitely the best option as far as scalability goes. Get two more drives, hook them up (one to each controller), and add another RAID-1 vdev to the pool. Bam, four more TB. Plus rebuilds go quickly since there's zero parity to calculate.

 

Make sure that the L2ARC cache and the ZIL are on an SSD too :)
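In zpool terms, the whole upgrade is a couple of one-liners, roughly like this (device names are hypothetical):

# grow the pool: the new mirror is striped with the existing vdevs
zpool add tank mirror ada4 ada5

# L2ARC (read cache) and ZIL/SLOG (sync-write log) on SSD partitions
zpool add tank cache ada6p1
zpool add tank log ada6p2

# confirm the new layout
zpool status tank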


I got that impression from the sections:

 

A dynamic volume's status is Healthy (At Risk). 

A dynamic disk's status is Offline or Missing.

 

They don't list the course of action to take if there are uncorrectable errors on a disk, but I'd imagine it'd be to restore from backup.

 

Let me know what they come back with.

Will do. Thanks.

 

 No problem :)

 

If you do go with ZFS, RAID 10 is definitely the best option as far as scalability goes. Get two more drives, hook them up (one to each controller), and add another RAID-1 vdev to the pool. Bam, four more TB. Plus rebuilds go quickly since there's zero parity to calculate.

 

Make sure that the L2ARC cache and the ZIL are on an SSD too :)

I like that it sounds simple. I assume the RAID 1 vdev would go under the RAID 0 setup; since that's how your picture was set up, it would just evolve into this:

[attached image: the same layout with another mirrored pair added]

I was going to get 60GB (hopefully 120GB) SSDs for the cache and such. Is that overkill? Considering how badly ZFS eats RAM, I'd imagine it'd eat cache in a similar way.

Hmm, I use a USB drive for the OS at my house, though I'm not sure about doing that for a company server. I should probably choose a motherboard with an mSATA connector and just get a little 20GB SSD for it. Or something.

What would you do for that? FreeNAS is so light and such. Plus you can't partition in the GUI, so you need a small drive to avoid wasting space.

Edit: This (PC Part Picker link) is basically what I am thinking of pitching to my co-workers for each server. That list was made entirely with "we can upgrade later" in mind: room for 32GB of RAM, plus 2 more HDDs on the RAID card's 4 SATA III ports, which are the important things for storage, I imagine. Cache + RAID 0 should mitigate them being 5900RPM drives, and since they are 4TB, they will move more data than smaller drives at the same speed.

Note: Ignore the compatibility issues. The odds of us getting a motherboard that can't use that CPU are next to nothing. And if we do, it's an easy fix.


Will do. Thanks.

 

I like that it sounds simple. I assume the RAID 1 vdev would go under the RAID 0 setup; since that's how your picture was set up, it would just evolve into this:

[attached image: the same layout with another mirrored pair added]

I was going to get 60GB (hopefully 120GB) SSDs for the cache and such. Is that overkill? Considering how badly ZFS eats RAM, I'd imagine it'd eat cache in a similar way.

Hmm, I use a USB drive for the OS at my house, though I'm not sure about doing that for a company server. I should probably choose a motherboard with an mSATA connector and just get a little 20GB SSD for it. Or something.

What would you do for that? FreeNAS is so light and such. Plus you can't partition in the GUI, so you need a small drive to avoid wasting space.

That's pretty much how it goes. You'll only be limited by the number of SATA ports on your controllers.

 

ZFS doesn't eat much RAM unless you enable data deduplication, in which case you'll want about 4 GB of RAM for every TB of storage. However, for backups, especially e-mail backups, it drastically reduces the required amount of storage.
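It's a per-dataset switch, too, so if you try it you can confine it to the backup datasets rather than the whole pool; something like this (the dataset name is made up):

# enable dedup only where it pays off, e.g. the mail backups
zfs set dedup=on tank/backups/mail

# the DEDUP column here shows the achieved ratio, so you can
# check whether it's worth the RAM before rolling it out wider
zpool list tank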

 

I'm not familiar with how much cache ZFS eats, though I know a 120GB SSD is common for the combined L2ARC and ZIL; maybe Eric1024 can shed some light? I know he has a partial tutorial here with links; maybe it'll be in one of those.

 

For the FreeNAS install, maybe grab a 32GB SSD if they're still out there? There's not much else you can do; no drives are that small anymore.


That's pretty much how it goes. You'll only be limited by the number of SATA ports on your controllers.

 

ZFS doesn't eat much RAM unless you enable data deduplication, in which case you'll want about 4 GB of RAM for every TB of storage. However, for backups, especially e-mail backups, it drastically reduces the required amount of storage.

 

I'm not familiar with how much cache ZFS eats, though I know a 120GB SSD is common for the combined L2ARC and ZIL; maybe Eric1024 can shed some light? I know he has a partial tutorial here with links; maybe it'll be in one of those.

 

For the FreeNAS install, maybe grab a 32GB SSD if they're still out there? There's not much else you can do; no drives are that small anymore.

 

Cool. Read below about the Controllers. 

Hmm, we might want that at some point. 4GB for every TB? Wow. And I thought 1:1 was crazy. I assume you still count the raw (pre-RAID) capacity, so for us it would be 16TB even though only 8TB would be usable per server.

Then I will push for 120GB. Awesome. That link will be useful. :D

I'll just suggest small USB drives. Also, I snuck in an edit earlier. Sorry :P.

Edit: This (PC Part Picker link) is basically what I am thinking of pitching to my co-workers for each server. That list was made entirely with "we can upgrade later" in mind: room for 32GB of RAM, plus 2 more HDDs on the RAID card's 4 SATA III ports, which are the important things for storage, I imagine. Cache + RAID 0 should mitigate them being 5900RPM drives, and since they are 4TB, they will move more data than smaller drives at the same speed.

Note: Ignore the compatibility issues. The odds of us getting a motherboard that can't use that CPU are next to nothing. And if we do, it's an easy fix.


I'll just suggest small USB drives. Also, I snuck in an edit earlier. Sorry :P.

Isn't that a bad idea, since the flash on USB sticks wears out much more quickly than that of an SSD?


Isn't that a bad idea, since the flash on USB sticks wears out much more quickly than that of an SSD?

While that's true, the OS (FreeNAS) isn't write-heavy; it's read-heavy (if you can call that heavy), IIRC. It runs from an image (which is part of the reason it takes up a whole drive and partitioning isn't allowed). The only time a lot of writes occur is when upgrading the OS (such as from FreeNAS 9.1.1 to 9.2).

Plus, imagine if all you had to do to replace it was buy an $8 USB drive and clone the OS over from an image you had saved on another server (we love to do that). 

For reference, my FreeNAS is still running strong off a SanDisk 8GB USB drive, and I've reinstalled the OS a few times over the course of a year. It's handled roughly ~5TB of data transfer (if that even touches the OS drive, which I'm pretty sure it doesn't).

That's pretty good for $8 I think. 

Eh, I'll ask them if they'd rather buy a small SSD anyway. Just so it's all internal and no one can run off with it.


Eh, I'll ask them if they'd rather buy a small SSD anyway. Just so it's all internal and no one can run off with it.

Probably worth asking about.

 

Single point of failure: Disgruntled employee


Probably worth asking about.

 

Single point of failure: Disgruntled employee

What this thread was entirely about. Lol 

Again, thanks for all your help. Really. This has been both educational and useful. It's great when things are both.


Probably worth asking about.

 

Single point of failure: Disgruntled employee

Oh, just thought I would show you this so you knew and for posterity: http://windowsitpro.com/networking/q-ive-lost-disk-my-windows-server-2008-software-raid-5-how-do-i-repair-it

 

That's apparently the process for fixing a RAID 5 drive failure in Windows 2k8. So it is a true RAID, if that's how it works. Interestingly, that process is the one we didn't do (remove old, add new, convert to dynamic, repair missing drive). 
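For posterity as well, the diskpart version of that procedure should look roughly like this (disk/volume numbers are hypothetical, and I'd double-check against the article before running anything):

rem after physically installing the replacement drive:
DISKPART> list disk
DISKPART> select disk 8
DISKPART> convert dynamic
rem the new disk must be dynamic before it can join the volume
DISKPART> list volume
DISKPART> select volume 2
DISKPART> repair disk=8
rem rebuilds the RAID-5 volume onto the new disk, replacing the failed member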

We will try that after our recovery software gets what it can off the RAID. It'll be done Friday. 

