
Not getting the expected performance out of my FreeNAS server and I don't know why.

Sounds like this isn't necessarily a ZFS issue. It might be a problem with your network FS as well.

 

If copy performance from one dataset to another (I'm assuming these are on the same array?) is giving you 90MB/s, then you're actually getting 90MB/s of reads and 90MB/s of writes out of the same array at once, i.e. about 180MB/s of total throughput, which is about what I would expect. There's still the large file/small file issue, but we can work that out later. Right now we need to figure out where the bottleneck actually is.

Yes, I just made two datasets under the same zpool and copied from one to the other, so it's definitely the same array. And 180MB/s is what I would expect as well.

It could very well be our network. I just checked, and we're actually having issues with it right now. My supervisor calls it 'cascading' or something, but basically all the lights on our switches are blinking constantly and in sync, which shouldn't be happening.

Once we fix that, I'll try it again and see what happens.
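
In the meantime, one way to take the network completely out of the picture is to measure raw pool throughput locally on the FreeNAS box. Here's a minimal sketch (the mount point /mnt/tank and the test size are placeholders; adjust them to your pool, and make the file bigger than RAM so ARC doesn't flatter the read number):

```python
#!/usr/bin/env python3
"""Rough local throughput test for the pool, bypassing CIFS and the network.

Run this on the FreeNAS box itself. The mount point and file size below are
placeholders; make FILE_SIZE_GB comfortably larger than RAM, otherwise the
read pass mostly measures ARC/page cache rather than the disks.
"""
import os
import time

TEST_FILE = "/mnt/tank/throughput_test.bin"  # placeholder: a dataset on the pool
CHUNK = 1024 * 1024                          # write/read in 1 MiB pieces
FILE_SIZE_GB = 8                             # placeholder: size of the test file

def write_test():
    buf = os.urandom(CHUNK)                  # random data, so lz4 compression can't inflate the numbers
    chunks = FILE_SIZE_GB * 1024
    start = time.time()
    with open(TEST_FILE, "wb") as f:
        for _ in range(chunks):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())                 # make sure the data actually hit the pool
    print(f"write: {chunks / (time.time() - start):.1f} MB/s")

def read_test():
    total = 0
    start = time.time()
    with open(TEST_FILE, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    print(f"read:  {total / (1024 * 1024) / (time.time() - start):.1f} MB/s")

if __name__ == "__main__":
    write_test()
    read_test()
    os.remove(TEST_FILE)                     # clean up the test file
```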



Sounds like this isn't necessarily a ZFS issue. It might be a problem with your network FS as well.

 

If copy performance from one dataset to another (I'm assuming these are on the same array?) is giving you 90MB/s, then you're actually getting 90MB/s of reads and 90MB/s of writes out of the same array at once, i.e. about 180MB/s of total throughput, which is about what I would expect. There's still the large file/small file issue, but we can work that out later. Right now we need to figure out where the bottleneck actually is.

That does sound about right.

 

Try this test:

 

One volume, ~3 CIFS shares, one client server. Map the shares to the server and run large file copies to all of them at the same time, then measure the combined transfer speed. The point is to max something out on that one machine.

 

Now create another share (same volume), map it to a different server, start a file copy there, and check the transfer speed. If it's reasonable, and the transfer speeds on the first server don't go down, it might be a CPU issue.

 

Since CIFS is single-threaded (one thread per share connection), the three original shares will each be tied to their own thread. If the fourth copy doesn't change the transfer speeds of the others, it could mean that the CPU's single-threaded performance is the bottleneck for each connection.

 

Eric1024, does that make sense at all?
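
For what it's worth, here's a rough sketch of how the parallel-copy half of that test could be scripted from the client, so the numbers are comparable between runs. The source file and the mapped share paths are placeholders; it just copies the same large file to each share in parallel and reports per-share and combined MB/s.

```python
#!/usr/bin/env python3
"""Copy one large file to several mapped CIFS shares in parallel and report
per-share and combined throughput.

The source file and the share paths below are placeholders; point them at a
multi-GB test file and at the drive letters / mount points of your shares.
"""
import os
import shutil
import time
from concurrent.futures import ThreadPoolExecutor

SOURCE = r"C:\temp\bigfile.bin"                 # placeholder: a large test file
SHARES = [r"X:\test", r"Y:\test", r"Z:\test"]   # placeholder: mapped shares on one volume

def copy_to(share):
    dest = os.path.join(share, "copytest.bin")
    start = time.time()
    shutil.copyfile(SOURCE, dest)               # plain sequential copy, one thread per share
    return share, os.path.getsize(SOURCE) / (1024 * 1024) / (time.time() - start)

if __name__ == "__main__":
    size_mb = os.path.getsize(SOURCE) / (1024 * 1024)
    start = time.time()
    with ThreadPoolExecutor(max_workers=len(SHARES)) as pool:
        results = list(pool.map(copy_to, SHARES))
    elapsed = time.time() - start
    for share, rate in results:
        print(f"{share}: {rate:.1f} MB/s")
    print(f"combined: {len(SHARES) * size_mb / elapsed:.1f} MB/s")
```

Keep in mind that everything launched from one client shares that client's NIC, so over a single gigabit link the combined figure will top out around 110MB/s no matter what the array can do.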



It could very well be our network. I just checked, and we're actually having issues with it right now. My supervisor calls it 'cascading' or something, but basically all the lights on our switches are blinking constantly and in sync, which shouldn't be happening.

Hm, might you be having a problem with bridge loops?

It does sound vaguely similar to a story a friend of mine once told me. He used to work in IT and once had a client whose company Ethernet (small company, just a few offices with a few people) was brought to its knees by this.

I'm not sure how he resolved the issue, but AFAIK you could either track down the problem connections in your network topology or buy a switch that supports Spanning Tree Protocol.

I'm definitely no networking expert (you probably know more than I do), but this just reminded me of my friend's adventures with funky network behavior. ;)
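
If it is a loop flooding the LAN, one crude way to spot it from any host is to watch the inbound packet rate while the machine is otherwise idle. A quick sketch, assuming the psutil package is available (the interface name is a placeholder):

```python
#!/usr/bin/env python3
"""Crude broadcast-storm check: print inbound packets per second on one NIC.

On an otherwise idle host, a sustained rate of tens of thousands of packets
per second is a hint that something (a bridge loop, for example) is flooding
the LAN. Requires the psutil package; the interface name is a placeholder.
"""
import time
import psutil

IFACE = "em0"  # placeholder: e.g. "em0"/"igb0" on FreeBSD, "eth0" on Linux

def packets_received(iface):
    return psutil.net_io_counters(pernic=True)[iface].packets_recv

if __name__ == "__main__":
    prev = packets_received(IFACE)
    while True:
        time.sleep(1)
        now = packets_received(IFACE)
        print(f"{now - prev} packets/s in on {IFACE}")
        prev = now
```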



-snip-

Running the test as you described.

All three copies on my machine started around 20-35MB/s and stayed that way for a bit. After 40 minutes, they are all at 5MB/s or so.

The 4th copy, running from another server (not the FreeNAS machine), started at 20MB/s (it started after the other three had been going for a few minutes) and is now at 10MB/s.

At one point the single array was doing at least 150MB/s in total, since each copy reads from and writes to the same array, but then it almost seemed to throttle down; it's now at about 50MB/s of throughput ((5+5+5+10) x 2, counting reads and writes).

And honestly, if that's how it is, that's perfectly fine. It's possible I'm being throttled somewhere I don't know about, and the array can apparently handle the throughput it's being given (roughly 150MB/s).

Since this is going to be used by about 50 people, and the most they will be doing is downloading or reading 50-100MB database files, I can't imagine them running into the throttling, or whatever it is that's happening to me.

-snip-

Hmm, strange. We usually just reset the switches to fix it. That's obviously not ideal, because it means our users are disconnected from everything for a minute or two, but it does fix it. It's a rare occurrence, so whatever is causing it must be something we don't use often. Very weird.

Basically, to everyone, I guess it's a false alarm. The array can handle the throughput expected of it (100-200MB/s), but it just doesn't do it for a single user on a single task. Weird, but as long as it works. 



Basically, to everyone, I guess it's a false alarm. The array can handle the throughput expected of it (100-200MB/s), but it just doesn't do it for a single user on a single task. Weird, but as long as it works. 

You could do an isolated test: a single switch (not connected to any part of your company's network) connecting one computer to the FreeNAS machine. If file copies are fast and consistent over that connection, the original problem is probably network related.

 

Hope everything works out.
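
To take the disks and CIFS out of that isolated test as well, a raw TCP throughput check between the two machines is handy. iperf does this, but a minimal sketch like the one below works too (the port and the amount of data sent are arbitrary placeholders). A healthy gigabit link should land somewhere around 110-118MB/s.

```python
#!/usr/bin/env python3
"""Minimal raw TCP throughput test for the isolated switch setup.

Run "python3 nettest.py server" on one machine and
"python3 nettest.py client <server-ip>" on the other. The port and the
amount of data sent are arbitrary placeholders.
"""
import socket
import sys
import time

PORT = 5201                      # placeholder port
TOTAL_MB = 2048                  # how much data the client sends
CHUNK = b"\x00" * (1024 * 1024)  # 1 MiB per send

def server():
    with socket.create_server(("", PORT)) as srv:
        conn, addr = srv.accept()
        with conn:
            received = 0
            start = time.time()
            while data := conn.recv(1024 * 1024):
                received += len(data)
        rate = received / (1024 * 1024) / (time.time() - start)
        print(f"received {received // (1024 * 1024)} MB from {addr[0]} at {rate:.1f} MB/s")

def client(host):
    with socket.create_connection((host, PORT)) as conn:
        start = time.time()
        for _ in range(TOTAL_MB):
            conn.sendall(CHUNK)
    print(f"sent {TOTAL_MB} MB in {time.time() - start:.1f} s")

if __name__ == "__main__":
    if len(sys.argv) >= 3 and sys.argv[1] == "client":
        client(sys.argv[2])
    else:
        server()
```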



You could do an isolated test: a single switch (not connected to any part of your company's network) connecting one computer to the FreeNAS machine. If file copies are fast and consistent over that connection, the original problem is probably network related.

 

Hope everything works out.

Hmm, I'll do that then. I'll just have to find a server that's not in use at the moment to use as the other half.

It will. I'm sure. :) Thanks for the suggestion.



Hmm, strange. We usually just reset the switches to fix it. That's obviously not ideal, because it means our users are disconnected from everything for a minute or two, but it does fix it. It's a rare occurrence, so whatever is causing it must be something we don't use often. Very weird.

Huh, curious.

If you ever get the chance to look into this more closely, consider me interested in that story. ;)



Huh, curious.

If you ever get the chance to look into this more closely, consider me interested in that story. ;)

Will do. Thanks. :)



That does sound about right.

 

Try this test:

 

One volume, ~3 CIFS shares, one client server. Map the shares to the server and run large file copies to all of them at the same time, then measure the combined transfer speed. The point is to max something out on that one machine.

 

Now create another share (same volume), map it to a different server, start a file copy there, and check the transfer speed. If it's reasonable, and the transfer speeds on the first server don't go down, it might be a CPU issue.

 

Since CIFS is single-threaded (one thread per share connection), the three original shares will each be tied to their own thread. If the fourth copy doesn't change the transfer speeds of the others, it could mean that the CPU's single-threaded performance is the bottleneck for each connection.

 

Eric1024, does that make sense at all?

That makes sense, but if all he's using is gigabit, even if it's a few aggregated links, then it's almost certainly not the CPU. Opterons may not be the fastest single-threaded performers out there, but they're more than capable. For reference, I can push 5Gb/s using Samba on an Ivy Bridge i7 with plenty of breathing room, so he should be fine, but it's worth looking into.

 

Edit: also, it's worth mentioning that ZFS is well threaded, so on the off chance that it is a CPU issue, I doubt it would be with ZFS.
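
If you do want to rule the CPU in or out, one quick check while the copies are running is to watch per-core load on the FreeNAS box: a single core pinned near 100% while the rest sit idle points at one smbd process, not the pool. A rough sketch, assuming the psutil package is available:

```python
#!/usr/bin/env python3
"""Watch per-core CPU usage on the FreeNAS box while the CIFS copies run.

If one core sits near 100% while the others are mostly idle, the bottleneck
is probably single-threaded (e.g. one smbd process per connection) rather
than the pool itself. Requires the psutil package.
"""
import psutil

if __name__ == "__main__":
    while True:
        per_core = psutil.cpu_percent(interval=1.0, percpu=True)  # blocks for 1 s
        cores = "  ".join(f"cpu{i}: {p:5.1f}%" for i, p in enumerate(per_core))
        print(f"{cores}   busiest: {max(per_core):.1f}%")
```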


