
Fileserver access for small/medium company is super slow – how to find bottleneck?

tonkab

Hi all,

 

I've just taken over the (small) IT department at a small/medium company (150 people) and am working my way through the issues we're currently having.

My background is not a typical sysadmin role, so I'm missing a lot of context and am trying to catch up quickly – which is why I'm asking here.

 

Situation: we have a fileserver that is shared by everyone in the company for everything from Excel files to Photoshop (no video, though).

It runs on Windows, and the network is behind a Sophos firewall.

 

Problem: doing anything on the fileserver is dog slow, and navigating more than ~three folders deep is often impossible because the connection breaks.

I'm hearing this complaint most often from Mac users connecting through smb://, but Windows users seem to have the same problem.

 

Question: Where should I look for the bottleneck, or what other information would you need from me to help narrow it down?

CPU and RAM utilization on the fileserver seem fine (2% and 50% according to Task Manager on the VM that runs the fileserver), but I'm happy to investigate this more if you think it's the likely culprit.

 

Are there any tools or standard/checklist-based tests I can run to figure this out?

 

Any pointers at all are greatly appreciated!

 

 


How is it all connected to the network?

If you run a single 1 Gbit connection to 150 PCs, you are going to have a bad time.

 

Also, do you have the full specs of the server? 

 

 

Edit: maybe you can try accessing the shares after work, when nobody is there anymore, to see if it's still slow for just one user.



Open Resource Monitor, go to the Disk tab and monitor disk active % time and also Disk Queue. First instinct from this type of issue is disk performance, so best to rule that out first.

 

You will also need to look at the equivalent disk statistics at the VM host level (ESXi?), and then at the storage array level if it's not direct-attached storage.
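If you would rather capture this over time than watch Resource Monitor live, the built-in typeperf tool can log the same counters to a CSV. This is just a sketch – the counter paths assume an English Windows install, and the sample interval, count and file name are arbitrary:

typeperf "\PhysicalDisk(_Total)\% Disk Time" "\PhysicalDisk(_Total)\Avg. Disk Queue Length" "\PhysicalDisk(_Total)\Avg. Disk sec/Read" "\PhysicalDisk(_Total)\Avg. Disk sec/Write" -si 5 -sc 120 -o disk-baseline.csv

As a rough rule of thumb, a sustained average queue length well above the number of disks backing the volume, or read/write latencies regularly above a few tens of milliseconds, points at storage rather than the network.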


Thank you both, very helpful already!

1 hour ago, FloRolf said:

How is it all connected to the network?

If you run a single 1 Gbit connection to 150 PCs, you are going to have a bad time.

 

Also, do you have the full specs of the server? 

 

 

Edit: maybe you can try accessing the shares after work, when nobody is there anymore, to see if it's still slow for just one user.

10 Gbit to a 10 Gbit switch, connected to a 10 Gbit fiber switch (that's my current understanding).

The host running the VMs is a PRIMERGY RX2540 M4 with a Xeon Gold 5115 @ 2.40 GHz (40 logical cores), 127 GB RAM and 12 network interfaces, running 10 VMs.

1 hour ago, leadeater said:

Open Resource Monitor, go to the Disk tab and monitor disk active % time and also Disk Queue. First instinct from this type of issue is disk performance, so best to rule that out first.

 

You will also need to look at the equivalent disk statistics at the VM host level (ESXi?), and then at the storage array level if it's not direct-attached storage.

So we have four volumes: one is RAID 5 on SSDs, and three are RAID 6 on HDDs. For some reason (that maybe you know), vCenter tells me there's no data for storage I/O performance when I select them directly.

In Resource Monitor the scale for Queue Depth changes sometimes, but at least for C:\ it seems to shoot over 1 regularly. I've attached a screenshot.

 

Does this help?

 

Update: oh, I should also mention that this seems to most often be a problem when people are working remotely and connecting via VPN, so maybe that is the bottleneck as well. Right now, for example, internal access (from my machine at least) works fine, but if I connect via VPN through my phone it breaks a lot. Not always the case, though.

[Attached screenshot: Bildschirmfoto 2022-01-24 um 10.09.20.png – Resource Monitor disk queue]


1 hour ago, tonkab said:

Update: oh, I should also mention that this seems to most often be a problem when people are working remotely and connecting via VPN, so maybe that is the bottleneck as well. Right now, for example, internal access (from my machine at least) works fine, but if I connect via VPN through my phone it breaks a lot. Not always the case, though.

SMB share access over VPNs is horrific. SMB hates network latency to an extreme degree, and it doesn't take much to start causing significant problems. You may also find that the VPN server, whatever that is (the Sophos? physical or a VM?), is getting maxed out in performance/capability.

 

You can test this by RDPing from the remote computer to another computer or server on the local network and accessing the same shares and folders at the same time – one via the VPN in Explorer/Finder, the other through the RDP session.
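To put a rough number on the latency side of that test, compare round-trip times from a client on the LAN and from one on the VPN – the hostname below is just a placeholder for your file server:

ping -c 20 fileserver.internal.example

(On Windows the equivalent is ping -n 20.) Single-digit milliseconds on the LAN versus tens of milliseconds, or packet loss, over the VPN would line up with what you're describing.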


3 hours ago, leadeater said:

SMB share access over VPNs is horrific. SMB hates network latency to an extreme degree, and it doesn't take much to start causing significant problems. You may also find that the VPN server, whatever that is (the Sophos? physical or a VM?), is getting maxed out in performance/capability.

Properly configured SMB is very tolerant of high (intercontinental) latency, but I definitely agree to look at the VPN first.

 

After the VPN, check out the Microsoft File Server tuning guide: https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/role/file-server/

 

Be aware that running in VMware without the VMXNET3 adapter also imposes a pretty severe penalty when using the system as a file server. Ideally, on a high-latency connection, you'd run it on bare metal, or at least pass the NIC through to the OS with full capability. Either way, after tuning the OS you'll likely need to tune the VM as well. (Side note: this is somewhere Hyper-V shines – if you're running Microsoft services on Microsoft operating systems on Hyper-V, it just works out of the box.)


4 hours ago, jec6613 said:

Properly configured SMB is very tolerant of high (intercontinental) latency, but I definitely agree to look at the VPN first.

It's really not; you need WAN accelerators and good client-side VPN applications to mitigate it. Completely standard, default SMB facing any kind of latency results in horrid performance. No amount of registry tweaks or GPOs (the same thing anyway) will properly and completely mitigate this, and the higher the latency, the less effective those tunings become.

 

Quote

These are all very good suggestions, but they are hit or miss on whether or not they actually work. There’s a reason for this: SMB sucks on high latency connections! In 2009, Microsoft published a paper about planning for bandwidth requirements. Part of the paper compared SMB performance with and without latency. Below is the results of crawling and enumerating a file share over a 10 Mbps connection with 100 ms of latency:

[Image: file share crawl and enumeration results with and without 100 ms of latency]

https://www.mirazon.com/issues-with-smb-file-transfer-performance-over-vpn/

 

Good luck with this – even Microsoft says and shows that SMB over high-latency networks is crap 😉

 

Quote

Trying to optimize SMB over a WAN is like trying to optimize a square wheel, just use a round one instead.

 

RDS Published Apps is one of the round wheel options 🙂


1 hour ago, leadeater said:

It's really not; you need WAN accelerators and good client-side VPN applications to mitigate it. Completely standard, default SMB facing any kind of latency results in horrid performance. No amount of registry tweaks or GPOs (the same thing anyway) will properly and completely mitigate this, and the higher the latency, the less effective those tunings become.

 

[Image: file share crawl and enumeration results with and without 100 ms of latency]

https://www.mirazon.com/issues-with-smb-file-transfer-performance-over-vpn/

 

Good luck with this – even Microsoft says and shows that SMB over high-latency networks is crap 😉

This is all old data, based on the (at the time state-of-the-art) SMB2 protocol. SMB3 provided an enormous improvement here, which is where file server tuning comes into play. To tolerate high-latency connections, one of the most important things is to allow more simultaneous transactions in flight to the same client, so that browsing a directory queries many (or all) of the files simultaneously rather than sequentially as in SMB2. Nowadays SMB3 is performant enough that application data is run over intercontinental connections between datacenters, and it's done all the time.
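If you want to confirm which dialect the Mac clients actually end up negotiating, macOS can report it for the currently mounted shares:

smbutil statshares -a

Check the SMB_VERSION line for each share – anything below SMB_3.x means they're not getting the SMB3 behaviour described above.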

 

However, on a small file server without RDMA NICs, this is all juggled in the CPU and main memory, and there are definite practical upper limits to this, especially with a virtual file server – which is where my comment about the vNIC came in earlier.


33 minutes ago, jec6613 said:

SMB3 provided an enormous improvement here, which is where file server tuning comes into play. To tolerate high-latency connections, one of the most important things is to allow more simultaneous transactions in flight to the same client, so that browsing a directory queries many (or all) of the files simultaneously rather than sequentially as in SMB2. Nowadays SMB3 is performant enough that application data is run over intercontinental connections between datacenters, and it's done all the time.

It really didn't improve much. We have thousands of remote workers using FortiGate SSL VPNs connecting to NetApp SMB3 shares, and the performance still sucks.

 

We also have site-to-site MPLS private connections (no VPN) between datacenters, and even with just 4 ms of latency the performance is significantly worse, so not every application and workload is viable. Simple share browsing and office documents are totally fine, of course.

 

We use Published Apps because they're vastly superior, as well as OneDrive and Teams.

 

33 minutes ago, jec6613 said:

However, on a small file server without RDMA NICs, this is all juggled in the CPU and main memory, and there are definite practical upper limits to this, especially with a virtual file server – which is where my comment about the vNIC came in earlier.

RDMA makes next to zero difference if the client is on a high-latency connection; the connection won't even be eligible for RDMA, so it won't be used. RDMA is for achieving ultra-low latency, which is impossible if the distant client connection sits at 50-100 ms. NIC offloads still work on vNICs as well, so you don't need passthrough or SR-IOV to get high performance – you can achieve better, but it's rare to need to.

 

Checking for VMXNET3 is a very good idea though; the E1000 type sucks.
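A quick way to confirm which adapter type the guest actually sees (Device Manager shows the same thing; the names below are indicative):

wmic nic get Name

A VMXNET3 vNIC shows up with a name like "vmxnet3 Ethernet Adapter", while an emulated E1000/E1000E appears as an Intel PRO/1000 or 82574L adapter.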


5 hours ago, leadeater said:

It really didn't improve much. We have thousands of remote workers using FortiGate SSL VPNs connecting to NetApp SMB3 shares, and the performance still sucks.

SMB is hampered in IOPS-based loads. The more files in a share, the slower the access will be. The worst scenario is having close to a million files, each very tiny. That will slam SMB sessions if they need to be accessed like a file-based database and you're not already using SQL for the index – just as an example.

 

Regarding FortiClient SSL VPNs, try enabling 'Preferred DTLS Tunnel' within the client. Because traffic then goes over UDP instead of TCP, IOPS – and thus SMB performance – should improve.


It's not clear from the OP whether the issue is local users or just remote users – it sounds like both. While I agree that excessive SMB overhead over a VPN could be the culprit (constant rekeying over IPsec is another possibility), it sounds like local users are having a big issue too. Please tell me you aren't using older Cisco ASAs.

 

Apple's SMB implementation makes Windows look like BSD. This is why so many companies punt and use something like Acronis to handle Mac users on Windows servers. I don't care if I offend anybody: Macs don't play nice on enterprise networks.

 

Using the proper VMware guest NIC adapters mentioned above is mandatory. 

 

I've had literally thousands of Windows clients connected to single Windows file servers running Fast Ethernet with no issues. As long as disk/storage latency is solid, there will be no issues on that end.

 

RDP and Published Apps just move the network and disk overhead local, so of course that fixes it. It works for remote users, but doesn't help local network users.

 

 


2 hours ago, wseaton said:

Macs don't play nice on enterprise networks. 

They REALLY don't lol. Also Apple's SMB implementation is sooo trash, doesn't support all the special NTFS permissions like Traverse Through Folder and insists you must have Read access to go through folder levels, ughhh.


6 hours ago, wseaton said:

Apple's SMB implementation makes Windows look like BSD.

 

3 hours ago, leadeater said:

They REALLY don't lol. Also Apple's SMB implementation is sooo trash, doesn't support all the special NTFS permissions like Traverse Through Folder and insists you must have Read access to go through folder levels, ughhh.

^Totally agree 100% with the above statements.

I used a MacBook years ago, and the SMB implementation has always been sketchy at best. The worst part is that when Mac users traverse the share directories, they crap those damn .DS_Store files all over the place; it's like mouse droppings. In the past I would run a script to delete them every week (a sketch of that is below). You can disable their creation, but this must be done from the client side.
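For reference, that weekly cleanup can be as simple as a find one-liner run from any Mac (or other Unix-like box) that has the share mounted – the mount path here is just a placeholder:

find /Volumes/CompanyShare -name '.DS_Store' -type f -delete

Schedule it with cron or launchd and it keeps the droppings under control until every client has the setting below applied.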

https://support.apple.com/en-us/HT208209

Speed up browsing on network shares

To speed up SMB file browsing, you can prevent macOS from reading .DS_Store files on SMB shares. This makes the Finder use only basic information to immediately display each folder's contents in alphanumeric order. Use this Terminal command:

defaults write com.apple.desktopservices DSDontWriteNetworkStores -bool TRUE

Then log out of your macOS account and log back in.

To reenable sorting, use this command:

defaults write com.apple.desktopservices DSDontWriteNetworkStores -bool FALSE

 

