
Having issues saturating 10G link

Cyberpower678

Hey guys,

 

So I have a (probably not unique) problem: I can't saturate the 10G link to my FreeNAS server.  I have 2 Windows 10 PCs, one with an Aquantia NIC and the other with an Intel NIC, and both have problems iPerfing to FreeNAS.  I only get up to 3.5 Gbps back and forth to the NAS.  The NAS and the PCs are connected over a 10G switch.  I have eliminated the NICs and the switch as the culprits, since I built another crappy test NAS server with a Celeron CPU and successfully pushed and pulled 9.9 Gbps between the two NASes with iPerf3.  This leads me to believe that Windows is the problem here, and since the two machines use a 7th gen i7 and a 9th gen i9, I don't think the CPU is the issue either, seeing as two Celeron-driven servers do this quite easily.

 

If anyone has a clue what might be the problem, I'm open to all suggestions.

 

Also, in case anyone is wondering, I'm using Cat 8 cables.  All of them have been tested and can handle the bandwidth.


  • 2 weeks later...
On 8/10/2020 at 2:30 PM, Electronics Wizardy said:

What iperf command are you using? Are you running parallel streams?

 

What iperf performance do you get between the windows pcs?

For some reason I wasn't notified of your response.  I'm using iperf3 -c <server> -R, and the same command without -R.

I also tried parallel connections, but it caps out at 6 Gbps.  The connection between the servers easily got 9.9 Gbps on a single thread, over the switch.
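
Spelled out, the invocations look roughly like this (just a sketch; <server> stands for the FreeNAS box's address, and the parallel-stream count is only an example):

iperf3 -c <server>          # single stream, PC -> NAS
iperf3 -c <server> -R       # reverse mode: single stream, NAS -> PC
iperf3 -c <server> -P 10    # ten parallel streams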

 

I have run iPerf on Windows and got only 1.6 Gbps back and forth on a single thread, and up to 5 Gbps on 10 threads.


I have the MTU set to 9000, so it should be.  I only have intermediate networking skills, so I might be missing something, but CPU usage is pretty minimal during the test.  When I tested my two NASes with their Celeron CPUs, the CPU usage was very low, no more than 10%, and they achieved maximum throughput on a single stream.  I'm inclined to believe this may be an issue with Windows 10 and how it handles TCP windowing, but that can't be happening for everyone, or Microsoft would be getting buried by a shitload of angry users complaining about it.  So I'm wondering what I'm doing wrong.  Happy to do any kind of testing and post the results.
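
For anyone who wants to poke at the windowing theory, the usual sanity check on Windows is the receive-window auto-tuning setting; this is only a sketch of what to look at, not a confirmed fix:

netsh interface tcp show global                    # check "Receive Window Auto-Tuning Level"
netsh int tcp set global autotuninglevel=normal    # "normal" is the default; "disabled" caps the window and throughput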


Jumbo frames aren't going to make a difference here, so don't worry about setting them.  This is a common misconception folks have about networking: that setting jumbos fixes everything.  Setting jumbo frames can help a lot if your CPU is bogged down with other things and doesn't have the cycles to chop up the network traffic into 1500-byte packets (and/or reassemble them).  So unless your Core i7 or i9 processors are running full tilt, jumbos won't matter.
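
A rough back-of-the-envelope to put numbers on that (payload only, ignoring framing overhead): the per-packet work drops by about 6x with 9K frames.

10,000,000,000 bit/s ÷ (1500 B × 8 bit/B) ≈ 833,000 packets/s   (standard 1500-byte MTU)
10,000,000,000 bit/s ÷ (9000 B × 8 bit/B) ≈ 139,000 packets/s   (9000-byte jumbo MTU)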

 

I don't know what exactly is going on here.  You can try to work on the Aquantia-equipped box first.  

 

https://rog.asus.com/forum/showthread.php?106476-Aquantia-10Gbe-fix-by-disabling-one-advanced-setting-on-the-chip-thru-device-manager!

 

See if that helps.

 



21 hours ago, jasonvp said:

Jumbo frames aren't going to make a difference here, so don't worry about setting them.  This is a common misconception folks have about networking: that setting jumbos fixes everything.  Setting jumbo frames can help a lot if your CPU is bogged down with other things and doesn't have the cycles to chop up the network traffic into 1500-byte packets (and/or reassemble them).  So unless your Core i7 or i9 processors are running full tilt, jumbos won't matter.

 

I don't know what exactly is going on here.  You can try to work on the Aquantia-equipped box first.  

 

https://rog.asus.com/forum/showthread.php?106476-Aquantia-10Gbe-fix-by-disabling-one-advanced-setting-on-the-chip-thru-device-manager!

 

See if that helps.

 

So I've made some very interesting observations.  The thread you linked did help a bit, as I now get speeds up to 6 Gbps on a single stream.  It's funny how my FreeNAS running on a Celeron barely breaks a sweat when transferring at high speeds; CPU usage barely breaks 15%.  But my Aquantia box, which is on Windows 10 running an Intel i7-7700K, maxes out a single core when it tries to do a transfer.  Talk about insane inefficiency compared to FreeBSD.  I also changed the jumbo packet property on the adapter to 16K, but Windows is still negotiating the MTU to 1500.  I'm wondering if fixing that might make it possible to finally max out the throughput.
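
One way to see what's actually in effect end to end (a sketch; the address is a placeholder for the NAS, and 8972 assumes the eventual target is a 9000-byte MTU, i.e. 9000 minus 28 bytes of IP+ICMP headers):

netsh interface ipv4 show subinterfaces   # the MTU column is what the IP stack is really using
ping <NAS IP> -f -l 8972                  # don't-fragment ping; it fails if anything in the path is still at MTU 1500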


28 minutes ago, Cyberpower678 said:

But my Aquantia box, which is on Windows 10 running an Intel i7-7700K, maxes out a single core when it tries to do a transfer.

OK, that's new data.  And it implies that perhaps 9K jumbo frames might help a bit.  Go back through and kick all the MTUs up to 9K across the board (don't forget the switch interfaces!) and re-try your test.
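
On the FreeNAS side that's a one-liner from a shell, as a sketch only; ix0 is a placeholder for whatever the 10G interface is actually called, and the change should really be made persistent through the FreeNAS interface settings:

ifconfig ix0 mtu 9000    # temporary until reboot; set it in the FreeNAS GUI's interface options to persist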

 

And yes, *BSD is vastly more efficient at networking than anything out of Redmond.  As I've posted in other threads on this topic, with default MTU, two FreeBSD boxes talking to one another across the same switch:

 

joker$ iperf3 -c 192.168.10.3
Connecting to host 192.168.10.3, port 5201
[  5] local 192.168.10.1 port 59745 connected to 192.168.10.3 port 5201
[clip]
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.9 GBytes  9.35 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  10.9 GBytes  9.31 Gbits/sec                  receiver

 

And my Mac Pro (Aquantia chips) talking to joker across two different switches:

harleyquinn$ iperf3 -c 192.168.10.1
Connecting to host 192.168.10.1, port 5201
[  4] local 192.168.10.10 port 61603 connected to 192.168.10.1 port 5201
[clip]
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  10.9 GBytes  9.38 Gbits/sec                  sender
[  4]   0.00-10.00  sec  10.9 GBytes  9.38 Gbits/sec                  receiver

 

Those are just single streams, mind you.  My Windows box, which is connected to the same switch as my Mac Pro, can't do that over a single stream.  I have to parallelize it.  It almost seems like there's a hard 2.x Gbits/sec per stream limit.  Here are two runs, one with two streams, one with four:

 

F:\users\jvp\Program Files\iperf3>iperf3 -c 192.168.10.1 -P 2
Connecting to host 192.168.10.1, port 5201
[  4] local 192.168.10.52 port 52030 connected to 192.168.10.1 port 5201
[  6] local 192.168.10.52 port 52031 connected to 192.168.10.1 port 5201
[clip]
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  2.68 GBytes  2.30 Gbits/sec                  sender
[  4]   0.00-10.00  sec  2.68 GBytes  2.30 Gbits/sec                  receiver
[  6]   0.00-10.00  sec  2.64 GBytes  2.27 Gbits/sec                  sender
[  6]   0.00-10.00  sec  2.64 GBytes  2.27 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  5.32 GBytes  4.57 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  5.32 GBytes  4.57 Gbits/sec                  receiver

See?  ~4.5Gbit/sec.  But double the number of streams to four and:

F:\users\jvp\Program Files\iperf3>iperf3 -c 192.168.10.1 -P 4
Connecting to host 192.168.10.1, port 5201
[  4] local 192.168.10.52 port 52038 connected to 192.168.10.1 port 5201
[  6] local 192.168.10.52 port 52039 connected to 192.168.10.1 port 5201
[  8] local 192.168.10.52 port 52040 connected to 192.168.10.1 port 5201
[ 10] local 192.168.10.52 port 52041 connected to 192.168.10.1 port 5201
[clip]
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  2.67 GBytes  2.29 Gbits/sec                  sender
[  4]   0.00-10.00  sec  2.67 GBytes  2.29 Gbits/sec                  receiver
[  6]   0.00-10.00  sec  2.69 GBytes  2.31 Gbits/sec                  sender
[  6]   0.00-10.00  sec  2.69 GBytes  2.31 Gbits/sec                  receiver
[  8]   0.00-10.00  sec  2.73 GBytes  2.34 Gbits/sec                  sender
[  8]   0.00-10.00  sec  2.73 GBytes  2.34 Gbits/sec                  receiver
[ 10]   0.00-10.00  sec  2.57 GBytes  2.21 Gbits/sec                  sender
[ 10]   0.00-10.00  sec  2.57 GBytes  2.21 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  10.7 GBytes  9.15 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  10.7 GBytes  9.15 Gbits/sec                  receiver

 

...it's pretty much at the max.

 

IH8WIN

 



6 minutes ago, jasonvp said:

OK, that's new data.  And it implies that perhaps 9K jumbo frames might help a bit.  Go back through and kick all the MTUs up to 9K across the board (don't forget the switch interfaces!) and re-try your test.

 


Your numbers are pretty much identical to mine from my BSD-to-BSD testing and then Windows-to-BSD.  What changes is the parallelization.  No matter how many parallel streams I use, it bogs down at 6 Gbps max.  After some additional tweaking I managed to get the receive direction up to 7 Gbps on a single stream in Windows, but send is still stuck at 3.5 Gbps, which is at least higher than it was before.  The Aquantia controller supports 16K jumbos, so I cranked it all the way up.  Windows is claiming that the MTU for the adapter is now 16334, but when I view the network properties it still says an MTU of 1500.  This part is confusing me.
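
If it helps untangle that: the NIC driver's jumbo-frame property and the IP interface MTU are tracked separately in Windows, so they can disagree.  Roughly, something like this shows both (PowerShell; "Ethernet" is just a placeholder for the adapter name, and the property's display name can vary by driver):

Get-NetAdapterAdvancedProperty -Name "Ethernet" -DisplayName "Jumbo Packet"   # what the driver is set to
Get-NetIPInterface -InterfaceAlias "Ethernet" | Select-Object NlMtu           # what the IP stack will actually use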


1 minute ago, Cyberpower678 said:

The Aquantia controller supports 16K jumbos, so I cranked it all the way up.  Windows is claiming that the MTU for the adapter is now 16334, but when I view the network properties it still says an MTU of 1500.  This part is confusing me.

Be careful with setting your MTU.  It has to match on both ends of the connection if you want to avoid packet fragmentation, and fragmentation will drastically slow things down.  Your switch almost certainly cannot handle 16K frames; very likely it's limited to around 9216 bytes or thereabouts.  That's why you set the MTU to 9K: to give yourself a little headroom.

 

So again, go back through and set them all to 9K.  That includes the server port and the port on the switch facing the server, as well.  Are you setting the MTU via the command line in Windows?  You have to.  Setting it in the GUI doesn't work right.

 

https://myrandomtechblog.com/cryptomining/change-mtu-size-in-windows-10/
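
For reference, the command-line route boils down to something like this (a sketch, run from an elevated prompt; "Ethernet" stands for whatever name netsh reports for the 10G adapter):

netsh interface ipv4 show subinterfaces                                       # confirm the interface name and current MTU
netsh interface ipv4 set subinterface "Ethernet" mtu=9000 store=persistent    # set 9K and keep it across reboots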

 

 



9 minutes ago, jasonvp said:

Be careful with setting your MTU.  It has to match on both ends of the connection if you want to avoid packet fragmentation, and fragmentation will drastically slow things down.  Your switch almost certainly cannot handle 16K frames; very likely it's limited to around 9216 bytes or thereabouts.  That's why you set the MTU to 9K: to give yourself a little headroom.

 

So again, go back through and set them all to 9K.  That includes the server port and the port on the switch facing the server, as well.  Are you setting the MTU via the command line in Windows?  You have to.  Setting it in the GUI doesn't work right.

 

https://myrandomtechblog.com/cryptomining/change-mtu-size-in-windows-10/

 

 

I will turn it down a bit then, but aside from that, it seems I just managed to push to the server at 9.9 Gbps.  Not sure what I changed to get it to work, but I'll take it.

 

Now I need to fix the Intel box running an i9-9900K.


So the Intel box is most definitely not being CPU-throttled.  I'm not noticing any changes in the cores when I start iPerfing with it.  Changing to 9K jumbos did nothing, as I expected, so something else is going on here.  I'm not sure what's bogging this speed down. :/  I'm using an Intel X540.


In the advanced properties (right-click the NIC in Control Panel\Network and Internet\Network Connections > Properties > Advanced), look around for anything like VMQ or Virtual Machine Queues, and if it's enabled, turn it off. One of my NICs had that on by default and it crippled my speed. It was a SolarFlare NIC and not an Intel NIC, but it's worth checking on.

 

Edit:

Also check on Priority & VLAN and disable that too.
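
If digging through the GUI gets tedious, roughly the same checks can be done from PowerShell; just a sketch, with "Ethernet" as a placeholder adapter name and display names that vary by driver:

Get-NetAdapterAdvancedProperty -Name "Ethernet"   # lists every advanced property and its current value (look for VMQ / Priority & VLAN)
Disable-NetAdapterVmq -Name "Ethernet"            # only relevant if the NIC exposes VMQ at all

The Priority & VLAN entry can then be flipped with Set-NetAdapterAdvancedProperty, using whatever display name and value the first command reports.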



5 minutes ago, Lurick said:

In the advanced properties (right-click the NIC in Control Panel\Network and Internet\Network Connections > Properties > Advanced), look around for anything like VMQ or Virtual Machine Queues, and if it's enabled, turn it off. One of my NICs had that on by default and it crippled my speed. It was a SolarFlare NIC and not an Intel NIC, but it's worth checking on.

 

Edit:

Also check on Priority & VLAN and disable that too.

I found nothing closely related to VMQ, but I disabled Packet Priority & VLAN, and it had no effect.

I just don't get how Windows 10 could be so spectacularly misconfigured for something so widely used. 😞


5 minutes ago, Cyberpower678 said:

Changing some adapter settings per that thread got me up to 6 Gbps receive and 3.2 Gbps send, so there's definite improvement there.  I'm not even going to use the tunables mentioned in that thread, as those are simply ridiculous and will cause massive packet loss.

