40-Gigabit Fiber Problem

After seeing Linus's exciting videos testing 40 GbE and 100 GbE networking, I decided to give things a try at home! This was especially appealing since I couldn't justify dropping $1,200 on a 10 GbE switch that would only have 2 computers using it at the time. So we decided to take the plunge and spend $900 on a pair of Mellanox 40-Gigabit network cards (along with 40-gig rated QSFP+ cabling from FS.com).

Becoming content creators in our own right, my drummer and I live in a band house together. We like to send big video files of our music, practices, movies, etc. between our machines. Despite some wicked performance, we cannot seem to break 25 Gbps when copying large 10 GB, 20 GB, and 40 GB test files. Our Seagate FireCuda 520s consistently benchmark at 5,000 MB/sec reads and 4,250 MB/sec writes. I figured that with approx 4,250 MB/sec write speeds I could expect approximately 34 Gbps, yet we consistently fall way short of this!

So for the next step, I thought, why not take the network completely out of the equation (since I run two of these drives in my system - one for the OS and one for network transfers)? Even between the two identical Seagate FireCuda 520 drives in my machine, the files would only copy at approx 2,200 MB/sec - falling FAR short of the benchmarked speeds.....

I was SHOCKED.... Can anyone explain these discrepancies between SSD benchmark speeds and actual transfer speeds? Why would a file copy at only 51.8% (2,200 MB/sec vs 4,250 MB/sec) of the benchmarked write speed??
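For reference, the unit conversions behind these figures can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming decimal units (1 MB = 10^6 bytes, 1 Gbps = 10^9 bits per second); the function name is made up for illustration.

```python
def mb_per_s_to_gbps(mb_per_s):
    """Convert decimal megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000

bench_write = 4250   # MB/s, benchmarked write speed quoted in the post
local_copy = 2200    # MB/s, observed drive-to-drive copy speed

print(f"benchmark write:  {mb_per_s_to_gbps(bench_write):.1f} Gbps")
print(f"local copy:       {mb_per_s_to_gbps(local_copy):.1f} Gbps")
print(f"copy / benchmark: {local_copy / bench_write:.1%}")
```

By the same conversion, the 25 Gbps seen over the network corresponds to roughly 3,125 MB/sec.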

Also, if there is any optimization that I can do with the NICs themselves, please let me know!


(Clicky maximize button in bottom right of video to see)
 


We have the following gear:

My machine:
X570 motherboard
AMD 16-core Ryzen 9 3950X CPU
64 GB RAM at 3200 MHz
40-Gigabit Mellanox ConnectX-4 Lx Ethernet Adapter
File-sharing Drive (OS not running on it):  2TB Seagate Firecuda 520

His machine:
X570 motherboard
AMD 8-core Ryzen 7 3800X CPU
32 GB RAM at 3200 MHz
40-Gigabit Mellanox ConnectX-4 Lx Ethernet Adapter
File-sharing Drive (OS not running on it):  2TB Seagate Firecuda 520

Try using robocopy with multiple threads. I'm pretty sure the normal copy in Windows is a single task, so the drives are running at a queue depth of 1. Those benchmarks are running at an unrealistic queue depth of like 32.
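To make the queue-depth idea concrete, here is a minimal Python sketch that copies one large file with several threads, each handling its own byte range, so that more than one read/write is in flight at a time. It illustrates the concept only: it is not how Windows Explorer or robocopy work internally, the names `parallel_copy` and `_copy_range` are made up for this example, and because it uses ordinary buffered file I/O it should not be expected to reproduce benchmark numbers.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def _copy_range(src, dst, offset, length, block=4 * 1024 * 1024):
    """Copy `length` bytes starting at `offset` using private file handles."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            data = fin.read(min(block, remaining))
            if not data:
                break
            fout.write(data)
            remaining -= len(data)

def parallel_copy(src, dst, workers=4):
    """Copy src to dst by splitting the file into one byte range per worker."""
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)          # pre-size the destination file
    if size == 0:
        return 0
    span = -(-size // workers)    # ceiling division: bytes per worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(_copy_range, src, dst, off, min(span, size - off))
            for off in range(0, size, span)
        ]
        for fut in futures:
            fut.result()          # re-raise any worker exception
    return size
```

Calling `parallel_copy("big.bin", "copy.bin", workers=8)` (hypothetical file names) would split the file into eight ranges and copy them concurrently.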


1 minute ago, Electronics Wizardy said:

Try using robocopy with multiple threads. I'm pretty sure the normal copy in Windows is a single task, so the drives are running at a queue depth of 1. Those benchmarks are running at an unrealistic queue depth of like 32.

I was already thinking of getting this program - Blew my mind that something like Windows 10 Pro nowadays doesn't utilize MULTIPLE threads to copy.  Do you think this is what is tanking the performance so heavily?  How much of a disparity can I reasonably expect to see between the BENCHMARKED write speeds and actual performance?


7 hours ago, DuaLeaD said:

I was already thinking of getting this program - Blew my mind that something like Windows 10 Pro nowadays doesn't utilize MULTIPLE threads to copy.  Do you think this is what is tanking the performance so heavily?  How much of a disparity can I reasonably expect to see between the BENCHMARKED write speeds and actual performance?

Did you test the copy speeds using multithreaded robocopy?

Really depends on the workload, but I could see it making the difference here.


5 hours ago, Electronics Wizardy said:

Did you test the copy speeds using multithreaded robocopy?

Really depends on the workload, but I could see it making the difference here.

I need to - Looks like a command prompt interface - I don't need to download anything right?  It's just built into Windows?

Any /switches you recommend for highest speeds?

I also was able to get 400-500 MB/sec speed increases by disabling indexing on the drives by UN-checking this option in properties:

[Attached screenshot: the indexing option in the drive's Properties dialog]


10 minutes ago, DuaLeaD said:

I need to - Looks like a command prompt interface - I don't need to download anything right?  It's just built into Windows?

/MT is multithreading, might speed stuff up a good amount.

 

 


Also, can anyone speak to options that I can change to optimize our 40-Gig network cards?

I thought about just calling Mellanox directly as these are BY FAR the most advanced network interface cards I have ever seen!

THREE-FREAKIN' PAGES of options....

Pardon my amateur cropping but this is what is available:

[Attached screenshot: the adapter's advanced settings pages]


2 minutes ago, DuaLeaD said:

Also, can anyone speak to options that I can change to optimize our 40-Gig network cards?

I thought about just calling Mellanox directly as these are BY FAR the most advanced network interface cards I have ever seen!

THREE-FREAKIN' PAGES of options....

Pardon my amateur cropping but this is what is available:

I'd do a quick iperf test to see what speed the link is giving you.
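The point of an iperf-style test is that it sends data from memory to memory, so the drives are out of the picture. As a rough illustration of that idea (not a replacement for iperf), the sketch below pushes bytes over a single TCP stream on the loopback address using only Python's standard library. The function names are made up for this example, and a single Python stream will most likely be CPU-bound well below 40 Gbps, so treat any number it prints as a lower bound at best. To test the real link, the receiving half would have to run on the other machine.

```python
import socket
import threading
import time

CHUNK = 1024 * 1024  # 1 MiB per send/recv call

def _drain(server_sock, result):
    """Accept one connection and count every byte received until EOF."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total

def measure_loopback(total_bytes, host="127.0.0.1"):
    """Send total_bytes over one TCP stream; return (bytes_received, gbps)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))            # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=_drain, args=(server, result))
    receiver.start()

    payload = bytes(CHUNK)            # zero-filled buffer held in memory
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()                   # wait until the receiver has seen EOF
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], result["bytes"] * 8 / elapsed / 1e9
```

For example, `measure_loopback(2 * 1024**3)` would send 2 GiB and return the byte count together with the measured rate in Gbps.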


On 2/22/2021 at 9:41 PM, DuaLeaD said:

I was SHOCKED.... Can anyone explain these discrepancies between SSD benchmark speeds and actual transfer speeds? Why would a file copy at only 51.8% (2,200 MB/sec vs 4,250 MB/sec) of the benchmarked write speed??

What is the exact motherboard and which PCIe slots and M.2 slots are you using? If one of the NVMe SSDs (or NIC but I doubt it) is connected to the chipset and not directly to the CPU you might be saturating the interconnect between the CPU and the chipset.
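To put rough numbers on the slot question, the sketch below estimates one-direction PCIe link bandwidth from the per-lane signalling rate and the 128b/130b line encoding used by PCIe 3.0 and 4.0. Which slot each device actually sits in is not stated in the thread, so the link widths listed are assumptions to check against the motherboard manual; as far as I know the ConnectX-4 Lx is a PCIe 3.0 x8 card, and the FireCuda 520 and the X570 chipset uplink are PCIe 4.0 x4. Real throughput is somewhat lower once packet overhead is included.

```python
# Per-lane raw signalling rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0}
ENCODING = 128 / 130  # 128b/130b line encoding (PCIe 3.0 and 4.0)

def pcie_gbps(gen, lanes):
    """Approximate one-direction link bandwidth in Gbps, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING * lanes

for label, gen, lanes in [
    ("PCIe 3.0 x4", 3, 4),   # e.g. a Gen3 x8 NIC placed in a slot wired x4
    ("PCIe 3.0 x8", 3, 8),   # a Gen3 x8 NIC at its full width
    ("PCIe 4.0 x4", 4, 4),   # one Gen4 NVMe drive, or the chipset uplink
]:
    gbps = pcie_gbps(gen, lanes)
    print(f"{label}: {gbps:5.1f} Gbps  (~{gbps / 8:.2f} GB/s)")
```

The takeaway is that a Gen3 NIC running at x4 tops out near 31.5 Gbps before overhead, below the 40 Gbps line rate, and that everything hanging off the chipset shares a single Gen4 x4 uplink.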

 

As for benchmark speed vs actual speeds, there can be very large discrepancies, as a lot of benchmark software uses I/O sizes and patterns that aren't real world. I would suggest using IOmeter instead and testing many different block sizes, first with a single SSD and then with both. You can also use IOmeter to run tests across the network, and you can set up multiple threads/transfers.

 

SSD performance figures by manufacturers are just largely a lie, like you CAN get those speeds but you never actually will in real world usage.


On 2/23/2021 at 12:01 PM, DuaLeaD said:

Also, can anyone speak to options that I can change to optimize our 40-Gig network cards?

Turn off Flow Control and Quality of Service; you don't need them, and they'll only reduce performance in your directly connected configuration. Probably not by much, if at all, but you don't need them.

Check that every Offload option is enabled.

Enable Jumbo Frames, ~9000 bytes.

Set RSS to the number of cores you have: increase it to the maximum possible, but do not pick a value greater than your core count.

Make the Receive and Send Buffers as large as you can. If the setting is not a drop-down selection, just put an extremely large number into the field, then press the up arrow on the value increase buttons and it'll change to the maximum supported.
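As a rough illustration of why jumbo frames help at these speeds, the sketch below compares a 1500-byte and a 9000-byte MTU at a 40 Gbps line rate. It assumes standard Ethernet framing without a VLAN tag and IPv4 plus TCP headers without options; the function names are made up for this example.

```python
ETH_OVERHEAD = 38    # bytes per frame outside the MTU: preamble/SFD 8, header 14, FCS 4, inter-frame gap 12
IP_TCP_HEADERS = 40  # bytes inside the MTU: IPv4 header 20 + TCP header 20 (no options)

def payload_efficiency(mtu):
    """Fraction of on-the-wire bytes that carry application data."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)

def frames_per_second(mtu, line_rate_gbps=40):
    """Full-size frames per second needed to fill the link."""
    return line_rate_gbps * 1e9 / ((mtu + ETH_OVERHEAD) * 8)

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
          f"{frames_per_second(mtu):,.0f} frames/s at 40 Gbps")
```

The efficiency gain is only a few percent; the larger effect is that the sender and receiver have to handle roughly one sixth as many frames per second, which is also where the offload and RSS settings come in.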


8 hours ago, leadeater said:

Turn off Flow Control and Quality of Service; you don't need them, and they'll only reduce performance in your directly connected configuration. Probably not by much, if at all, but you don't need them.

Check that every Offload option is enabled.

Enable Jumbo Frames, ~9000 bytes.

Set RSS to the number of cores you have: increase it to the maximum possible, but do not pick a value greater than your core count.

Make the Receive and Send Buffers as large as you can. If the setting is not a drop-down selection, just put an extremely large number into the field, then press the up arrow on the value increase buttons and it'll change to the maximum supported.


Greatly appreciate your detail and effort in replying to my issue - it's been hard finding any answers as so few people are using 40-gigabit Ethernet at home.

I noticed when I turned off Flow Control, the speed fluctuated wildly and I didn't see much of an increase. Disabling QoS definitely helped, and I've done all the other options you suggested, but I'm still hitting about 25 Gbps. Perhaps Windows Copy is the culprit.

Could you further explain offloads?  Is that making the NIC do more of the work instead of the CPU or have I got it backwards?  The 16-core 3950X I'm running is pretty epic so I definitely have processing power to burn on transfers.  Just wanna make sure everything is firing on all cylinders so to speak!

PS: I noticed my speed dropped several gigabits below 25 Gbps after a Windows update, but it took me several days to notice - gotta love how it reverted all my NIC settings to default....


1 hour ago, DuaLeaD said:

Is that making the NIC do more of the work instead of the CPU

Correct.

 

1 hour ago, DuaLeaD said:

PS: I noticed my speed dropped several gigabits below 25 Gbps after a Windows update, but it took me several days to notice - gotta love how it reverted all my NIC settings to default....

Yep... yay Windows lol.

 

1 hour ago, DuaLeaD said:

Perhaps Windows Copy is the culprit.

Most likely, it's really not a high performance method of file transfers. It's really good up to a point then it hits limitations like you're seeing.


Added a video (at the top) of a 40 GB file transfer - Not 34 Gbps but still pretty sick.... I'm gonna try RoboCopy and report back!



[Attached screenshot]

Okay, so I just did a multi-threaded copy of my 40GB and 20GB test files (together) using a 32-thread copy.
According to my calculations, that works out to a solid 25.85 Gbps.  Not quite where I want it but definitely faster than the 13-18 Gbps I am seeing with Windows file copy!!


[Attached screenshot]

Just did another one with only 8 threads and hit 28.66 Gbps!!
So it seems as if there is a point where you hit diminishing returns with # of threads?
Maybe it is extra work for the CPU to divide up the transfer between so many threads?
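One way to see where the returns start diminishing is to sweep the thread count and time each run. The sketch below wraps robocopy from Python and reports Gbps per `/MT` value. The helper names and the example paths are made up for illustration; `/MT:n`, `/J` (unbuffered I/O for large files), `/IS` (re-copy files that already exist unchanged at the destination) and the quiet-output switches are standard robocopy options. As far as I understand, `/MT` spreads threads across files rather than splitting a single file, so with only two test files most of the extra threads may have nothing to do.

```python
import os
import subprocess
import time

def build_robocopy_cmd(src_dir, dst_dir, filenames, threads):
    """Return the robocopy argument list for copying the named files."""
    return [
        "robocopy", src_dir, dst_dir, *filenames,
        f"/MT:{threads}",        # number of copy threads
        "/J",                    # unbuffered I/O, intended for large files
        "/IS",                   # include files that are identical at the destination
        "/NP", "/NFL", "/NDL",   # no progress, file list, or directory list output
    ]

def gbps(num_bytes, seconds):
    """Average throughput in gigabits per second (decimal units)."""
    return num_bytes * 8 / seconds / 1e9

def sweep(src_dir, dst_dir, filenames, thread_counts=(1, 2, 4, 8, 16, 32)):
    """Time one robocopy run per thread count and print the throughput."""
    total = sum(os.path.getsize(os.path.join(src_dir, f)) for f in filenames)
    for n in thread_counts:
        start = time.perf_counter()
        result = subprocess.run(
            build_robocopy_cmd(src_dir, dst_dir, filenames, n),
            stdout=subprocess.DEVNULL,
        )
        elapsed = time.perf_counter() - start
        # robocopy exit codes of 8 and above mean at least one failure
        status = "ok" if result.returncode < 8 else f"failed ({result.returncode})"
        print(f"/MT:{n:<3} {gbps(total, elapsed):6.2f} Gbps  {status}")

# Example call with hypothetical paths and file names:
# sweep(r"D:\transfer", r"\\other-pc\transfer", ["test40.bin", "test20.bin"])
```

The wall-clock timing includes robocopy's startup and scan time, so it will read slightly lower than the speed robocopy reports itself.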

