
Transferring Files at 100 GIGABIT Per Second - HOLY $H!T

jakkuh_t
On 9/30/2018 at 4:01 PM, Spuz said:

So what exactly do I need to look for to get something like this setup? (I don't mean exactly what is in the video but just some RDMA networking). Like what components and what would connect to what etc. I'm not expecting to fork out right now, but this truly interests me so looking to get knowledge (I have literally no clue).

 

This might sound stupid and facetious, but honestly, the first thing that you need to look for if you want to run RDMA is whether your network interface card even supports RDMA.
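On the Linux side, the quickest sanity check I know of is just to see whether the kernel's RDMA stack exposes any devices at all. Here's a rough sketch in Python; it only reads /sys/class/infiniband, which is where RDMA-capable ports show up on an rdma-core-era install. (On Windows, I believe the rough equivalent is checking the adapter properties or the Get-NetAdapterRdma PowerShell cmdlet.)

# Rough sketch: list RDMA-capable devices on a Linux box.
# RDMA-capable NICs (e.g. Mellanox ConnectX) show up under /sys/class/infiniband.
import os

RDMA_SYSFS = "/sys/class/infiniband"

def list_rdma_devices():
    """Return RDMA device names the kernel knows about, e.g. ['mlx5_0', 'mlx5_1']."""
    if not os.path.isdir(RDMA_SYSFS):
        return []  # no RDMA stack loaded, or the NIC simply isn't RDMA-capable
    return sorted(os.listdir(RDMA_SYSFS))

if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA-capable devices:", ", ".join(devices))
    else:
        print("No RDMA devices found - check the NIC, drivers, and rdma-core install.")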

 

If it does, then the next step depends on what you're actually running: InfiniBand, or Ethernet (which is and isn't the same thing). From there, you can dig deeper into the specific implementation.

 

For me, I just bought four 100 Gbps 4x EDR InfiniBand cards (dual port, 4x EDR IB per port). Unlike a lot of videos that spend time on RoCE, I'm going to be running this on a separate physical layer anyway (separate from my 1 Gbps Ethernet/RJ45 layer), so I don't really have a need to implement RoCE, because no other Ethernet traffic is going through the pipe. (This is my "home" compute: a private, separate/isolated backbone network. My compute nodes have a "public" connection (internal to the home and exposed to all other Ethernet traffic) that I'll use to manage the nodes and push/pull data onto and off of them, but the node-to-node communication is moving over to 4x EDR IB so that I'll have the lowest possible point-to-point/node-to-node communication latency.)

 

If I were moving everything through this "new" pipe, then I might run RoCE with PFC, but since it's a separate network, I don't really need to. (Although, truth be told, given that my new cards are all dual port, what I would more than likely do in practice is get two separate switches and keep the traffic separate anyway, so that I can make use of both ports. I don't think that aggregating the links would help me as much as keeping the backbone interconnect separate from everything else.)

IB >>> ETH


On 9/29/2018 at 2:59 PM, jakkuh_t said:

Almost a year and a half ago we checked out networking that was supposed to run at 40 gigabit, but now, we've stepped up our game into the land of triple digit gigabits.

 

 

Buy Mellanox ConnectX Cards:
On Ebay: http://geni.us/k1gzC

 

 

The thing I am most interested in is this: when you first tried this in Windows 10, how did you know that RDMA and/or RoCE wasn't working?

 

Did you just start running the copy tests shown in the video and find that you weren't getting the higher speeds, or what?

 

(I didn't think that RDMA would depend on OS support as long as the drivers supported it, but maybe my mental model of that is incorrect.)
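For what it's worth, the crude check I'm planning to use on the Linux side for "is the RDMA path actually moving data" is to watch the InfiniBand port counters while a copy runs. A rough sketch; the device name (mlx5_0) and port number below are just examples, so use whatever shows up under /sys/class/infiniband on your system:

# Rough sketch: watch an IB port's receive counter while a transfer runs.
# Device name and port below are examples; adjust for your system.
import time

DEV, PORT = "mlx5_0", 1
COUNTER = f"/sys/class/infiniband/{DEV}/ports/{PORT}/counters/port_rcv_data"

def read_counter(path):
    with open(path) as f:
        return int(f.read())

before = read_counter(COUNTER)
time.sleep(5)                      # run the file copy during this window
after = read_counter(COUNTER)

# The IB counters report port_rcv_data in 4-byte units, hence the * 4.
rate_gbytes = (after - before) * 4 / 5 / 1e9
print(f"~{rate_gbytes:.2f} GB/s received on {DEV} port {PORT}")

If that counter barely moves while the copy is running, the traffic is going over the plain Ethernet/TCP path instead.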

 

Also, why did you change the port type from IB to Ethernet when you don't have anything else running on the link, given that it was a direct NIC-to-NIC connection?

 

You probably could have kept the port type as InfiniBand and not bothered with Ethernet (and thus RoCE), since there's no other Ethernet traffic going over the directly connected link.
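For reference, on the VPI cards the port protocol is a firmware setting; the Mellanox Firmware Tools let you query and flip it with mlxconfig, where LINK_TYPE_P1/LINK_TYPE_P2 = 1 means InfiniBand and 2 means Ethernet. A rough sketch that just wraps the CLI; the device path is an example only, and "mst status" will show the real one on your machine:

# Rough sketch: query/set the port protocol on a ConnectX VPI card via mlxconfig
# (Mellanox Firmware Tools). LINK_TYPE_Px: 1 = InfiniBand, 2 = Ethernet.
# The device path is an example only - run "mst status" to find yours.
import subprocess

MST_DEV = "/dev/mst/mt4115_pciconf0"   # example: ConnectX-4 (MT4115)

def query_link_types():
    out = subprocess.run(["mlxconfig", "-d", MST_DEV, "query"],
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if "LINK_TYPE" in line]

def set_port1_to_infiniband():
    # mlxconfig will ask for confirmation; the change takes effect after a reboot
    # or driver restart.
    subprocess.run(["mlxconfig", "-d", MST_DEV, "set", "LINK_TYPE_P1=1"], check=True)

if __name__ == "__main__":
    print("\n".join(query_link_types()))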

IB >>> ETH


(Sidenote: Thank you for this video. My biggest takeaway was the fact that you can find the hardware on eBay now, versus paying $660 for the cheapest dual port 4x EDR IB Mellanox ConnectX-4 card from places like Colfax Direct. So thank you for that. And also thank you for reminding me that I don't necessarily NEED a $20,000 36-port 4x EDR IB switch to make it work, and that I can just as easily make two computers "talk" to each other via a direct connection.)

 

So this was actually very useful.

 

And because of this video, I've already put in my order for four dual port 4x EDR IB Mellanox ConnectX-4 cards. At a total bandwidth of 200 Gbps per card (2x 100 Gbps ports), I'm honestly not entirely sure what I'm going to do with all of that bandwidth.

 

(My first tests are all going to be about latency, because I don't actually put a heck of a lot of traffic through it. But now that I'm moving off gigabit Ethernet as my node-to-node backbone/interconnect, I might be able to run the same problems faster using methods that DO involve a higher volume of data transfers. Previously, with only GbE as the backbone/interconnect, I would deliberately select solution methods that minimized network transfers, because having tested methods that involve a LOT more network traffic, running them with more processors, more slave processes, and more CPU cores didn't necessarily solve the problem any faster than just keeping it on one node. Now I might be able to make better use of multiple nodes for a given problem.)
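For anyone curious, the back-of-the-envelope version of why the interconnect drove that choice looks something like this: per-iteration time is roughly the compute divided by the node count, plus a data-exchange term, and on GbE the exchange term swamps everything long before it does on EDR. All the numbers in this sketch are made up purely for illustration:

# Back-of-the-envelope: when does adding nodes stop helping?
# t(N) ~= compute/N + exchange/bandwidth + N*latency (exchange only when N > 1).
# All numbers below are illustrative, NOT measurements.

def iteration_time(nodes, compute_s, exchange_bytes, bw_bytes_s, latency_s):
    comm = 0.0 if nodes == 1 else exchange_bytes / bw_bytes_s + nodes * latency_s
    return compute_s / nodes + comm

compute_s = 10.0           # serial compute per iteration (made up)
exchange  = 2e9            # 2 GB exchanged per iteration (made up)

gbe = dict(bw_bytes_s=125e6,  latency_s=200e-6)   # ~1 Gb/s, ~200 us
edr = dict(bw_bytes_s=12.5e9, latency_s=0.5e-6)   # ~100 Gb/s, ~0.5 us

for n in (1, 2, 4):
    print(f"{n} node(s): GbE {iteration_time(n, compute_s, exchange, **gbe):5.2f} s, "
          f"EDR {iteration_time(n, compute_s, exchange, **edr):5.2f} s")

With those made-up numbers, four nodes over GbE come out slower than one node (which matches what I was seeing on GbE), while over EDR they come out close to 4x faster.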

 

The cables are also ordered, and the low profile mounting brackets as well (because it was cheaper to get the cards with the full height brackets and then swap out the brackets separately).

 

Two of my systems will be running Linux (specifically SLES 12 SP1, because that's what's supported by the software) and two of my systems will be running Windows (version pending testing).

 

(And like I said before, I have run into problems where the default MPI for the solver failed to produce a solution, but when I changed to a different MPI (from Intel MPI to MS-MPI), I was able to get a solution with MS-MPI where Intel MPI failed.)

 

Eventually, I might move all of my current systems from GbE to 4x EDR IB, but for now, it'll stay this way, only because I really don't have that great a need for anything faster than 10 GbE. (I hate that I am mixing uppercase 'G' and lowercase 'g'! Grrr. Anyways...)

 

(This was prompted because I measured my node-to-node network latency, and when I compared the 500 ns port-to-port latency of 4x EDR IB against the nearly 200,000 ns (200 µs) latencies I was seeing with GbE, I knew that I had to substantially reduce my latency if I wanted to run my stuff faster across my four compute nodes.)
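For anyone who wants to reproduce the comparison, a small MPI ping-pong is the simplest way I know of to measure it the same way over GbE and over IB (same idea as the perftest ib_send_lat tool, just cruder). A rough sketch with mpi4py; save it as, say, pingpong.py and launch it as "mpiexec -n 2 python pingpong.py" with one rank on each node:

# Minimal MPI ping-pong latency probe. Run with exactly two ranks, one per node.
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
assert comm.Get_size() == 2, "run with exactly two ranks (one per node)"

msg = bytearray(8)          # tiny message: we're measuring latency, not bandwidth
iters = 10000

comm.Barrier()
start = time.perf_counter()
for _ in range(iters):
    if rank == 0:
        comm.Send(msg, dest=1)
        comm.Recv(msg, source=1)
    else:
        comm.Recv(msg, source=0)
        comm.Send(msg, dest=0)
elapsed = time.perf_counter() - start

if rank == 0:
    # One round trip = two one-way messages, hence the divide by 2.
    print(f"average one-way latency: ~{elapsed / iters / 2 * 1e6:.1f} us")

Dividing the round trip by two gives a one-way figure, which is the number you can compare against the port-to-port latency spec (keeping in mind that host-side software overhead adds to it).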

 

So thank you for this video, which helped lead me down this road.

 

P.S. If anybody is interested in my latency test results when I run them, please let me know.

 

P.S. #2 I intend on keeping and running IB through and through, and I am NOT intending to switch the port types over from IB to Ethernet unless I absolutely have to. (InfiniBand is natively an RDMA protocol/interface, and again, given that the backbone/interconnect ISN'T going to carry any other Ethernet traffic (and even if it did, it would be separated at the physical layer by a separate switch), I don't have to run RoCE, and therefore I don't have to run PFC either.)

 

So my configuration should be vastly simpler than many others'. (Of course, I am not really sure WHY you would even run Ethernet over the IB hardware, but that's beside the point. Yes, I know what it's for, but if it were me, I would elect to keep the IB layer separate from the ETH layer. But that's just me. And yes, I also realize that that's additional cost. I would imagine that by the time you're getting up to 100 GbE speeds anyway, it's not like each cube is getting its own 100 GbE direct connection to the server, so I'm not entirely sure why that matters, and/or why you couldn't move the Ethernet data onto IB as you start aggregating up. *shrug* Oh well.)

 

Again, it doesn't matter in my use case anyway, because I would do it this way regardless (the QSFP28 connector is physically different from my RJ45, so the two networks are literally physically separated). And I'm not intending to run several hundred metres of fiber throughout my house or rewire the house when GbE has been plenty sufficient. 10 GbE would be nice if I could run it over RJ45, but it's totally unnecessary.

 

The IB, ironically, is something that I can seriously make use of, NOT because of the 100 Gbps of bandwidth, but because of the 500 ns latency.

IB >>> ETH


  • 2 months later...

This is what I wrote in the comments on the "Linus Replies to Mean Comments" video:

 

I'll be honest - I didn't particularly like the style or the content at first. But I'll say this - BECAUSE Linus did a video about 10 GB/s transfer speeds, that was LITERALLY the catalyst that gave me the idea to get my own Mellanox ConnectX-4 cards, cables, and a 36-port 100 Gbps InfiniBand switch as well.

 
So if it hadn't been for Linus, I probably would have deferred it because I thought that I would have to buy all that stuff new rather than used at a FRACTION of the cost.
 
Because of that, I am actually VERY grateful to Linus for having done that video, because it has now enabled me to do what I do MUCH faster and SIGNIFICANTLY more efficiently.
 
So if there is ever any question about whether or not his videos have a real impact - I can say, from my own personal experience, that they ABSOLUTELY do.
 
I went from running gigabit ethernet to now 4x EDR Infiniband, which is a 100 Gbps interconnect -- all because of Linus. So thank you.
 
From one Canadian to another! Eh! :)
 
(P.S. I feel like I owe you and your production team Timmy's or something.)

IB >>> ETH


  • 1 month later...
Hello,

I'm trying to learn about 100G networking. I was thinking about making a direct fiber connection between network cards on different computers - no switch, just direct.

I found a fiber fanout that can go from one 100G to two 50G MTP/MPO connections:

[attachment: start2.JPG]
I've never used anything over 10G RJ45, but I want to play around a bit with transceivers and copper to really learn about it. Can a 100G transceiver such as this one do 50G?
http://www.10gtek.com/AMQ28-SR4-M1-292.html#z2

I also don't know if you need different types of transceivers for 24-fiber or 12-fiber connectors.
 
So basically I'm trying to achieve this without a switch but I don't know what is possible.
 
 
[attachments: start.jpg, start3.JPG]
 
 

So I found out from the transceiver manufacturer that a 100G split into two links will work as long as each link is 2x25G and not 1x50G, so I'll check with the fiber manufacturer to verify the configuration.
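From what I can tell, that lines up with the lane math: a QSFP28 100G port is four lanes at 25 Gb/s each, so any breakout has to be assembled from whole 25G lanes; there's no single 50G lane for the transceiver to hand out. A tiny sketch of that reasoning (the numbers are just the standard QSFP28 lane layout):

# QSFP28 100G = 4 lanes x 25 Gb/s. A breakout only works if every child link
# is built from whole 25G lanes (no single 50G lane exists on this transceiver type).
LANES = 4
LANE_GBPS = 25

def breakout_ok(children):
    """children: list of (lanes_per_link, gbps_per_lane) tuples."""
    uses_whole_lanes = all(gbps == LANE_GBPS for _, gbps in children)
    fits = sum(lanes for lanes, _ in children) <= LANES
    return uses_whole_lanes and fits

print(breakout_ok([(2, 25), (2, 25)]))   # 2x50G done as 2x(2x25G) -> True
print(breakout_ok([(1, 50), (1, 50)]))   # 2x50G done as 2x(1x50G) -> False
print(breakout_ok([(1, 25)] * 4))        # 4x25G                   -> True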


  • 4 weeks later...
On 4/10/2019 at 8:03 AM, boe said:

So I found out from the transceiver manufacturer that a 100G split into two links will work as long as each link is 2x25G and not 1x50G, so I'll check with the fiber manufacturer to verify the configuration.

Depending on the distances that you are looking to cover, you might not need fiber cables unless you really want to spend that kind of money on them, since fiber optic cables (passive, and ESPECIALLY active optical cables) cost significantly more than copper direct attach cables.

 

That being said, if you're using a card like the Mellanox ConnectX-4 VPI dual port or EN dual port, you can connect up to three computers at any given point in time.

 

HOWEVER, depending on your configuration, there is a real possibility that your first computer and your third and LAST computer may NOT be able to communicate with each other (if you aren't running a switch).

 

So long as you're okay with that, this will work.

 

But if you need your third computer to be able to talk to your first computer as well, then unfortunately, that *won't* work.
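To picture it (with hypothetical host names): dual-port cards and no switch give you a chain, A <-> B <-> C, and unless something on B forwards traffic between its two ports, only directly cabled pairs can talk. A tiny sketch of that:

# Daisy-chain of three hosts on dual-port cards, no switch:
#   A <-> B <-> C
# Without B forwarding between its two ports, only directly cabled pairs can talk.
links = {("A", "B"), ("B", "C")}   # hypothetical hosts, one cable per pair

def directly_connected(x, y):
    return (x, y) in links or (y, x) in links

for pair in [("A", "B"), ("B", "C"), ("A", "C")]:
    status = "ok" if directly_connected(*pair) else "needs a switch (or forwarding on B)"
    print(pair, "->", status)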

 

That very specific limitation is part of the reason why I ended up having to buy a switch.

 

(On the flip side though, now that I have a 36-port 100 Gbps 4x EDR InfiniBand switch, I am starting to look at putting all of my systems on it, to the maximum extent possible, just so that I can take advantage of it.)

IB >>> ETH

