
Transferring Files at 100 GIGABIT Per Second - HOLY $H!T

jakkuh_t

Almost a year and a half ago, we checked out networking that was supposed to run at 40 gigabit, but now we've stepped up our game into the land of triple-digit gigabits.

 

 

Buy Mellanox ConnectX Cards:
On eBay: http://geni.us/k1gzC

 


PC: 13900K, 32GB Trident Z5, AORUS 7900 XTX, 2TB SN850X, 1TB MP600, Win 11

NAS: Xeon W-2195, 64GB ECC, 180TB Storage, 1660 Ti, TrueNAS Scale


I love it when new technology like this comes out, because prices of 10Gigabit server pulls on eBay should start to come down. ;)

Router:  Intel N100 (pfSense) WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz)
WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


I'm going to watch this video when I get done paying my $70 20Mbps internet bill! Might even tick the 4K option if Spectrum is feeling up to it.


Sorry guys, but as a technician who codes fiber optics for a manufacturer daily, I need to correct a few things:

 

1. QSFP+ is 40G; if it's 100G, it's QSFP28 (explanation below).

2. (Okay, a bit of a picky one.) You can get an SFP/QSFP/etc. converter directly to RJ45, as long as the port is run in Ethernet mode.

3. Running anything longer than 5 meters on a budget is way less expensive than $2,400; let me explain:

  • AOC is a niche market, hence the high pricing; for normal installations over 5 meters, people use fiber optics.
  • For example, say you guys planned to do 10/40/100G over fiber across the whole office: you would most likely use SR optics (short range, often rated up to 300 meters). For QSFP28, aka 100G, they each cost 100-150 USD from a third-party manufacturer, and then a fiber optic cable, say a 20M MTP cable, is 30-60 USD depending on where you get it. So a little above 250 USD if you bought the cheapest (I'd recommend a better-quality seller than the absolute cheapest, but that's beside the point); see the quick cost sketch below.

 

*SFP is 1.25G, SFP+ is 10G, and SFP28 is 25G. QSFP+ is, very roughly speaking, four SFP+ strapped together, therefore 40G; QSFP28 is then, funnily enough, quad SFP28, therefore 100G.
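To make the per-link arithmetic explicit, here's a rough cost sketch (purely illustrative; the figures are just the third-party price ranges quoted above, not official list prices):

```python
# Rough per-link cost for a 100G SR run over fiber, using the quoted
# third-party price ranges (two optics per link, one per end, plus the cable).
optic_usd = (100, 150)   # quoted QSFP28 SR transceiver price range
cable_usd = (30, 60)     # quoted 20 m MTP cable price range

low = 2 * optic_usd[0] + cable_usd[0]
high = 2 * optic_usd[1] + cable_usd[1]
print(f"100G SR link over 20 m: roughly ${low}-${high}, vs. ~$2,400 for the AOC route")
```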

 

BTW, if you guys ever need help sourcing any high-end fiber optics or passive equipment for fiber networking, hit me up; I might be able to give you a good deal ;)

 


9 minutes ago, mylenberg said:

-snip-

Don't forget about QSFP-DD or OSFP :P

 

A lot of customers these days seem to be migrating away from SR and over to LR for all their runs, even the short ones, since (IIRC) they don't have to worry about OM3- vs OM4-rated cables and the different distance limits each one imposes at higher speeds; some optics reach shorter distances over OM3 than over OM4 at the same speed, which can be annoying to deal with.



56 minutes ago, Alex Atkin UK said:

I love it when new technology like this comes out, because prices of 10Gigabit server pulls on eBay should start to come down. ;)

You'd likely be surprised to learn this actually isn't... new. Like 10GbE and IB, it's been around a while. It's used primarily in computing clusters, though. Right now they're looking at TbE. Yes, 1 terabit per second Ethernet.

Wife's build: Amethyst - Ryzen 9 3900X, 32GB G.Skill Ripjaws V DDR4-3200, ASUS Prime X570-P, EVGA RTX 3080 FTW3 12GB, Corsair Obsidian 750D, Corsair RM1000 (yellow label)

My build: Mira - Ryzen 7 3700X, 32GB EVGA DDR4-3200, ASUS Prime X470-PRO, EVGA RTX 3070 XC3, beQuiet Dark Base 900, EVGA 1000 G6


I've set up the 100G over Fabric showcase on Thursday for the Amsterdam OCP event next week.

Yes, that's crazy stuff (from a normal standpoint).

It's running an AIC FB127-LX with 3x100G + MZ4LB3T8HALS-00003 (Samsung 3.8TB M.3 NVMe SSDs), a Mellanox MSN2700-CS2F (32x100G) switch, and 4 compute nodes with 100G each.

But Linus, for your "office use" you should just get ConnectX-3 cards with an MSX1012B-2BRS switch; it's more cost effective at the moment.

To conclude, some hardware p*rn:

[Photo: 100GbE over Fabric showcase hardware]


Great vid.. and it once again piqued my curiosity about HPC (high-performance computing).

Can I suggest a video about building a home supercomputer by connecting several identical (or different) computers to form one single supercomputer with 100Gbps connections?

I heard this might be doable using Windows HPC Server, and we could end up with something like a 256- or 512-core system to try out with some multi-threaded applications, maybe?

 

Don't know if it's doable, but I'd surely want to see something like this!


6 minutes ago, kedstar99 said:

There is a reason why so many algorithmic trading firms, financial firms, CDNs, clouds and OEMs use Solarflare NICs.

The CDN and financial customers in my DC may run Solarflare inside their servers, but in the core they still use Juniper MX/QFX and Ciena DWDM systems to uplink to the DE-CIX.

Most financial customers and even fintech startups still rely on 10G or even 1GbE, and their uplinks haven't reached 1G yet, even if the carrier could deliver...


Consumers have really been forgotten by the networking hardware companies. 1GbE should have been replaced 10 years ago by 10GbE, and in the 2018-2019 time frame, we really should be at the 50+ gigabit range for home networks.

 

1GbE is simply preposterously slow to be using in 2018; it is like trying to sell the Nvidia GeForce 256 to modern gamers today.


30 minutes ago, Razor512 said:

Consumers have really been forgotten by the networking hardware companies. 1GbE should have been replaced 10 years ago by 10GbE, and in the 2018-2019 time frame, we really should be at the 50+ gigabit range for home networks.

 

1GbE is simply preposterously slow to be using in 2018; it is like trying to sell the Nvidia GeForce 256 to modern gamers today.

There isn't consumer demand for it. That's why we don't have anything faster than 1GbE in most home network setups. It just isn't needed. It'd be a different story if home Internet bandwidth were in excess of 1Gbps everywhere, but that isn't the case. Having multiple computers share an Internet connection is generally the only reason most home networks exist, and most of that is wireless.

Wife's build: Amethyst - Ryzen 9 3900X, 32GB G.Skill Ripjaws V DDR4-3200, ASUS Prime X570-P, EVGA RTX 3080 FTW3 12GB, Corsair Obsidian 750D, Corsair RM1000 (yellow label)

My build: Mira - Ryzen 7 3700X, 32GB EVGA DDR4-3200, ASUS Prime X470-PRO, EVGA RTX 3070 XC3, beQuiet Dark Base 900, EVGA 1000 G6


50 minutes ago, Razor512 said:

Consumers have really been forgotten by the networking hardware companies. 1GbE should have been replaced 10 years ago by 10GbE, and in the 2018-2019 time frame, we really should be at the 50+ gigabit range for home networks.

 

1GbE is simply preposterously slow to be using in 2018; it is like trying to sell the Nvidia GeForce 256 to modern gamers today.

It's nothing new though; motherboard NICs were 100Mbit for WAY longer than they should have been, and even in Gigabit land they still slap crappy Realtek chipsets on the majority of motherboards, stealing nearly 100Mbit off your peak speeds anyway.

That said, I still think it's a minority of home users who need Gigabit, let alone 10Gig. Just read the forum here: most people are still using WiFi, and plenty are still on 2.4GHz.

Router:  Intel N100 (pfSense) WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz)
WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


Those limitations stifle innovation. Imagine if motherboard and router makers started offering at least 10GbE, or maybe five 10GbE ports and one 40GbE port for a home server; we would see rapid innovation in technologies targeting consumers.

 

For example, if ISPs had restricted consumers to 56k dialup, we would never have had Netflix.

 

If home networks were limited to 3Mbps, we likely would never have seen network-attached storage devices being made and sold to consumers.

 

R&D typically does not take place when nothing in the ecosystem can enjoy the fruits of that R&D.


I don't think it typically works like that, because R&D is funded by the insane profit margins on high-end kit. It only comes down to us once the people willing to spend insane-o-money have moved on to something better.

 

Although I do see 10Gig cards on eBay for relatively cheap right now, so it's probably time to roll it out, especially now that we have the cores to handle it on desktop parts.

Some Threadripper boards do in fact have 10Gig, as those are the consumers most likely to be able to take advantage of it.

Router:  Intel N100 (pfSense) WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz)
WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


So what exactly do I need to look for to get something like this set up? (I don't mean exactly what's in the video, just some RDMA networking.) Like what components, and what connects to what, etc. I'm not expecting to fork out right now, but this truly interests me, so I'm looking to gain some knowledge (I have literally no clue).


What's up with the very shaky camera movements throughout this video?

Intel Core i5 4690K 4.2GHz | ASRock Z97 Extreme4 | Kingston HyperX 16GB DDR3 | EVGA 960 4GB SSC SLI | Samsung 850 EVO 500GB | EVGA SuperNOVA NEX 750B | Corsair Hydro Series H50


On 9/29/2018 at 3:23 PM, kedstar99 said:

As someone well versed in this tech sector, all I can say is god damn you guys are dumb.

 

You guys went with Mellanox, a place with some seriously shitty drivers, cables and support. You should see how many CRC errors we get on Mellanox switches and cards at work. The jitter on those pieces of shit is insane. Should have gone for a Solarflare NIC for that kind of thing.

 

Second, running this on Windows? Bitch please, that kind of latency on their shitty tech stack is ridiculously bad. You should be using Linux and onload for that kind of thing.

 

Then no SPDK or NVMeoF? That is the future, not this infiniband and other crap. Almost as low latency as native storage with that tech. That kind of thing is really new though, but it's getting mainlined in the Linux kernel.

 

There is a reason why so many algorithmic trading firms, financial firms, CDNs, clouds and OEMs use Solarflare NICs.

I completely agree with your statement about running this on Linux instead of Windows. Red Hat has had this since, I believe, at least 2013? https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-configure_infiniband_and_rdma_networks

 

And even if Linus went with Mellanox, the least he could have done is run it on Linux with their OpenFabrics Enterprise Distribution (OFED) stack, which has RDMA support built in.

Or he could have taken a cue from Mellanox's Linux Switch, which runs InfiniBand as well.

I know Linus is just serving the masses by running Windows and such, but when he gets into technical stuff like this, he really should switch to the platform that actually uses it by and large. If he has to, bring Wendell in to demo it. I'm sure no one would blame him, and Wendell really brings all the "techs to the yard", as it were ^_^.


5 hours ago, Spuz said:

So what exactly do I need to look for to get something like this set up? (I don't mean exactly what's in the video, just some RDMA networking.) Like what components, and what connects to what, etc. I'm not expecting to fork out right now, but this truly interests me, so I'm looking to gain some knowledge (I have literally no clue).

A Linux server distro, for one (usually $0). Then get a used 100GbE card, an InfiniBand cable, and a switch that can handle it.
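Once the card, cable and switch are in, a quick sanity check looks something like this (just a sketch, assuming a Linux box with the rdma-core userspace tools installed; it only wraps the ibv_devinfo CLI):

```python
# Check that an RDMA-capable device is visible and has an active port.
import shutil
import subprocess

if shutil.which("ibv_devinfo") is None:
    print("rdma-core tools not found - install them via your distro's package manager first")
else:
    out = subprocess.run(["ibv_devinfo"], capture_output=True, text=True).stdout
    print(out)
    if "PORT_ACTIVE" in out:
        print("At least one RDMA port is up - ready for RoCE/InfiniBand traffic.")
    else:
        print("No active RDMA port found - check cabling, switch config and drivers.")
```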


I have a suggestion: try the same tests again, but on Linux. If I remember correctly, Linux uses the hardware differently than Windows; I think it has more direct access to the hardware, whereas Windows has to pass through a few more layers before things get accessed. So if that's not too much trouble, I'd love to see you guys run this test again under Ubuntu Linux.
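For what it's worth, even a dead-simple single-stream test run on both OSes would show the difference. Here's a minimal sketch using only the Python standard library (the host, port and duration are made up, and one Python stream won't come close to saturating 100GbE; a real comparison would use iperf3 or RDMA-aware tools, but this keeps the test identical on both platforms):

```python
# Minimal single-stream TCP throughput check, identical on Windows and Linux.
import socket
import sys
import time

HOST, PORT = "192.168.1.10", 5001   # hypothetical receiver address
CHUNK = 4 * 1024 * 1024             # 4 MiB per send
DURATION = 10                       # seconds to transmit

def sender():
    payload = b"\x00" * CHUNK
    sent = 0
    with socket.create_connection((HOST, PORT)) as s:
        end = time.time() + DURATION
        while time.time() < end:
            s.sendall(payload)
            sent += len(payload)
    print(f"~{sent * 8 / DURATION / 1e9:.2f} Gbit/s averaged over {DURATION}s")

def receiver():
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while conn.recv(CHUNK):
                pass

if __name__ == "__main__":
    receiver() if "recv" in sys.argv else sender()
```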

AMD Ryzen 7 2700 3.2Ghz Pinnacle Ridge | Asus Prime X570-Pro | Corsair Vengeances RGB PRO 64GB 3200Mhz | EVGA Nvidia Geforce 3060 XC | EVGA G2 SuperNova 750 Watt PSU


Hey, can you test this in a scenario where multiple Windows roaming profiles are stored on one server, each over 80GB in size, to see how long the profiles take to load if they all start up at once?


  • 1 month later...

I thought I'd find a complete list of the hardware used somewhere, but I didn't. Can you point me to it?


  • 2 weeks later...
On 9/29/2018 at 4:23 PM, kedstar99 said:

As someone well versed in this tech sector, all I can say is god damn you guys are dumb.

 

You guys went with Mellanox, a place with some seriously shitty drivers, cables and support. You should see how many CRC errors we get on Mellanox switches and cards at work. The jitter on those pieces of shit is insane. Should have gone for a Solarflare NIC for that kind of thing.

 

Second, running this on Windows? Bitch please, that kind of latency on their shitty tech stack is ridiculously bad. You should be using Linux and onload for that kind of thing.

 

Then no SPDK or NVMeoF? That is the future, not this infiniband and other crap. Almost as low latency as native storage with that tech. That kind of thing is really new though, but it's getting mainlined in the Linux kernel.

 

There is a reason why so many algorithmic trading firms, financial firms, CDNs, clouds and OEMs use Solarflare NICs.

 

You've watched some of his other videos, right?

 

In regards to NVMeoF, see this: https://storpool.com/blog/demystifying-what-is-nvmeof.

 

For storage use cases (where you are looking to move a large volume of data as fast as you possibly can), that would apply.

 

But if you are trying to access data with the lowest possible latency, then NVMeoF may or may NOT necessarily apply. Moving a very small amount of data, relatively speaking, as quickly as you possibly can means that you're latency limited, not bandwidth limited. (And yes, you can argue that the two are somewhat related, but you can have very short MPI messages being passed around that won't ever fill the 100 gigabit/second bandwidth capacity the line can support, because what you're after is latency rather than sheer volume through the pipe.) And in the case where you're trying to get MPI messages across a pair of hosts as quickly as possible, the total bandwidth available is largely irrelevant until you have a significant number of hosts (and/or processes) talking to each other at the same time.
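To make that concrete, here's a rough ping-pong sketch of the kind of latency-bound traffic I mean, using mpi4py and NumPy (the hostnames in the mpirun line are made up):

```python
# pingpong.py - tiny-message ping-pong between two ranks.
# Run with e.g.: mpirun -n 2 --host nodeA,nodeB python pingpong.py
# An 8-byte message never comes close to filling a 100 Gb/s pipe,
# so what this measures is latency, not bandwidth.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.zeros(8, dtype=np.uint8)   # 8-byte message
iters = 10000

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(iters):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    else:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
t1 = MPI.Wtime()

if rank == 0:
    print(f"one-way latency: {(t1 - t0) / iters / 2 * 1e6:.2f} us")
```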

 

So that would really depend on what you're using it for.

 

The only 100GbE NIC that I could find from SolarFlare is their XtremeScale X2541, which is only single-port. And I couldn't even find pricing on that via Google, which means that I would have to contact SolarFlare for pricing and availability.

 

This is in contrast to the Mellanox NICs, where you can find a dual-port, 100 Gbps card for as low as $300 each (I just bought four for $270 each).

(Colfax Direct carries their older and slower SolarFlare X2522 dual 25GbE cards. With the Onload license and ULL firmware, each card is $935. (Source: https://colfaxdirect.com/store/pc/viewPrd.asp?idproduct=3390&idcategory=6))

 

Windows is certainly a heck of a lot easier to use if you're only going to have the systems temporarily (or if there is some other reason, like the fact that not all solvers support other MPI implementations out of the box, so sometimes I actually DO need to use MS-MPI on Windows because Intel MPI fails to solve the same problem given the same input deck. Ansys Forte, for example, will ONLY run in Linux with Intel MPI.)

 

"That is the future, not this infiniband and other crap."

 

Sorry, but BWAHAHAHA.....

 

Isn't it true that IB is automatically RDMA? And isn't it also true that NVMeoF is just a specific implementation of a protocol (again, per the StorPool link, just iSCSI on steroids?) that runs on top of an RDMA PHY layer? In other words, the NVMeoF protocol, published by NVM Express, Inc., to the best of my knowledge doesn't define the PHY layer the way that IB (published by the IBTA) does. Please correct me if I am wrong about this, but I couldn't find any documentation about the PHY layer in the NVMeoF specification. (In fact, in the NVM Express over Fabrics 1.0 specification, NVM Express, Inc. actually writes that NVM Express over Fabrics is "an abstract protocol layer independent of any physical interconnect properties" (NVM Express over Fabrics 1.0, Sec. 1.5.1, p. 8).)

 

Therefore, wouldn't that make the statement "this is the future, not this infiniband and other crap" completely meaningless? NVMeoF does not and CANNOT work without IB or RoCE or iWARP providing the physical interconnect layer that the transport/interconnect layer defines/uses. (Source: ibid.)

 

So much for someone who's "well versed in this tech sector".

 

Boy, it's amazing what happens when you skim through the 49 pages of the NVMeoF spec. (Literally: Figure 1 on page 8, titled "Taxonomy of Transports", shows that the "message and memory" examples are RDMA (InfiniBand, RoCE, iWARP).)

 

You would have thought that someone who is "well versed in this tech sector" would have actually READ (or at least skimmed, as I have) the source specification document, so as to avoid obvious and embarrassingly egregious errors like this.

 

But hey. What do I know? I'm just a n00b here. ;) Cheers!

IB >>> ETH


On 9/29/2018 at 4:33 PM, haku_wang said:

Great vid.. and it once again piqued my curiosity about HPC (high-performance computing).

Can I suggest a video about building a home supercomputer by connecting several identical (or different) computers to form one single supercomputer with 100Gbps connections?

I heard this might be doable using Windows HPC Server, and we could end up with something like a 256- or 512-core system to try out with some multi-threaded applications, maybe?

 

Don't know if it's doable, but I'd surely want to see something like this!

 

It's actually surprisingly easy.

 

You CAN link up heterogeneous hardware to create a cluster, and so long as your application supports it, it's really easy to set up and use. (The key being that the application MUST support it, which, to be honest, not very many do.)

 

If you want to get started with it, my suggestion first and foremost would be to do some research into the programs you would like to run, and see if they support performing tasks over a network with, say, a "head" node and "slave" node(s).

 

If they do, then usually they'll have some instructions on how to get it set up. If not, then unfortunately the answer is no, and it won't matter whether you have a cluster running. A minimal sketch of what that head/worker split can look like is below.
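This is roughly what the "head plus workers" pattern looks like in code; just a sketch with mpi4py (hostnames in the mpirun line are made up, and any cluster-aware application does something equivalent internally):

```python
# cluster_demo.py - head node splits the work, workers compute, head gathers.
# Run with e.g.: mpirun -n 4 --host head,node1,node2,node3 python cluster_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    # Head node: one chunk of work per process (itself included).
    work = [list(range(i, 1000, size)) for i in range(size)]
else:
    work = None

chunk = comm.scatter(work, root=0)       # every rank receives its slice
partial = sum(x * x for x in chunk)      # stand-in for the real computation
results = comm.gather(partial, root=0)   # head collects the partial results

if rank == 0:
    print("sum of squares 0..999 =", sum(results))
```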

IB >>> ETH

