Ultra Ethernet Consortium - Aiming for Million Node Clusters

Lurick

Summary

You've seen the giga-, you know about the tera-, but are you ready for ULTRA Ethernet?!?!

With Ethernet recently passing the 50-year mark, a new Ethernet consortium is looking to the future to challenge InfiniBand's scalability, with new networking standards designed to handle increased complexity and scale. Members of the new consortium include AMD, Arista, Broadcom, Cisco, Eviden, HPE, Intel, Meta and Microsoft.

 

Quotes

Quote

The Ultra Ethernet Consortium is being hosted at the Linux Foundation, which is about as neutral as you can get in this world, and the founding companies are donating intellectual property and personnel to create a unified Ultra Ethernet standard that they can all eventually hew to with their future products. You can read all of the background on the Ultra Ethernet effort in this position paper, but it all boils down to this: InfiniBand is essentially controlled by a single vendor, and the hyperscalers and cloud builders hate that, and it is not Ethernet, and they hate that, too. They want one protocol with many options in terms of functionality, scale, and price.

 

One of the key features of the emerging Ultra Ethernet standards is the packet spraying technique for multipathing and congestion avoidance that Broadcom and Cisco have in their respective Jericho3-AI and G200 ASICs. They also want to add flexible packet ordering to the Ethernet standard, which helps the All-Reduce and All-to-All collective operations commonly used in AI and HPC applications to run better than they can when strict packet ordering is enforced.

 

The Ultra Ethernet standard will also address new congestion control methods that are optimized for AI and HPC workloads (and far less brittle than methods that have been developed for Ethernet fabrics supporting web and database applications running at scale). This congestion control requires end-to-end fabric telemetry, which many switch ASIC makers and switch makers have been trying to graft onto existing ASICs. They want it built in and standardized, but with enough room for vendors to create their own implementations for differentiation.
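
Neither the position paper nor the article ships any reference code, but as a rough sketch of what the packet spraying and flexible ordering described above amount to, here's a small Python toy (the four-path topology, function names and flow string are all made up for illustration, not taken from any spec). Classic per-flow ECMP hashes an entire flow onto one path; spraying lets every packet pick its own path and relies on a reorder-tolerant receiver to put the message back together.

```python
import hashlib
import random

PATHS = 4  # hypothetical number of equal-cost paths between two switches

def ecmp_path(flow_id: str) -> int:
    """Classic per-flow ECMP: every packet of a flow hashes to the same path,
    so one elephant flow can saturate a single link while others sit idle."""
    digest = hashlib.sha256(flow_id.encode()).digest()
    return digest[0] % PATHS

def sprayed_path(_flow_id: str) -> int:
    """Packet spraying: each packet independently picks a path, spreading a
    single large flow across all links at the cost of possible reordering."""
    return random.randrange(PATHS)

def deliver(packets):
    """Reorder-tolerant receiver: packets carry a sequence number, so the
    message can be reassembled no matter which path delivered which packet."""
    return bytes(p for _, p in sorted(packets))

# Simulate an 8-byte-per-packet message from one flow.
message = b"all-to-all"
flow = "10.0.0.1:4791->10.0.0.2:4791"
in_flight = []
for seq, byte in enumerate(message):
    path = sprayed_path(flow)      # vs. ecmp_path(flow), which never changes
    in_flight.append((seq, byte))  # the fabric may reorder these per path
random.shuffle(in_flight)          # crude stand-in for per-path latency skew
assert deliver(in_flight) == message
print("ECMP would pin the flow to path", ecmp_path(flow),
      "; spraying used all", PATHS, "paths")
```

The flexible-ordering piece in the quote is essentially the deliver() side of this: as long as the endpoints can reassemble out-of-order packets, the fabric is free to spray them, which is what lets All-Reduce and All-to-All traffic fill every link instead of just one.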

 

My thoughts

With Ethernet passing the 50-year mark, it's surprising how much, and yet how little, has really changed in the space over the years. I'm excited to see a possible new standard integrated into upcoming networks as we approach 800Gb/s and 1.6Tb/s in the near future, with 800GbE gear already rolling out in some form this year or early next. As demand for greater density and faster speeds converges, this should produce some interesting clashes and, hopefully, new and exciting standards and technologies, especially around interconnects for high-speed GPUs.

 

Sources

https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/

https://www.networkworld.com/article/3703188/cisco-arista-hpe-intel-lead-consortium-to-supersize-ethernet-for-ai-infrastructures.html


16 minutes ago, Lurick said:

but it all boils down to this: InfiniBand is essentially controlled by a single vendor, and the hyperscalers and cloud builders hate that, and it is not Ethernet, and they hate that, too. They want one protocol with many options in terms of functionality, scale, and price.

NetApp dropped IB for cluster interconnects for this exact reason. Their requirement is a minimum of two sources for everything, for supply chain and development security, and IB violated that. A lot of people are super unhappy about IB being single vendor.

 

Also the owner of IB has a really bad rep so....


As the grumpy ol' John Dvorak once wrote:

All EtherNOT technologies are doomed to fail.

 

He wrote that more than a decade ago, and he's been right all along.  IB's got a great foothold because of its insanely low latency, which is fantastic specifically for HPC workloads.  But there are tricks that can be done with Ethernet to rapidly approach IB's low-latency links.


2 hours ago, leadeater said:

Also the owner of IB has a really bad rep so...

 

This is bait, but: who owns IB again?

 


37 minutes ago, jasonvp said:

This is bait, but: who owns IB again?

A green company that's very big on AI/HPC, who bought out one of the most widely used Ethernet vendors and the only remaining InfiniBand vendor in the industry. There is a reason AMD and Intel have a vested interest in moving things away from InfiniBand, but to be honest it was happening well before this. Many of the big Top 500 systems not using custom fancy interconnects are using 25Gb/100Gb and faster Ethernet.

 

Personally I have not done a whole lot with InfiniBand, but some of the more attractive features of the standard, I believe, are that it is considered lossless, flow control is built in, packet reordering is not a problem, and switches don't need large buffers or heavy processing. Ethernet has tried to bring in lossless capabilities, but I think that hasn't really been unified properly across all vendors, and flow control works very differently between it and InfiniBand.
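
To make the "lossless" point a bit more concrete, here's a toy Python sketch of credit-based flow control, which is broadly how InfiniBand links avoid drops, next to a plain tail-drop queue; the buffer counts and class names are invented for illustration, not taken from any spec. The sender can only transmit while the receiver has granted buffer credits, so the receiving port is never forced to discard anything, it just backpressures.

```python
from collections import deque

class CreditLink:
    """Toy credit-based (lossless) link: the sender consumes one credit per
    frame and the receiver returns a credit each time it frees a buffer."""
    def __init__(self, buffers: int = 4):
        self.credits = buffers          # credits == free receive buffers
        self.queue = deque()

    def send(self, frame) -> bool:
        if self.credits == 0:           # no buffer at the far end: wait, don't drop
            return False
        self.credits -= 1
        self.queue.append(frame)
        return True

    def receive(self):
        frame = self.queue.popleft()
        self.credits += 1               # buffer freed, credit goes back to sender
        return frame

class LossyLink:
    """Toy best-effort Ethernet port: anything beyond the queue depth is dropped."""
    def __init__(self, buffers: int = 4):
        self.buffers = buffers
        self.queue = deque()
        self.dropped = 0

    def send(self, frame) -> bool:
        if len(self.queue) >= self.buffers:
            self.dropped += 1           # tail drop; upper layers must retransmit
            return False
        self.queue.append(frame)
        return True

# Burst 10 frames into each link without draining the receiver.
credit, lossy = CreditLink(), LossyLink()
held_back = sum(not credit.send(i) for i in range(10))
for i in range(10):
    lossy.send(i)
print(f"credit link: 0 drops, {held_back} frames held back at the sender")
print(f"lossy link:  {lossy.dropped} frames dropped")
```

Ethernet's lossless modes (PFC/DCB) try to get a similar effect with per-priority pause frames rather than credits, which, as I understand it, is part of why the behaviour varies so much between vendors.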

 

Realistically the biggest problem with InfiniBand is it being single vendor, followed by the owner having a strong focus on developing technology and standards suited to itself, which isn't wrong, but they also aren't a small, low-impact company either. Think of it like Intel OmniPath: developed by Intel for the Intel ecosystem, with no existing industry usage since it was new, it only created a new market option. InfiniBand on the other hand is an open standard with a handful of hardware vendors who kept dropping out until there was only one, which then got bought out. InfiniBand lives or dies based on whether they continue to stick to the open standard and contribute to it, or take it closed and proprietary in any way, either completely or via optional features that realistically become not optional.

 

P.S. The situation is a different "Blue" company's fault for buying out QLogic first, resulting in Mellanox being the only one left. OmniPath failed, it got sold off, and now I think they are living to regret that decision and what it has led to. In my very naïve opinion the QLogic buyout should have been blocked.


2 hours ago, leadeater said:

A green company that's very big on AI/HPC, who bought out one of the most widely used Ethernet vendors and the only remaining InfiniBand vendor in the industry.

 

Yes.  Bait.  And they don't have a "bad reputation"; not sure where you're getting that unless you're specifically referring to certain GPUs.  BTW: the problem isn't actually affecting their sales, so... so much for said reputation?

 

In the networking world, the purchase of Mellanox in combination with Cumulus is making for a massively powerful SmartNIC product the likes of which the industry hasn't seen before.  Look to the Bluefield-2 and Bluefield-3 DPUs.  They're.... actually fucking awesome.  And only really possible given Team Green's oversight.


3 minutes ago, jasonvp said:

 

Yes.  Bait.  And they don't have a "bad reputation"; not sure where you're getting that unless you're specifically referring to certain GPUs.  BTW: the problem isn't actually affecting their sales, so... so much for said reputation?

 

In the networking world, the purchase of Mellanox in combination with Cumulus is making for a massively powerful SmartNIC product the likes of which the industry hasn't seen before.  Look to the Bluefield-2 and Bluefield-3 DPUs.  They're.... actually fucking awesome.  And only really possible given Team Green's oversight.

Nvidia have a horrible reputation! Especially in the commercial space. No one likes the fact that they have to work with Nvidia, but they do, because for many situations they are the only logical option, and often the only option at all.


18 hours ago, jasonvp said:

Yes.  Bait.  And they don't have a "bad reputation"

Except they do. If you want to use their highest-end products with the best performance then you must use their proprietary SXM standard. Oh, but not just that: you MUST use their own designed and made board module and nothing else. Ever wondered why all the 4x and 8x solutions are internally identical from every single server vendor? It's not by choice.

 

Have you ever purchased GPUs under their academic programs, with discount rates that bounce around like yoyos?

 

They have a good reputation for having good products, they don't have one as a company.

 

18 hours ago, jasonvp said:

In the networking world, the purchase of Mellanox in combination with Cumulus is making for a massively powerful SmartNIC product the likes of which the industry hasn't seen before.  Look to the Bluefield-2 and Bluefield-3 DPUs.  They're.... actually fucking awesome.  And only really possible given Team Green's oversight.

Other ones exist and are better, like Pensando. You also have Fungible, Stingray, Octeon and others. Bluefield is not at all how you put it, nor even the first option on the market.

 

I also think you are probably greatly overestimating the market interest in Bluefield. I did not mention any of the hyperscaler products above, which are SmartNICs those companies designed and built for themselves, and Azure offers Pensando-based customer offerings, not Bluefield. The only reason Bluefield will get used by them is if it becomes non-optional, which by the way has a fair chance of happening, since it's quite likely NVLink is going to be extended from an internal, system-only interconnect to an inter-node interconnect, and I have my doubts that will work over any lower transport layer other than in some reduced-capacity or legacy mode.

 

NVLink has a good chance of being extended out into Bluefield and then used between Bluefield DPUs for inter-server connectivity, which is both good and bad. Bad in that Bluefield will be the only option for that. Technology-wise, something like this would obviously be awesome though.

 

If you are a VMware customer, like where I am, you'll be choosing Pensando over Bluefield. If you want to know why, STH has big write-ups on both and has also covered why Pensando is the choice there over Bluefield. STH and Blocks & Files both point out that Pensando has more capabilities and performance, with the ability to accelerate more functions than Bluefield.

 

As for tight integration, you can choose network switches that are Pensando-accelerated, like the Aruba CX series.


Sounds like some endgame for AI or something.

