HP Enterprise to Acquire Cray Inc for $1.3B

11 hours ago, leadeater said:

That's not a problem anymore, RDMA NICs match Infiniband latency and most high end adapters are CNAs nowadays anyway. Infiniband's performance share and deployment share in the Top500 is getting rather small now, decreasing significantly every year. Most are deploying 25G, 40G and 100G Ethernet with a large contingent of 10G already existing (likely using the first wave of RDMA).

RDMA is nice as it helps remove the bottleneck at system nodes, but there is still the issue of the Ethernet switches in between the nodes. In particular I will cite this paper (PDF), which puts RDMA Ethernet switching at 100 to 300 ns higher than Infiniband. For a single switch that isn't bad, but when your topology involves several layers of switches, that adds up and gives Infiniband and similar a clear advantage over RDMA-capable 100 Gbit Ethernet.
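To make the accumulation concrete, here's a back-of-envelope sketch. The 100 to 300 ns per-switch delta is from the paper above; the absolute per-switch latencies and the fat-tree hop count are assumptions for illustration:

```python
# Switch latency accumulates with topology depth. Per-switch figures are
# assumptions for illustration; the cited paper reports RDMA Ethernet
# switches at 100-300 ns higher latency than Infiniband.

def total_switch_latency_ns(per_switch_ns: int, hops: int) -> int:
    """Latency contributed by the switches alone on one path."""
    return per_switch_ns * hops

# Worst-case path in a 3-tier fat-tree crosses 5 switches:
# leaf -> spine -> core -> spine -> leaf.
HOPS = 5
ib_ns = total_switch_latency_ns(100, HOPS)         # assume ~100 ns per IB switch
eth_ns = total_switch_latency_ns(100 + 300, HOPS)  # worst-case Ethernet penalty

print(ib_ns, eth_ns)  # 500 2000 -> a 300 ns/hop gap becomes 1.5 us end to end
```

A single hop looks harmless; it's the multiplication by path length that matters at scale.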

 

You also cite the reason as to why Infiniband market share has been declining: Intel purchased one of the bigger Infiniband manufacturers and created Omnipath. This result should not be surprising.

 

12 hours ago, leadeater said:

One of the big reasons HPE purchased SGI was their technology that synchronized CPU core clocks across the cluster, they then partnered with Intel to make custom SKUs with this technology in it.

I believe that you are referring to NUMALink, which I mentioned. It replaces the similar coherent interconnect HP re-used for their Dragonhawk based SuperDome systems. The Intel partnership was mainly to get a chipset license in effect, something Intel hasn't done in years except for these high end server exceptions. The interconnect chip itself is entirely an HPE affair.

 

12 hours ago, leadeater said:

Not really, Gen-Z was an industry-wide spec that came about from multiple companies looking at memory-semantic/memory-centric philosophies. Gen-Z was going to happen without The Machine, but The Machine did utilize it and took it out of mere specification development and worked on actual implementation.

Chicken, meet egg.  "The Machine" was publicly announced in 2014 with internal research prior to that.  Gen-Z was announced in 2016.  The core concept of "The Machine" was that it was a memory focused architecture, just like Gen-Z.  In fact HPE even says so: "The fabric is what ties physical packages of memory together to form the vast pool of memory at the heart of Memory-Driven Computing. [...] But we're not the only people in the industry who see the necessity for a fast fabric. We're contributing these findings to an industry-led consortium called Gen-Z..."

 

23 hours ago, leadeater said:

Also Gen-Z is a protocol specification, not a transport PHY specification; that part is actually based on 802.3 (Ethernet).

Gen-Z has similarities with OpenCAPI and CCIX in that it enables high-bandwidth, low-latency, low-overhead communication between accelerators and memory. The end goal is the same, but how they go about achieving it is indeed very different.

 

23 hours ago, leadeater said:

HPE might have a lot of technology developed, but the big thing they lack, and why they aren't a competitor to the likes of Cray, is that they have no implementation track record for such systems and nothing in the MPP area at all. Acquiring Cray is a pretty logical move if you want to enter that market and get access to the required experience and expertise to make any new technology industry ready.

I would disagree with that. Factoring out the recent purchase of SGI, HPE still has a number of systems in the Top500 list. Interestingly enough, one system isn't even x86 based. Sans SGI, HPE would be the 7th largest vendor on the Top500 list today. Summing up Cray, SGI and HPE would put them at the number 2 spot.

 

If you look through previous lists, HPE has been fairly well represented historically, especially in the era where dual socket x86 CPU nodes for clustering were dominant.


38 minutes ago, power666 said:

You also cite the reason as to why Infiniband market share has been declining: Intel purchased one of the bigger Infiniband manufacturers and created Omnipath. This result should not be surprising.

If IB were still necessary and Ethernet unable to deliver what is required, people would have stuck with it. We have both IB and Ethernet deployments with no intention of using IB going forward; that decision was made before IB was basically gone due to all the buyouts.

 

As the paper says, both are viable options, at least at the time it was written. Realistically it's now going to depend on who you partner with, i.e. Nvidia or Intel, or whether you go open standards.

 

Ethernet is much harder to get right in this use case because low latency was never its focus and it's now being augmented to fit the purpose, whereas IB has always been about optimizing latency, bandwidth and data flows. IB's switched nature gives better control than Ethernet's shared-medium heritage.

 

38 minutes ago, power666 said:

I believe that you are referring to NUMALink which I mentioned.

Not sure, probably part of it, but that specific technology was taken out of it to get used in other areas. I've not used any of the HPE Integrity stuff, but this aspect was mentioned to us when HPE was briefing us on their upcoming Gen 10 product lineup and Skylake-SP.

 

38 minutes ago, power666 said:

Chicken, meet egg.  "The Machine" was publicly announced in 2014 with internal research prior to that.  Gen-Z was announced in 2016.  The core concept of "The Machine" was that it was a memory focused architecture, just like Gen-Z.  In fact HPE even says so: "The fabric is what ties physical packages of memory together to form the vast pool of memory at the heart of Memory-Driven Computing. [...] But we're not the only people in the industry who see the necessity for a fast fabric. We're contributing these findings to an industry-led consortium called Gen-Z..."

The Machine was very much only in simulations for a very long time. Announcement dates don't mean a lot when other companies, more than just HPE, were all looking into the same things. Coming together to work on a single unified standard is logical, otherwise you're making 5 different wheels when 1 wheel design is good enough.

 

I've had to sit through HPE's The Machine spiel more than once, most of it is #Marketing. A lot of their really important work was on the Photonics side of it.

 

Quote

The result of our fabric research is working in the prototype we announced in November. But we're not the only people in the industry who see the necessity for a fast fabric. We're contributing these findings to an industry-led consortium called Gen-Z, which is tasked with developing an industry standard for this kind of technology. Now on to photonics.

HPE is a contributor to Gen-Z, it wasn't created for them, by them or only from their research. I've never heard or seen HPE take credit for Gen-Z like you're implying.

 

38 minutes ago, power666 said:

I would disagree with that. Factoring out the recent purchase of SGI, HPE still has a number of systems in the Top500 list. Interestingly enough, one system isn't even x86 based. Sans SGI, HPE would be the 7th largest vendor on the Top500 list today. Summing up Cray, SGI and HPE would put them at the number 2 spot.

 

If you look through previous lists, HPE has been fairly well represented historically, especially in the era where dual socket x86 CPU nodes for clustering were dominant.

I'm well aware of HPE's portfolio in HPC; we have SGI systems (super old, pre-HPE) and HPE Apollo as well as standard ProLiant. The problem with HPE, their actual stuff, is they have no real wrap-around services and you're just rolling hardware that Dell or Lenovo can equally offer. HPE has always been one of the biggest industry hardware providers, but they don't have the specialty services that Cray or Fujitsu offered historically. I know recently HPE has changed tack and is now heavily focusing back on the technology again; the current head of HPE comes from a technical/engineering background and from what I was told there were a lot of very happy HPE engineers when that appointment was made.

 

HPE has been shedding dead weight ever since, getting rid of things that don't fit the core of what HPE is or aren't performing. Previously they were too much of a jack of all trades, master of none; I think that can be attributed to why Dell EMC was able to surpass them (which I know they are super salty about lol).


14 minutes ago, leadeater said:

If IB were still necessary and Ethernet unable to deliver what is required, people would have stuck with it. We have both IB and Ethernet deployments with no intention of using IB going forward; that decision was made before IB was basically gone due to all the buyouts.

 

As the paper says, both are viable options, at least at the time it was written. Realistically it's now going to depend on who you partner with, i.e. Nvidia or Intel, or whether you go open standards.

The key thing to look for on Ethernet deployments is the node count and topology. That extra latency in the middle quickly adds up. Case in point: only four of the top 100 systems leverage Ethernet. It does get more popular further down the list.

 

This is one of the reasons why I mentioned HPE's purchase as a defensive move:  they wanted an interconnect and Cray was one of the 'smaller' companies that wasn't a big competitor or one of HPE's suppliers.  

 

14 minutes ago, leadeater said:

Not sure, probably part of it, but that specific technology was taken out of it to get used in other areas. I've not used any of the HPE Integrity stuff, but this aspect was mentioned to us when HPE was briefing us on their upcoming Gen 10 product lineup and Skylake-SP.

In terms of large socket machines, HPE's DragonHawk was not that impressive as it leveraged a design initially developed for Itanium (note that this was the Tukwila chip with QPI, like Xeons). SGI had NUMALink, which performed better. So what did HPE do? Well, they didn't buy SGI, not yet anyway. Rather, HPE rebranded SGI hardware under the Integrity MC990 name. (At the time, Dell was doing this too.) Now Dell has no means of going beyond 8 sockets in a server, and thus many of the benchmark crowns are HPE's for the taking. Even if scaling is poor, HPE can brute force performance higher.

 

14 minutes ago, leadeater said:

The Machine was very much only in simulations for a very long time. Announcement dates don't mean a lot when other companies, more than just HPE, were all looking into the same things. Coming together to work on a single unified standard is logical, otherwise you're making 5 different wheels when 1 wheel design is good enough.

 

I've had to sit through HPE's The Machine spiel more than once, most of it is #Marketing. A lot of their really important work was on the Photonics side of it.

While most of it right now is #marketing, there is some genuinely good research coming out of "The Machine". I do think HPE's goals were a bit too lofty for the timetable they wanted. Memristors only exist in their labs, and photonics is on the way but likely won't take off until the entire industry moves to chiplets, so that the changes needed to make silicon photonics work don't compromise other aspects of the design.

 

 

14 minutes ago, leadeater said:

HPE is a contributor to Gen-Z, it wasn't created for them, by them or only from their research. I've never heard or seen HPE take credit for Gen-Z like you're implying.

HPE has been the biggest backer of and contributor to Gen-Z. There are indeed other companies helping in the effort, but HPE has a strong influence in the consortium.

 

14 minutes ago, leadeater said:

I'm well aware of HPE's portfolio in HPC; we have SGI systems (super old, pre-HPE) and HPE Apollo as well as standard ProLiant. The problem with HPE, their actual stuff, is they have no real wrap-around services and you're just rolling hardware that Dell or Lenovo can equally offer. HPE has always been one of the biggest industry hardware providers, but they don't have the specialty services that Cray or Fujitsu offered historically. I know recently HPE has changed tack and is now heavily focusing back on the technology again; the current head of HPE comes from a technical/engineering background and from what I was told there were a lot of very happy HPE engineers when that appointment was made.

This circles back to the idea of interconnects:  Ethernet is commodity.   Dell and Lenovo can leverage Ethernet just like they can roll out the same x86 commodity servers.

 

The other thing is that there was an Ethernet competitor in Omnipath, though it could also be seen as 'commodity': select Xeon and Xeon Phi chips could get on-package fabric without sacrificing PCIe lanes in the host system.

 

Purchasing Cray does get HPE access to that interconnect and thus the edge they need over their competitors.

 

14 minutes ago, leadeater said:

HPE has been shedding dead weight ever since, getting rid of things that don't fit the core of what HPE is or aren't performing. Previously they were too much of a jack of all trades, master of none; I think that can be attributed to why Dell EMC was able to surpass them (which I know they are super salty about lol).

This could split into its own conversation given how we now have HP and HPE as two separate entities. Mismanagement at the top had HP in a downward spiral until Meg Whitman. There was more than just dead weight shed to keep the company afloat during this period though. Now that they have stabilized, HPE is essentially re-acquiring talent and technologies they used to have internally.

 

Dell EMC's success has much to do with their own internal re-alignment and, for a period of time, going private. Dell played their hand well by going private to tackle internal realignments that could be done profitably, just not at the level of profit shareholders demand. The refocus was a success here as well.


32 minutes ago, power666 said:

The key thing to look for on Ethernet deployments is the node count and topology. That extra latency in the middle quickly adds up. Case in point: only four of the top 100 systems leverage Ethernet. It does get more popular further down the list.

They're all generationally old IB too; a lot of them are planned years in advance and are based on the best option for the purpose at the time. 25Gb (4x for 100Gb) signaling wasn't on the market until 2016, whereas IB EDR was 2014. 25Gb was also the first generation where all the requirements came as standard, unlike 10Gb where you had to seek out the correct NIC chips.
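The lane math behind those generations is straightforward; a quick sketch (nominal per-lane rates, line-code overhead ignored):

```python
# Link speed = lanes x per-lane signaling rate (nominal, ignoring encoding).
def link_speed_gbps(lanes: int, per_lane_gbps: int) -> int:
    return lanes * per_lane_gbps

print(link_speed_gbps(4, 25))  # 100 -> 100GbE and IB EDR both bond 4x 25G lanes
print(link_speed_gbps(1, 25))  # 25  -> 25GbE, single-lane 25G serdes
print(link_speed_gbps(4, 10))  # 40  -> 40GbE, the older 4x 10G generation
```

Which is why 25G serdes availability was the gating factor: once the per-lane rate matched EDR's, bonding four lanes got Ethernet to the same 100Gb headline figure.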

 

The systems further down the list are much smaller, with shorter lead-in times, and also get replaced sooner, replaced as in something else takes their position in the list. We're only just now upgrading our ToR to 25Gb/100Gb, 5 year life cycle etc. It's a bit of a pain because we have to replace all the DAC cables for existing equipment with SFP28 to SFP+ ones, not a small number, but needs must.

 

32 minutes ago, power666 said:

This could split into its own conversation given how we now have HP and HPE as two separate entities. Mismanagement at the top had HP in a downward spiral until Meg Whitman. There was more than just dead weight shed to keep the company afloat during this period though. Now that they have stabilized, HPE is essentially re-acquiring talent and technologies they used to have internally.

I was actually talking specifically about things under the HPE portfolio. There are a number of things that got axed, but I forget what they were. I should be able to remember; I was just at an HPE roadshow about 2 weeks ago where they mentioned them. Doesn't really matter, the important thing is HPE is moving back to being a technical leader.

 

Edit:

Also, the custom interconnects for MPP systems are a much different thing; Ethernet isn't in that area. With Gen-Z we might see a resurgence of MPP-type systems, but with lower performance requirements or different workloads.


@S w a t s o n Random information dump you might find interesting.

 

Quote

Gen-Z IO with PCIe Compatibility
I/O devices that support Gen-Z LPD (Logical PCIe Device) functionality can be discovered and configured using standard PCI system software. Each LPD appears as a PCIe Root Complex Integrated Endpoint (RCiE). A PCIe RCiE is a simplified PCIe endpoint that requires very little software to manage (much of an operating system's PCIe software is never invoked) and is not constrained by PCIe fabric rules. As a result, each processor or accelerator can support up to 8192 LPDs per supported PCI segment. If one LPD is provisioned per I/O device, then up to 8192 I/O devices can be supported, far exceeding the theoretical 256 device maximum permitted if using native PCIe (the actual number of PCIe devices is less, as native PCIe solutions consume multiple bus numbers for PCIe switches and to accommodate hot-plug). This enables massive I/O scale-out solutions such as NVMe over Gen-Z storage without incurring the cost and complexity of deploying an NVMe over Fabrics gateway and another separate scale-out fabric. Further, all I/O components can be simultaneously shared, enabling multiple processors/accelerators to access any NVMe storage device at any time at any scale with little to no coordination (e.g., single-writer, multi-reader paradigms). RISC-V can support LPDs on Gen-Z by implementing PECAM support in the Requestor ZMMU, which translates PCIe configuration accesses to a Gen-Z component address. This enables an unmodified OS to transparently support Gen-Z LPD components and fully exploit Gen-Z's numerous architectural benefits.

http://genzconsortium.org/wp-content/uploads/2019/04/Accelerating-Innovation-Using-RISC-V-and-Gen-Z_V1.pdf
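The device-count gap the quote describes is simple arithmetic; a small sketch (the 256 figure is the PCIe 8-bit bus-number ceiling per segment, and real PCIe topologies land well below it since switches and hot-plug consume bus numbers):

```python
# Device-count comparison from the quoted Gen-Z doc.
PCIE_BUS_NUMBERS = 2 ** 8   # 8-bit bus number -> 256 buses per PCI segment;
                            # switches and hot-plug slots each consume bus
                            # numbers, so usable endpoints are fewer in practice.
GEN_Z_LPDS = 8192           # LPDs appear as RCiEs, exempt from fabric rules.

print(GEN_Z_LPDS // PCIE_BUS_NUMBERS)  # 32 -> 32x the theoretical PCIe ceiling
```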

 

(FML text copy and paste wouldn't work)

Quote

[attached image from the Gen-Z Overview presentation]

http://genzconsortium.org/wp-content/uploads/2018/05/Gen-Z-Overview-V1.pdf

 

 


Quote

The high-speed differential signal pins in SMT (surface mount) connector versions support signaling rates from 2.5 GT/s NRZ to 56 GT/s NRZ. These pins can support multiple physical layers including PCI Express® and 802.3. Further, these pins are protocol agnostic, and can support PCI Express, Gen-Z, and others.

http://genzconsortium.org/wp-content/uploads/2018/11/Gen-Z_ScalableConnector_WP.pdf

 

Sounds like when they bring in the Gen-Z specific interfaces they will support PCIe and 802.3.

 

Misc: http://genzconsortium.org/wp-content/uploads/2019/03/Gen-Z-DRAM-PM-Theory-of-Operation-WP.pdf

