NVIDIA Made a CPU.. I’m Holding It. - Computex 2023

James
10 minutes ago, themrsbusta said:

I really don't understand why some companies like Apple and Nvidia make memory on CPUs so complicated... "Oh, I need faster memory, but you can't change it over time." Why isn't this faster memory just a layer in front of the slower RAM?

I also don't understand why nobody has done this on graphics cards... like putting some DDR5 slots on a card and a layer of GDDR6X in between to cache the DDR5 before it reaches the GPU.

 

Memory is easy, companies are complicated. 🙄

In the case of video cards, adding slots will greatly increase the complexity of the board design, and you will need 3-4 channels to get bandwidth appreciably faster than just going to system memory anyway. 
 

Probably would be more doable in conjunction with HBM though. 
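To put rough numbers on that (a back-of-the-envelope sketch; the rates below are peak theoretical figures and the card/speed choices are illustrative assumptions):

```python
# Back-of-the-envelope numbers behind the "3-4 channels" point.
# All figures are peak theoretical rates and purely illustrative.

pcie4_x16 = 16 * 1.969                  # GB/s, ~31.5 GB/s for a PCIe 4.0 x16 link
ddr5_channel = 4800e6 * 8 / 1e9         # GB/s for one 64-bit DDR5-4800 channel (38.4)
local_gddr6x = 1008.0                   # GB/s, local VRAM on a current high-end card

for channels in (1, 2, 3, 4):
    bw = channels * ddr5_channel
    print(f"{channels} x DDR5-4800: {bw:6.1f} GB/s  "
          f"(PCIe 4.0 x16 ~{pcie4_x16:.1f} GB/s, local GDDR6X ~{local_gddr6x:.0f} GB/s)")
```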

My eyes see the past…

My camera lens sees the present…


21 minutes ago, themrsbusta said:

I really don't understand why some companies like Apple and Nvidia make memory on CPUs so complicated... "Oh, I need faster memory, but you can't change it over time." Why isn't this faster memory just a layer in front of the slower RAM?

I also don't understand why nobody has done this on graphics cards... like putting some DDR5 slots on a card and a layer of GDDR6X in between to cache the DDR5 before it reaches the GPU.

 

Memory is easy, companies are complicated. 🙄

It's not that easy; it requires a shit ton of die area just for those memory controllers, and your programming model becomes way more complicated this way.

 

Anyhow, Nvidia is already doing something like this with Grace Hopper: HBM on the H100 and a fast interconnect to access the extra RAM on the Grace chip.

8 minutes ago, Zodiark1593 said:

Probably would be more doable in conjunction with HBM though. 

Intel has done so with their MAX offerings.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


The BlueField looks like it has a heat spreader rather than a heatsink... no fins. Bursty loads?


6 hours ago, hishnash said:

Server ARM absolutely does; the domain of high-bandwidth DDR deployments does not belong to x86. Both Power and ARM tend to ship latest-generation DDR servers 2 to 3 years before there are x86 options on the market.

Not on the scale of desktop or server x86 CPUs though. Even Apple's M1/M2 is, relatively speaking, a far simpler chip I/O-wise. Modularity comes with a cost, especially if you then try to combine it with running at the edge. Anyway, clock your desktop CPU 5% lower and see what happens.

 

6 hours ago, hishnash said:

I am well aware that post-decode it is very similar, but the decode cost of x86 is MUCH higher than for a fixed-width ISA (you need to decode your instructions at runtime), and the widest decode stage in x86 currently is 6-wide, which is very, very complex (and only runs in 6-wide mode in optimal situations; most of the time it is more like 2-wide). Building an 8-wide ARM decode stage is trivial in comparison. If you want to build a very wide core (one that does lots at once) you need to feed it with instructions; an instruction cache etc. can help, but in the end (unless you have a very, very tight loop that fits entirely in cache... not at all general-purpose, real-world code) modern x86 CPU cores have a limit: if they are made wider, in most cases they will end up instruction-starved (waiting for the decoder), so they get to a point where they must increase clock speed to match the performance (and increasing clock speed costs POWER).

I fear you have a very outdated grasp of instruction decoding. This is literally the punchline from the 80s; if CPUs still worked this way, branch prediction and out-of-order execution would be highly problematic.

6 hours ago, hishnash said:

PCIe only draws power when under load, unless you screwed up massively! Same with DDR as well.

Most interesting take on the issue, I shall definitely inform my fellow Electronic Elf brethren.

 

But to be serious, differential drivers have a constant power draw that depends on the applied voltage and the termination resistance. It's the cost of doing business, sadly. And in any case, high-speed single-ended drivers (e.g., DDR) typically require stage biasing at unfavourable levels to achieve the required performance. Then you have the additional power cost when switching, which is typically frequency dependent. Needless to say, running I/O is expensive for your power budget.
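As a rough illustration of that constant draw, here's the arithmetic for a terminated differential pair; the swing, termination and lane count are assumed, illustrative values rather than datasheet numbers:

```python
# Rough arithmetic for the "constant draw" point: power burned in the
# termination of one differential pair, with no data moving at all.
# Swing, termination and lane count are illustrative assumptions.

v_swing = 0.8                   # V, assumed differential swing
r_term = 100.0                  # ohm, typical differential termination
p_pair = v_swing ** 2 / r_term  # W dissipated in the termination alone

lanes = 16                      # e.g. an x16 slot, each lane has a TX and an RX pair
p_link = p_pair * lanes * 2

print(f"per pair: {p_pair * 1e3:.1f} mW, x16 link: {p_link * 1e3:.0f} mW before any data moves")
```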

 

And I shall respond to the rest when the bus driver ain't trying to kill us by trying to meet his schedule.


8 hours ago, themrsbusta said:

"Oh I need to put faster memory but you couldn't change overtime" Why this faster memory isn't is a layer before the slow ram?

 

So:

a) Make it another level of cache: complicated in HW, and the onboard RAM wouldn't increase the amount of RAM available.

b) Two-tier memory: requires support in SW (the OS, but also some applications). With multi-level caches already in place everywhere, the result may be something that, outside of artificial benchmarks and obscure edge cases (often due to lousy coding), isn't any better than having the lower tier of "RAM" on the system's SSD.

 

The point is to choose the right amount of RAM !AND! to bring quality back into coding.
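To illustrate what option (b) asks of software, here's a minimal sketch of the bookkeeping a two-tier memory forces on the OS or application; the LRU policy and dict-based "tiers" are purely illustrative, and real mechanisms (NUMA tiering, swap, etc.) are far more involved:

```python
# Minimal sketch of software-managed two-tier memory: something has to decide
# what lives in the small fast tier and what gets demoted to the big slow one.
from collections import OrderedDict

class TieredStore:
    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # small, "on-package" tier
        self.slow = {}              # large, "DIMM/SSD" tier
        self.fast_capacity = fast_capacity

    def get(self, key):
        if key in self.fast:                 # fast hit
            self.fast.move_to_end(key)
            return self.fast[key]
        value = self.slow.pop(key)           # slow hit: promote into the fast tier
        self._put_fast(key, value)
        return value

    def put(self, key, value):
        self._put_fast(key, value)

    def _put_fast(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        if len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value   # demote the least recently used entry

store = TieredStore(fast_capacity=2)
for k in "abc":
    store.put(k, k.upper())
print(store.get("a"), len(store.fast), len(store.slow))  # 'a' gets promoted back
```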


14 hours ago, ImorallySourcedElectrons said:

basically, if you were to build an ARM CPU with the same peripheral requirements as a modern x86 desktop CPU, and then clocked it at frequencies where it hit similar performance, it would use just as much power.

Don't devices like this already exist? Intel and AMD make laptop CPUs, and those laptop CPUs aren't just 1:1 copies of their desktop parts. These CPUs don't have more peripheral options than MacBook Pro laptops offer, and they don't offer more performance either. Especially modern Intel laptop CPUs.

 

By the same token, these Gigabyte servers have similar I/O to their x86 counterparts and yet they have much better performance per watt numbers. 

 

What's different about desktop peripheral requirements, compared with those of servers and laptops, that makes you think the ARM performance-per-watt advantage would disappear only in that market segment?


10 minutes ago, maplepants said:

Don't devices like this already exist? Intel and AMD make laptop CPUs, and those laptop CPUs aren't just 1:1 copies of their desktop parts.

Power draw can be due to a few things.

- I/O: something with everything on the SoC will win here

- the node used: Apple is/was/will always be one step ahead here

- how good the chip is at powering down unused parts, something ARM manufacturers have put more emphasis on since these chips tend to be used in more power-sensitive devices

- quality of the architecture: Intel and AMD still dragging that outdated x86 stuff with them is gonna hurt. Even the current AMD64 instructions aren't as suitable for current compute units as the ARM variants (extra cost of translating them to the microcode that actually gets executed).

Issues 1-3 can be solved if Intel/AMD really want it; the 4th only by killing compatibility (at which point they can just make an ARM or RISC-V CPU).


4 hours ago, Kronoton said:

 

So:

a) Make it another level of cache: complicated in HW, and the onboard RAM wouldn't increase the amount of RAM available.

b) Two-tier memory: requires support in SW (the OS, but also some applications). With multi-level caches already in place everywhere, the result may be something that, outside of artificial benchmarks and obscure edge cases (often due to lousy coding), isn't any better than having the lower tier of "RAM" on the system's SSD.

 

The point is to choose the right amount of RAM !AND! to bring quality back into coding.

Not that complicated, and it doesn't require software support, since the GPU would talk to a controller that communicates with the GDDR6X memory directly. Have you ever heard of the GTX 970? 3.5 GB fast and 512 MB slow? Or the 6900 XT, with slow 256-bit VRAM but a fast 128 MB cache? Same principle.
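A rough sketch of why that principle can work (all numbers are made up; the point is that the effective bandwidth the GPU sees is dominated by the hit rate in the small fast tier):

```python
# "Small fast pool in front of a big slow pool", as on the GTX 970 or with
# RDNA2's Infinity Cache: average bandwidth is set by the fast-tier hit rate.
# All bandwidth figures below are illustrative assumptions.

def effective_bandwidth(hit_rate, fast_gbps, slow_gbps):
    """Average bandwidth seen for a mix of fast-tier hits and slow-tier misses."""
    return 1.0 / (hit_rate / fast_gbps + (1.0 - hit_rate) / slow_gbps)

fast = 1000.0   # e.g. on-die cache / fast VRAM partition, GB/s (assumed)
slow = 50.0     # e.g. DDR5 sitting behind it, GB/s (assumed)

for hr in (0.5, 0.8, 0.95, 0.99):
    print(f"hit rate {hr:.0%}: ~{effective_bandwidth(hr, fast, slow):.0f} GB/s")
```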

Made In Brazil 🇧🇷


12 hours ago, Zodiark1593 said:

In the case of video cards, adding slots will greatly increase the complexity of the board design, and you will need 3-4 channels to get bandwidth appreciably faster than just going to system memory anyway. 
 

Probably would be more doable in conjunction with HBM though. 

Even just a single channel of DDR5 would help, since it's faster than a PCIe 4.0 x16 link and much closer to the GPU...

Games with memory-capacity problems would probably have no issues anymore, with the card using the onboard VRAM as a cache for the slower RAM.

 

This server CPU should use this same principle.

Made In Brazil 🇧🇷


Linus at 3:10: “can we afford NOT to make this migration to ARM?” 

 

Whoa that escalated quickly!

 

Environmentally-friendly Linus wants each one of you to move to yet-non-existent-nor-benchmarked ARM gaming CPUs!

 

To think that when Apple first launched their power-sipping ARM CPUs his initial reaction was “these are iPads lol, and the charts must be misleading” instead of “this could be a monumental milestone in power saving consumer PCs”.


38 minutes ago, themrsbusta said:

Have you ever heard of the GTX 970? 3.5 GB fast and 512 MB slow?

Nope, but whether you do it in HW or SW, there's lots of overhead to figure out what needs to be in "fast" and what in "slow", plus lots of chances to get it wrong.

No idea why they did it, but it sounds like a solution in search of a problem to me.

38 minutes ago, themrsbusta said:

or the 6900 XT, with slow 256-bit VRAM but a fast 128 MB cache?

Does that 128 MB count as extra VRAM? Otherwise it's just another level of cache, as I said...


17 hours ago, hishnash said:

Server ARM absolutely does; the domain of high-bandwidth DDR deployments does not belong to x86. Both Power and ARM tend to ship latest-generation DDR servers 2 to 3 years before there are x86 options on the market.

This is quite a skewed perspective; in practice, the latest generation of memory support is first seen in devices like FPGAs. Take DDR5: if my memory has survived a day of dealing with dunces in the office, FPGAs saw preliminary hardware support somewhere around late 2018/early 2019 (based on standard drafts). Meanwhile, the first ARM CPU with DDR5 support that I can find was Q3 2022, while Intel released the consumer 12000 series in Q4 2021... Which is to say, I fear you might be somewhat biased here.

 

17 hours ago, hishnash said:

I am well aware that post-decode it is very similar, but the decode cost of x86 is MUCH higher than for a fixed-width ISA (you need to decode your instructions at runtime), and the widest decode stage in x86 currently is 6-wide, which is very, very complex (and only runs in 6-wide mode in optimal situations; most of the time it is more like 2-wide). Building an 8-wide ARM decode stage is trivial in comparison. If you want to build a very wide core (one that does lots at once) you need to feed it with instructions; an instruction cache etc. can help, but in the end (unless you have a very, very tight loop that fits entirely in cache... not at all general-purpose, real-world code) modern x86 CPU cores have a limit: if they are made wider, in most cases they will end up instruction-starved (waiting for the decoder), so they get to a point where they must increase clock speed to match the performance (and increasing clock speed costs POWER).

Sorry, but this is just incorrect; if we were in 1985 I'd agree with you, but we've been building consumer-market superscalar CPUs since the late 80s/early 90s. To simplify it: you fetch an entire block of code and load it into memory; at the microcode level you interpret and transcode it into instructions that your execution units actually understand; you re-order the instructions and optimize your resource allocation; and then you hand it off to whatever actually executes the instructions. The width doesn't really come into it anymore, and with this sort of system any advantage RISC has in instruction length is offset by the fact that it needs a lot more instructions to achieve the same result, which makes it trickier to perform things like out-of-order execution or branch prediction. The issue you're describing is literally why everyone was proclaiming CISC was dead in the 80s; we solved it. Your CPU ain't starved for instructions, and it is most definitely not processing them one by one and waiting for them. The issue is that most books on the topic (e.g. Tanenbaum's book on computer architecture comes to mind) stop the hardware description around this point, and academia tends to be heavily biased towards ARM because both Intel and AMD keep their cards close to their chest, meaning many books are biased to begin with.
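To illustrate the point that the back end sees much the same thing either way, a toy sketch; the micro-op names and the mapping table are invented for illustration and don't correspond to any real decoder:

```python
# Toy illustration only: how a front end might "crack" one CISC-style
# instruction into micro-ops, next to the equivalent fixed-width RISC-style
# sequence. Mnemonics, micro-op names and the table are made up.

def crack_x86(instr):
    """Return the micro-ops the back end would actually execute."""
    table = {
        # a memory-operand add becomes load + add + store internally
        "add [rbx], rax": ["uop.load  t0 <- [rbx]",
                           "uop.add   t0 <- t0 + rax",
                           "uop.store [rbx] <- t0"],
    }
    return table[instr]

# The same work as ARM-style instructions: three fetched/decoded instructions
# instead of one, but the same three operations reach the execution units.
arm_like = ["ldr x2, [x1]", "add x2, x2, x0", "str x2, [x1]"]

print(crack_x86("add [rbx], rax"))
print(arm_like)
```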

 

17 hours ago, hishnash said:

The other aspect where RISC-like instruction sets (like ARM) benefit the CPU design is in offloading some complexity to the compiler. Since the compiler has a LOT more registers it can directly address, a typical everyday application compiled for ARM will show a lot fewer store and load operations than the same application compiled for x86. Sure, the CPU-internal optimisations on both platforms attempt to be smart about stores and loads and skip them where they are not needed, but that being smart is not perfect and adds complexity (power draw). In the end, the same code base compiled for x86 on a modern CPU will still end up with more writes to and reads from L1 than when running on ARM, and those reads and writes cost power.

This is again a very 80s take on the issue. That complexity off-loading also comes at a serious cost when dealing with modern CPUs, especially if you're not compiling for a specific target. Many ARM CPUs lack a hardware implementation of divide, and even those that have one often only implement it for integers. If the compiler has an exact SKU it can target, it might do it quite efficiently; otherwise you just end up with a generic division algorithm that makes sub-optimal use of the CPU's resources. Meanwhile, on an x86 CPU you'll be able to efficiently plan the ALU resources to perform that division. Especially if you consider that some division algorithms have conditionals in them, it's quite an advantage if the CPU handles it internally, since it can schedule things in such a way that a stall never occurs. And that additional intelligence doesn't really come at much of a cost disadvantage anymore, since RISC CPUs ran into the same wall as CISC CPUs a couple of years later, meaning they had to start implementing the same tricks.
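For a feel of what "a generic division algorithm" means when there's no hardware divider, a minimal shift-and-subtract sketch; real runtime-library routines are far more optimized, but the data-dependent branch on every bit is the point:

```python
# Minimal sketch of the kind of generic routine a compiler falls back to when
# the target CPU has no hardware divide: unsigned restoring (shift-and-subtract)
# division. One loop iteration and one conditional per result bit.

def soft_udiv(dividend: int, divisor: int, bits: int = 32):
    """Return (quotient, remainder) for unsigned integers of width `bits`."""
    if divisor == 0:
        raise ZeroDivisionError
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):        # walk bits MSB -> LSB
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:             # data-dependent branch every bit
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder

assert soft_udiv(1_000_003, 7) == divmod(1_000_003, 7)
```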

 

And you're wrong on more cache hits for an x86: code with the same goal should have roughly the same number of cache hits on a modern CISC or RISC CPU, since both use what basically boils down to a hybrid internal architecture in most instances. And the actual number of register banks present in a modern CPU is a topic I ain't touching with a twenty-foot barge pole.

 

17 hours ago, hishnash said:

PCIe only draws power when under load, unless you screwed up massively! Same with DDR as well.

I think you're misunderstanding the differential-pair implementation of PCIe. It's not because we implemented measures to avoid DC shenanigans that you do not have a constant current draw. To simplify the issue somewhat: you need a DC path to enable biasing of the stage, and in a lot of instances we also use current-mode drivers instead of voltage-mode drivers for *reasons*, etc. In practice I've yet to see a practical high-speed, latest-generation PCIe implementation that draws less than a few mW per lane at idle. DDR has other issues, but the end result is the same: constant power draw per I/O pin.

 

17 hours ago, hishnash said:

Yes, most routers are ARM or MIPS etc.; those that are x86 are normally much higher-level routers that run a load of stuff like traffic analysis etc.

Something in the style of an old Intel Atom chip is surprisingly cost effective, and you do not see those in "higher level" devices. And there are multiple suppliers of such chips. Just remember, AMD Geode sucked, badly, nearly as much as Linus's swimming pool contractor.

 

17 hours ago, hishnash said:

Yes, and for managed services all the current major cloud providers are moving to running these on ARM.
 

Yes, the services depend on dedicated hardware for the high-bandwidth I/O, GPU or video-encoding paths, but the CPUs that data centres are opting to use to manage this dedicated HW tend to be ARM (at least AWS, Google and even MS Azure; I have not used IBM's offerings recently).

I know Amazon is doing some ARM offerings at the moment, but to call Graviton a pure ARM implementation is a bit of a joke. The real truth is somewhere in between the two.

 

7 hours ago, maplepants said:

Don't devices like this already exist? Intel and AMD make laptop CPUs, and those laptop CPUs aren't just 1:1 copies of their desktop parts. These CPUs don't have more peripheral options than MacBook Pro laptops offer, and they don't offer more performance either. Especially modern Intel laptop CPUs.

 

By the same token, these Gigabyte servers have similar I/O to their x86 counterparts and yet they have much better performance per watt numbers. 

 

What's different about desktop peripheral requirements, compared with those of servers and laptops, that makes you think the ARM performance-per-watt advantage would disappear only in that market segment?

Uhm, they offer quite a few more options than Apple's M1/M2 chips. It's not because hardware manufacturers don't use them that they're not there.

 

But when you say they offer the same I/O, I strongly disagree. I don't think you quite understand how wild the hardware-interfacing capabilities and backwards compatibility of a modern x86 CPU are. To give you an idea of what's possible, recent Intel CPUs can still interface with ISA cards from the early 90s, as someone recently demonstrated: https://www.youtube.com/watch?v=putHMSzu5og In fact, until quite recently you could still boot up in really archaic memory-addressing modes that basically resulted in a system that was code- and (to a certain degree) hardware-compatible with an 8086. Strip all of those sorts of features out of an x86 CPU and it'll go toe-to-toe with, or potentially even outperform, the ARM CPUs in power efficiency, since these x86 monsters tend to be quite a bit more optimized than many ARM-based designs.

 

18 hours ago, igormp said:

MCUs are not the topic of this discussion, but rather CPUs based on the A-cores.

The division between CPU and MCU is quite ill-defined in the case of many ARM devices.

18 hours ago, igormp said:

Entirely dependent on what you're doing. Many things are moving to be software-defined, hence why we now have somewhat generic accelerators such as BlueField. Heck, even my crappy RK3568 (4x A55) router can barely handle my network at full load; anyone with more demanding needs won't be satisfied by the current low-end ARM offerings, nor will they likely be able to afford the expensive ones, so an x86 solution ends up being a perfect middle ground, ironically.

Software-defined networking is silly, but that's an entirely different topic. But for many things it honestly doesn't matter, and the cost at which you can put them on a board is quite similar once you hit a certain performance range.

 


2 hours ago, ImorallySourcedElectrons said:

The division between CPU and MCU is quite ill-defined in the case of many ARM devices.

20 hours ago, igormp said:

I like to divide those simply between their M and A offerings; it makes things way easier.

2 hours ago, ImorallySourcedElectrons said:

But for many things it honestly doesn't matter, and the cost at which you can put them on a board is quite similar once you hit a certain performance range.

For your off-the-shelf router that your ISP provides or that a company sells you as a ready-to-use box? Sure. But for someone building their own custom networking solution with OpenWRT or something similar? Then things are a bit different.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


So if Nvidia had acquired Arm, how would that have affected this?


9 hours ago, saltycaramel said:

To think that when Apple first launched their power-sipping ARM CPUs his initial reaction was “these are iPads lol, and the charts must be misleading” instead of “this could be a monumental milestone in power saving consumer PCs”.

Because it is. The iPad used the same architecture, and now even the exact same chip.

He was complaining about the unlabelled charts and the comparisons to unknown CPUs/laptops.


12 hours ago, ImorallySourcedElectrons said:

Uhm, they offer quite a few more options than Apple's M1/M2 chips. It's not because hardware manufacturers don't use them that they're not there.

 

But when you say they offer the same I/O, I strongly disagree. I don't think you quite understand how wild the hardware-interfacing capabilities and backwards compatibility of a modern x86 CPU are. To give you an idea of what's possible, recent Intel CPUs can still interface with ISA cards from the early 90s, as someone recently demonstrated: https://www.youtube.com/watch?v=putHMSzu5og In fact, until quite recently you could still boot up in really archaic memory-addressing modes that basically resulted in a system that was code- and (to a certain degree) hardware-compatible with an 8086. Strip all of those sorts of features out of an x86 CPU and it'll go toe-to-toe with, or potentially even outperform, the ARM CPUs in power efficiency, since these x86 monsters tend to be quite a bit more optimized than many ARM-based designs.

If Intel and AMD are wasting a ton of their power budget on making sure they have good support for outdated desktop I/O cards in their laptop CPUs, then I just think they're making a mistake. If the only I/O advantage x86 can offer over ARM is via these old interfaces then I don't think it's an advantage. 

 

Looking specifically at the Nvidia Grace CPU from the video, I don't see how it could be considered I/O constrained. Nor do I see the M2 Max as being meaningfully I/O constrained.

 

You could be right that if modern ARM chips maintained I/O support for things like the BBC Micro's cassette deck, then they'd be just as power hungry as x86. But I don't see that as a detriment for ARM; I see it as a design mistake by Intel and AMD. I think modern interfaces like PCIe and Thunderbolt are flexible enough to support what people need, and wasting your power budget on outdated I/O isn't an advantage.


If there is a switch to be made, it would be from x86-64 to RISC-V, which is an open ISA. If you have to recompile and redo the design anyway, you might as well target an ISA for which you have no royalties to pay to anyone.


14 hours ago, maplepants said:

If Intel and AMD are wasting a ton of their power budget on making sure they have good support for outdated desktop I/O cards in their laptop CPUs, then I just think they're making a mistake. If the only I/O advantage x86 can offer over ARM is via these old interfaces then I don't think it's an advantage. 

 

Looking specifically at the Nvidia Grace CPU from the video, I don't see how it could be considered I/O constrained. Nor do I see the M2 Max as being meaningfully I/O constrained.

 

You could be right that if modern ARM chips maintained I/O support for things like the BBC Micro's cassette deck, then they'd be just as power hungry as x86. But I don't see that as a detriment for ARM; I see it as a design mistake by Intel and AMD. I think modern interfaces like PCIe and Thunderbolt are flexible enough to support what people need, and wasting your power budget on outdated I/O isn't an advantage.

Way to take things out of context, but supporting all those additional features and the enormous modularity is what makes a modern desktop CPU so flexible and what ensures that hardware of even five years ago ain't a dead brick. Meanwhile, these NVidia Grace and Apple M1/2 CPUs only need to support very specific use cases with no expected upgrade path, nor are they expected to support a wide range of hardware, nor are they expected to be able to run code that was written thirty years ago. All these things are expected of an x86 CPU, and most of the mainstream ones will in fact do all of that quite well. You can get a pretty good idea of the actual feature set once you start looking through the actual datasheets of these CPUs: https://www.intel.com/content/www/us/en/products/docs/processors/core/core-technical-resources.html 

 

You're really comparing things that don't compete in the same league.


44 minutes ago, ImorallySourcedElectrons said:

Way to take things out of context, but supporting all those additional features and the enormous modularity is what makes a modern desktop CPU so flexible and what ensures that hardware of even five years ago ain't a dead brick. Meanwhile, these NVidia Grace and Apple M1/2 CPUs only need to support very specific use cases with no expected upgrade path, nor are they expected to support a wide range of hardware, nor are they expected to be able to run code that was written thirty years ago. All these things are expected of an x86 CPU, and most of the mainstream ones will in fact do all of that quite well. You can get a pretty good idea of the actual feature set once you start looking through the actual datasheets of these CPUs: https://www.intel.com/content/www/us/en/products/docs/processors/core/core-technical-resources.html 

 

You're really comparing things that don't compete in the same league.

I'm not trying to take things out of context, I'm trying to understand your point better. Sorry if it seems like I'm doing that, sometimes tone doesn't come through via text the way we hope.

 

I read through the spec sheets you linked, but that just supports my argument that supporting all of these I/O interfaces isn't a competitive advantage; it's a mistake. Sure, the latest Intel CPUs offer the MIPI CSI-2 camera interconnect, but USB offers similar performance with trade-offs that many laptop users would be extremely happy to make.

 

The context here is us talking about ARM CPUs like Nvidia Grace or Ampere Altra being used in a workstation. You wrote that you think such a workstation computer would have to offer the same I/O support as a modern Intel CPU and that once all this support was added on, the performance per watt advantage currently enjoyed by these ARM CPUs against their Xeon or EPYC competitors would vanish. 

 

My point is that I don't think this is true, because I don't think a theoretical Nvidia Grace Hopper 2 workstation would need to support any of the legacy I/O standards that they currently lack support for. I think this case is made for me quite well by the fact that ARM laptops and ARM servers already exist, and they're not held back by their lack of support for ISA or MIPI CSI-2.

 

I don't think the market for x86 will vanish, and they probably should have some CPUs that support these old standards in their lineup. But wasting power on them in laptops and servers is costing them market share in those segments. And once ARM workstations from outside Apple come, the extra power for these old standards will hurt Intel and AMD there too. 


1 hour ago, maplepants said:

Ampere Altra being used in a workstation.

Already a thing; its power efficiency isn't that different from a Threadripper workstation's.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


On 6/1/2023 at 12:33 AM, maplepants said:

I'm not trying to take things out of context, I'm trying to understand your point better. Sorry if it seems like I'm doing that, sometimes tone doesn't come through via text the way we hope.

 

I read through the spec sheets you linked, but that just supports my argument that supporting all of these I/O interfaces isn't a competitive advantage; it's a mistake. Sure, the latest Intel CPUs offer the MIPI CSI-2 camera interconnect, but USB offers similar performance with trade-offs that many laptop users would be extremely happy to make.

I don't think you quite understand what you're trying to push as a narrative. You would literally end up breaking support for hardware and software that's sometimes only a year old at the time of release. You might find that acceptable, but you'd make it impossible to build modular systems or to reuse existing designs. Significant portions of peripheral hardware would have to be redesigned with every single CPU generation, OS vendors would have to implement layer upon layer of compatibility interfaces, unified driver frameworks like we currently have would become incredibly tricky to make, and I could continue for a while. Basically, it'd incur massive costs for a large part of the ecosystem, and forget about being able to build your own PC.

 

In fact, many folks seem to have forgotten why x86 stuck around: it won the CPU architecture "wars" in the 80s because it offered a line of mostly backwards-compatible hardware and software that led up to the i386 (aka Intel 80386)/i486/Pentium/... Meanwhile, if you developed software for many of the competing systems, you were kind of screwed every time a new SKU came out. Folks who claim the Z80180 is largely compatible with the Z80 should hand me some of whatever they're smoking, because I sure could use it for the next boring business meeting. Motorola had similar issues with the whole swath of MC68k successors they tried to make; meanwhile, Intel got it mostly right.

 

For that matter, how many mutually incompatible ARM architecture revisions are we at by now?

 

On 6/1/2023 at 12:33 AM, maplepants said:

The context here is us talking about ARM CPUs like Nvidia Grace or Ampere Altra being used in a workstation. You wrote that you think such a workstation computer would have to offer the same I/O support as a modern Intel CPU and that once all this support was added on, the performance per watt advantage currently enjoyed by these ARM CPUs against their Xeon or EPYC competitors would vanish. 

 

My point is that I don't think this is true, because I don't think a theoretical Nvidia Grace Hopper 2 workstation would need to support any of the legacy I/O standards that they currently lack support for. 

That's an entirely different argument. You and several others are arguing that these things are so power efficient that Intel and AMD should just pack their bags and switch to ARM. But the reality is that it's not a substitute at all for the applications Intel and AMD are targeting. It's not because you don't understand why they're putting effort into supporting all these legacy features that they're wrong in doing so. There are some pretty large high-value markets that you most likely never considered.

 

On 6/1/2023 at 12:33 AM, maplepants said:

I think this case is made for me quite well by the fact that ARM laptops and ARM servers already exist, and they're not held back by their lack of support for ISA or MIPI CSI-2.

Those older busses are also used to talk to management chips on the motherboard, so you're now going to redesign all of those as well? And you're going to rewrite all the software that goes along with said busses? I really think you don't understand the actual cost and implications of what you're proposing. It's fine that you can slap a tablet design in a slightly larger case and give it a keyboard, but that's not a substitute for a proper laptop's potential feature set.

 

On 6/1/2023 at 12:33 AM, maplepants said:

I don't think the market for x86 will vanish, and they probably should have some CPUs that support these old standards in their lineup. But wasting power on them in laptops and servers is costing them market share in those segments. And once ARM workstations from outside Apple come, the extra power for these old standards will hurt Intel and AMD there too. 

They've been trying to push laptops with ARM CPUs for the last ten years (lest we forget Microsoft's initial foray with their Surface line-up), and Apple switching to ARM will do exactly nothing; if it did, we'd all be using IBM CPUs right about now. And that efficiency only gets you so far: now run Excel with a couple of hundred thousand divisions and see how well that pans out for you. There's more to making a good CPU than having it run power-efficiently.

 

And you want to know the biggest irony in all of this? ARM pretty much killed their own chances by releasing so many sub-variants of their instruction set. That alone disqualifies them for many industrial applications where long-term support is expected. This is why it's difficult to get a ten-year-old ARM CPU, but why I can still get a pin- and voltage-level-compatible i386.


7 hours ago, ImorallySourcedElectrons said:

I don't think you quite understand what you're trying to push as a narrative. You would literally end up breaking support for hardware and software that's sometimes only a year old at the time of release. You might find that acceptable, but you'd make it impossible to build modular systems or to reuse existing designs. Significant portions of peripheral hardware would have to be redesigned with every single CPU generation, OS vendors would have to implement layer upon layer of compatibility interfaces, unified driver frameworks like we currently have would become incredibly tricky to make, and I could continue for a while. Basically, it'd incur massive costs for a large part of the ecosystem, and forget about being able to build your own PC.

You've made this point about 1 to 5 year old standards being ditched with an ARM transition before, but you haven't given any examples. What are some modern I/O standards that would be dropped by a move to an ARM workstation? I tried to look for some examples myself, but I can't find any. 

7 hours ago, ImorallySourcedElectrons said:

That's an entirely different argument. You and several others are arguing that these things are so power efficient that Intel and AMD should just pack their bags and switch to ARM. But the reality is that it's not a substitute at all for the applications Intel and AMD are targeting. It's not because you don't understand why they're putting effort into supporting all these legacy features that they're wrong in doing so. There are some pretty large high-value markets that you most likely never considered.

That you say this is a different argument makes me think we've been talking past each other a little bit. I'm not trying to say that AMD and Intel should ditch x86 in favour of ARM. In your first post you say:

Quote

if you were to build an ARM CPU with the same peripheral requirements as a modern x86 desktop CPU, and then clocked it at frequencies where it hit similar performance, it would use just as much power

And I don't think this is true. I don't think an ARM workstation would need to have the same peripheral requirements in order to be competitive. The reason I think so is that ARM laptops and ARM servers are getting along just fine without supporting the same amount of legacy hardware as x86 laptops and servers.

 

They don't need to completely replace x86 in order for them to be good products that serve their users well and make lots of money.

7 hours ago, ImorallySourcedElectrons said:

Those older busses are also used to talk to management chips on the motherboard, so you're now going to redesign all of those as well? And you're going to rewrite all the software that goes along with said busses? I really think you don't understand the actual cost and implications of what you're proposing. It's fine that you can slap a tablet design in a slightly larger case and give it a keyboard, but that's not a substitute for a proper laptop's potential feature set.

What is a "proper laptop"? Maybe in your mind a "proper laptop" has to support I/O cards from the 90s that you can't even physically connect to it without taking the thing out of it's shell, but I really don't think that matters to Dell XPS 13 buyers.

 

There are definitely market segments that need this stuff, and those segments might be the least price sensitive segments, but they're not the whole market. 

7 hours ago, ImorallySourcedElectrons said:

They've been trying to push laptops with ARM CPUs for the last ten years (lest we forget Microsoft's initial foray with their Surface line-up), and Apple switching to ARM will do exactly nothing; if it did, we'd all be using IBM CPUs right about now. And that efficiency only gets you so far: now run Excel with a couple of hundred thousand divisions and see how well that pans out for you. There's more to making a good CPU than having it run power-efficiently.

ARM chips can be plenty powerful, while also being efficient. But my point doesn't require them to be the most powerful chips ever made.

 

Workstations always come with trade-offs. When I was a consultant, I worked with some firms that standardized on ThinkStations in the P360 Ultra and Tiny size class (I forget what they were actually called then). They could do this because what's important is that the workstation is powerful enough for your intended workload, not that it's the theoretically most powerful thing your building can supply juice to.

 

The difference, I think, that Apple makes is that they've now shown that ARM laptops and small form factor desktops can actually be good. The original Microsoft Surface was terrible, and so it made sense that nobody looked seriously into doing more with ARM then. The M1 and M2 CPU series are really good, as is Nvidia's Grace CPU.

 

It makes sense that after the Tegra 3, Nvidia made no effort to get into the workstation market, but I think it's starting to make sense for them now. And I think that if they do enter the market, there's a lot that would tempt anybody running Ubuntu on their ThinkStation.


12 hours ago, maplepants said:

The difference, I think, that Apple makes is that they've now shown that ARM laptops and small form factor desktops can actually be good. The original Microsoft Surface was terrible, and so it made sense that nobody looked seriously into doing more with ARM then. The M1 and M2 CPU series are really good, as is Nvidia's Grace CPU.

The only thing Apple has demonstrated is that you can make a very locked-down platform with such a CPU. Apple does not have to contend with people running other operating systems or trying to hook up the latest, meanest graphics card to their SoC; they designed an SoC to go along with very specific hardware that they also designed. Same for Nvidia's ARM applications: that's a locked-down system with zero modularity and no real third-party manufacturers involved. They have absolutely no need to compromise to support future or former extension options; folks will specifically design applications for that very system and nothing else. You're basically comparing an ASIC to a general-purpose chip and then complaining that the general-purpose chip can't beat the ASIC's performance for specific tasks under conditions that favour the ASIC, so obviously everything must be solved with the ASIC. That is the part you're not grasping here.

 

12 hours ago, maplepants said:

You've made this point about 1 to 5 year old standards being ditched with an ARM transition before, but you haven't given any examples. What are some modern I/O standards that would be dropped by a move to an ARM workstation? I tried to look for some examples myself, but I can't find any. 

Well, you already wanted to get rid of the LPC interface that supports adding ISA. So, no more PS/2 keyboards and mice, bye-bye TPMs, and good luck if you want a serial link to configure something like an on-board data modem. Get rid of SMBus? Bye-bye temperature sensors, fan controllers, etc. Maybe we should just kick out the programmable interrupt controller, I mean it's basically an 8259 from the 70s, and while we're at it we can also throw out the legacy DMA controllers that are still in there and most definitely never used. And no, you cannot necessarily replace all these things with USB or PCIe. For example, USB does not support the same level of interrupts those legacy interfaces support. And yes, most of the ARM SoCs you're referring to miss all these features, because they never even try to support this much hardware. For example, does the M1 support folks randomly tacking fans and temperature sensors onto some management bus? Because that's the sort of stuff we've been doing with x86 CPUs. And you are now most definitely going to say Intel and AMD should just remove these things from laptop CPUs for the sake of power efficiency, but that's removing core functionality that's often used for managing system internals you're often not even aware of, and then we're at a major hardware redesign once more.
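To make the SMBus point concrete, a hedged sketch of the kind of thing that hangs off that bus: polling a board temperature sensor from Linux. It uses the real smbus2 Python package, but the bus number, device address and register layout are assumptions (roughly LM75-style), not any specific board:

```python
# Sketch: reading a temperature sensor sitting on a management bus.
# Bus number, address and register are hypothetical placeholders; the byte
# layout follows a typical LM75-style part (positive temperatures only).
from smbus2 import SMBus

TEMP_SENSOR_ADDR = 0x48   # assumed 7-bit address of the sensor
TEMP_REGISTER = 0x00      # assumed temperature register

with SMBus(1) as bus:     # /dev/i2c-1, board-specific
    raw = bus.read_word_data(TEMP_SENSOR_ADDR, TEMP_REGISTER)
    # LM75-style parts send the MSB first; swap bytes, then scale the 9-bit value.
    swapped = ((raw & 0xFF) << 8) | (raw >> 8)
    temp_c = (swapped >> 7) * 0.5          # 0.5 °C per LSB
    print(f"Board sensor: {temp_c:.1f} °C")
```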

 

12 hours ago, maplepants said:

And I don't think this is true. I don't think an ARM workstation would need to have the same peripheral requirements in order to be competitive. The reason I think so is that ARM laptops and ARM servers are getting along just fine without supporting the same amount of legacy hardware as x86 laptops and servers.

It really does, because you're introducing problems you're most likely not even aware of. The aforementioned serial link to set up an onboard modem is a simple example. And your next thing will be "just use a USB-to-serial converter there", but that means dedicating some of the limited capacity of your USB host controller to that device, which will most likely lead to delays with interrupts from other devices you interfaced in a similar manner, not to mention that you're now supporting yet another device (said converter).

 

12 hours ago, maplepants said:

There are definitely market segments that need this stuff, and those segments might be the least price sensitive segments, but they're not the whole market. 

ARM chips can be plenty powerful, while also being efficient. But my point doesn't require them to be the most powerful chips ever made.

 

Workstations always come with trade-offs. When I was a consultant, I worked with some firms that standardized on ThinkStations in the P360 Ultra and Tiny size class (I forget what they were actually called then). They could do this because what's important is that the workstation is powerful enough for your intended workload, not that it's the theoretically most powerful thing your building can supply juice to.

That does not consider the economics of manufacturing and the volume market when you're making chips this complex.

 

And this is by no means about computational resources and power usage alone; it's also about software compatibility. You think switching doesn't cause major issues, even with modern virtualisation and emulation technology? The M1 came out in 2020, and it's safe to say Adobe most likely got half a year to a year of heads-up, given how popular they are on the Mac platform. Three to four years later they still haven't gotten SVG export to work in Photoshop: https://helpx.adobe.com/photoshop/kb/photoshop-for-apple-silicon.html Those sorts of issues are why good legacy support is so important; it means you don't have your users and software developers chasing down odd bugs for several years.


2 hours ago, ImorallySourcedElectrons said:

Well, you already wanted to get rid of the LPC interface that supports adding ISA. So, no more PS/2 keyboards and mice, bye-bye TPMs, and good luck if you want a serial link to configure something like an on-board data modem. Get rid of SMBus? Bye-bye temperature sensors, fan controllers, etc. Maybe we should just kick out the programmable interrupt controller, I mean it's basically an 8259 from the 70s, and while we're at it we can also throw out the legacy DMA controllers that are still in there and most definitely never used. And no, you cannot necessarily replace all these things with USB or PCIe. For example, USB does not support the same level of interrupts those legacy interfaces support. And yes, most of the ARM SoCs you're referring to miss all these features, because they never even try to support this much hardware. For example, does the M1 support folks randomly tacking fans and temperature sensors onto some management bus? Because that's the sort of stuff we've been doing with x86 CPUs. And you are now most definitely going to say Intel and AMD should just remove these things from laptop CPUs for the sake of power efficiency, but that's removing core functionality that's often used for managing system internals you're often not even aware of, and then we're at a major hardware redesign once more.

There are x86 systems without much of that legacy stuff; some of those are called consoles 😛

IIRC, the PS4 got rid of the legacy x86 timer, interrupts and a lot of other stuff (which does make tons of sense, ofc).

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


16 hours ago, igormp said:

There are x86 systems without much of that legacy stuff; some of those are called consoles 😛

IIRC, the PS4 got rid of the legacy x86 timer, interrupts and a lot of other stuff (which does make tons of sense, ofc).

Which is application-specific and high-volume, so worth customizing for. But what @maplepants is talking about would literally compete with tablets and Chromebooks, a saturated, low-margin market.

