Linus Torvalds: "I Hope AVX512 Dies A Painful Death"

porina · July 13, 2020

28 minutes ago, SpaceGhostC2C said:

And now I kind of feel we are re-having a conversation we had with you and @leadeater in another thread, don't know which one - I'm becoming a broken record

Linus' comments on AVX-512 were posted in another thread, where I went over some of this already... so yeah... maybe I should have more time away from the forum

Salv8 (sam) · July 13, 2020

8 hours ago, DuckDodgers said:

Old Linus yells at yet another x86 ISA extension

StDragon · July 13, 2020

1 hour ago, porina said:

As covered before, a GPU is not appropriate for all code, even if it appears parallelizable. Data dependencies remain a sticking point, and while GPUs excel at more trivially parallel code, that doesn't work for everything. Something kinda in between a CPU and a GPU might be interesting. I think it was in a different thread that I suggested one possible avenue companies might take in future is simpler CPU cores, but a lot more of them. Core complexity would still be more CPU-like, but scale more GPU-like.

I never did understand why simpler FPGA PCIe cards haven't caught on. I know they're expensive now, but isn't that due to economies of scale? Seems to me that if you need to run some specialized code but can't quite justify an ASIC (because those are really expensive), why not flash an FPGA and dedicate the app to it for that one instruction? Or better yet, have an FPGA core alongside all the others in one CPU package.

porina · July 13, 2020

38 minutes ago, StDragon said:

I never did understand why simpler FPGA PCIe cards haven't caught on. I know they're expensive now, but isn't that due to economies of scale? Seems to me that if you need to run some specialized code but can't quite justify an ASIC (because those are really expensive), why not flash an FPGA and dedicate the app to it for that one instruction? Or better yet, have an FPGA core alongside all the others in one CPU package.

Probably cost. FPGAs vary in size. The more serious ones will cost more serious money. If you really need it, you pay for it. Smaller ones may not provide enough return to be worth including. I've not used one in my career in the electronics industry, but programming one probably isn't like what x86 coders are used to. To get it working... may be more pain than it is worth unless you have a big enough task for it.

Some here think AVX-512 is too exotic already. ASICs are way beyond that level.

2 minutes ago, valdyrgramr said:

Doesn't this guy just say controversial things to stay relevant? Not arguing if he's right or wrong, but this guy comes off no different than John McAfee.

I think he's just used to speaking his mind. Didn't he go on some training or whatever to make himself a better person to work with? Wonder what the old him would have said in the same situation?

StDragon · July 13, 2020

5 minutes ago, valdyrgramr said:

Well, I mean to be fair it was probably not fair to compare him to McAfee. McAfee is a crazy ranter who thinks he knows more than he does to boost his own ego.

McAfee is an anarchist fugitive that's into bath salts. Totally different level compared to Torvalds. It's almost an insult to Torvalds to even be mentioned in the same thread as McAfee.

cj09beira · July 14, 2020

2 hours ago, StDragon said:

I never did understand why simpler FPGA PCIe cards haven't caught on. I know they're expensive now, but isn't that due to economies of scale? Seems to me that if you need to run some specialized code but can't quite justify an ASIC (because those are really expensive), why not flash an FPGA and dedicate the app to it for that one instruction? Or better yet, have an FPGA core alongside all the others in one CPU package.

Because it won't work well outside of long continuous workloads. An FPGA would be limited to one or two programs, as there is a limited amount of resources on it, and every time you use it you probably have to program it again, so in real use cases there will be significant latency to using it. Then you have the same problem GPUs have of being away from the CPU with limited local memory, and we are back to needing a much faster, lower-latency connection. CXL, Gen-Z etc. can't come soon enough.
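To put rough numbers on that trade-off, here is a minimal sketch of the break-even point. The function names and all the figures are made up for illustration (I haven't measured any real FPGA card); the only idea taken from above is that reprogramming and data transfer are fixed overheads the speedup has to pay back.

```python
def offload_pays_off(cpu_seconds, fpga_seconds, reconfig_seconds, transfer_seconds):
    """Return True if running the job on the FPGA finishes sooner than on the CPU.

    All four inputs are hypothetical estimates supplied by the caller.
    """
    total_fpga = reconfig_seconds + transfer_seconds + fpga_seconds
    return total_fpga < cpu_seconds


def break_even_cpu_seconds(speedup, reconfig_seconds, transfer_seconds):
    """CPU runtime at which offloading exactly breaks even.

    With FPGA compute time = cpu / speedup, solving
        cpu = overhead + cpu / speedup
    gives cpu = overhead * speedup / (speedup - 1).
    """
    if speedup <= 1:
        raise ValueError("offloading never breaks even without a speedup > 1")
    overhead = reconfig_seconds + transfer_seconds
    return overhead * speedup / (speedup - 1)


if __name__ == "__main__":
    # Assumed numbers: 10x speedup, 1.0 s to reprogram, 0.5 s to move data.
    print(break_even_cpu_seconds(10, 1.0, 0.5))
    # A short job loses, a long one wins.
    print(offload_pays_off(1.0, 0.1, 1.0, 0.5))
    print(offload_pays_off(60.0, 6.0, 1.0, 0.5))
```

With those assumed numbers, anything that takes less than a couple of seconds on the CPU isn't worth sending over, which is the "long continuous workloads only" point in a nutshell.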

Kisai · July 14, 2020

3 hours ago, porina said:

I think he's just used to speaking his mind. Didn't he go on some training or whatever to make himself a better person to work with? Wonder what the old him would have said in the same situation?

Torvalds used to be very rude and condescending to the people contributing patches to the Linux Kernel, and very public about it. That chases away people from contributing anything.

It's a cornerstone of polite conversation. Don't throw a stone at someone and not expect to break a window. There are posters on LTT that don't understand this either. We don't ask people to be performatively nice, against the poster's personality; we ask that people don't attack other people for wanting to contribute.

Wikipedia is notoriously awful about this kind of thing: all it takes is one bad interaction, and people who were once happy to contribute would rather go contribute their time to something else. That's not getting upset that one's contributions weren't accepted, but that the interaction was abusive, so why would one ever want to contribute?

Anyway, AVX512, probably not the best use of CPU die space, it's probably time to legitimately create a new chip, move some of these edge-case/rarely used instructions into a chiplet design that can be binned off and replaced with additional cores for performance chips.

leadeater · July 14, 2020

2 hours ago, Kisai said:

Anyway, AVX512, probably not the best use of CPU die space, it's probably time to legitimately create a new chip, move some of these edge-case/rarely used instructions into a chiplet design that can be binned off and replaced with additional cores for performance chips.

AVX512 really does only belong in CPUs like the Cascade Lake-AP product line targeted directly at HPC and computational data analytics. There isn't really a wide use case outside of that, and like you say, it's better to improve things that are already used than something that is not.

Most CPUs and systems don't have the supporting cache and memory bandwidth to properly utilize AVX512 and AVX512 so far has not been created or deployed equally.

Problem is I don't actually know how much die area AVX512 uses or more correctly how much you gain back by taking it out.

As you can see broadly the potential space saving isn't actually that much. But hey a giant AVX512 chiplet(s) using Intel's die stacking would be amazing to see if something like that would work. Second layer could all be AVX512 chiplets with large cache to feed it and I'd think you should have GPU like performance if not much greater, all for the low low price of.... I guess $30k lol.

Zodiark1593 · July 14, 2020

1 hour ago, leadeater said:

AVX512 really does only belong in CPUs like the Cascade Lake-AP product line targeted directly at HPC and computational data analytics. There isn't really a wide use case outside of that, and like you say, it's better to improve things that are already used than something that is not.

Most CPUs and systems don't have the supporting cache and memory bandwidth to properly utilize AVX512 and AVX512 so far has not been created or deployed equally.

Problem is I don't actually know how much die area AVX512 uses or more correctly how much you gain back by taking it out.

As you can see broadly the potential space saving isn't actually that much. But hey a giant AVX512 chiplet(s) using Intel's die stacking would be amazing to see if something like that would work. Second layer could all be AVX512 chiplets with large cache to feed it and I'd think you should have GPU like performance if not much greater, all for the low low price of.... I guess $30k lol.

Taking AVX-512 out of most consumer chips would probably hurt development somewhat as this would limit the systems you can properly test with before deployment.
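For what it's worth, this is why code that uses AVX-512 normally checks for it at runtime and falls back otherwise, and you need hardware with the feature to test that path. A minimal sketch, assuming a Linux box where `/proc/cpuinfo` has a `flags` line with names like `avx512f` and `avx2`; the function names and the "kernel" labels are just placeholders I made up:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_kernel(cpuinfo_text):
    """Choose the widest code path the CPU reports support for.

    The returned labels are hypothetical names for three builds of the
    same routine; a real program would dispatch to actual functions.
    """
    flags = cpu_flags(cpuinfo_text)
    if "avx512f" in flags:  # AVX-512 Foundation
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or file unreadable: fall back to scalar
    print(pick_kernel(text))
```

If your dev machine never reports `avx512f`, the first branch simply never runs until the code lands on the server.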

Kisai · July 14, 2020

2 hours ago, leadeater said:

AVX512 really does only belong in CPUs like the Cascade Lake-AP product line targeted directly at HPC and computational data analytics. There isn't really a wide use case outside of that, and like you say, it's better to improve things that are already used than something that is not.

Most CPUs and systems don't have the supporting cache and memory bandwidth to properly utilize AVX512 and AVX512 so far has not been created or deployed equally.

Problem is I don't actually know how much die area AVX512 uses or more correctly how much you gain back by taking it out.

As you can see broadly the potential space saving isn't actually that much. But hey a giant AVX512 chiplet(s) using Intel's die stacking would be amazing to see if something like that would work. Second layer could all be AVX512 chiplets with large cache to feed it and I'd think you should have GPU like performance if not much greater, all for the low low price of.... I guess $30k lol.

IMO, and I'm not a chip engineer so don't take this as anything but an educated guess

https://www.arm.com/why-arm/technologies/big-little or https://www.arm.com/why-arm/technologies/dynamiq (which is the successor to it)

I'd probably aim for the "BIG.little" type of deal where the BIG cores have all the features, and additional little cores with some of the seldom-used/inefficient power-hungry features removed.

So for a many-core server you'd probably have many mostly BIG cores at a fixed TDP, and a few little cores that are used for background processes that the heavy load probably doesn't use.

For a desktop or laptop H part you'd probably have several higher speed BIG cores and a few little cores that all maxed out use the maximum TDP.

Then for a U/Y laptop or tablet device you'd have mostly LITTLE cores and only a few of the BIG cores.

In a stacked/chiplet design you'd then have the option to re-arrange these to either be BIG.BIG, BIG.LITTLE, or LITTLE. So an i9 might be all BIG cores and only two little cores (e.g. 10+2), an i3 might only have two BIG cores and all LITTLE cores (2+4), the i5 would be (4+8), and the i7 would be (8+4).

Anyway that is just my thought on it. Maybe Intel can't do this with the Core architecture.

DuckDodgers · July 14, 2020

4 hours ago, leadeater said:

Problem is I don't actually know how much die area AVX512 uses or more correctly how much you gain back by taking it out.

Here is how much AVX-512 takes in direct area cost:

There is additional hidden cost in extended data-paths overhead of the L1 data cache, to keep up with feeding the dual 512-bit pipelines.
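A back-of-envelope sketch of why those data paths have to be so wide. The inputs are assumptions, not measurements: two vector loads per cycle, a 3.0 GHz clock, and dual-channel DDR4-2666 with a 64-bit (8-byte) bus per channel for comparison.

```python
def l1_load_bandwidth_gbs(loads_per_cycle, vector_bits, clock_ghz):
    """Peak L1 load bandwidth in GB/s for full-width vector loads."""
    bytes_per_cycle = loads_per_cycle * vector_bits // 8
    return bytes_per_cycle * clock_ghz


def dram_bandwidth_gbs(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s."""
    return mega_transfers_per_s * bus_bytes * channels / 1000


if __name__ == "__main__":
    # Assumed: 2 loads/cycle at 3.0 GHz, per core.
    print(l1_load_bandwidth_gbs(2, 512, 3.0))   # 512-bit loads
    print(l1_load_bandwidth_gbs(2, 256, 3.0))   # 256-bit loads, for comparison
    # Assumed: dual-channel DDR4-2666, shared by all cores.
    print(dram_bandwidth_gbs(2666, 2))
```

Under those assumptions a single core can ask L1 for 128 bytes per cycle, several times what the whole memory system can deliver, so the wide units only stay busy when the data is already sitting in cache.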

leadeater · July 14, 2020

21 minutes ago, DuckDodgers said:

There is additional hidden cost in extended data-paths overhead of the L1 data cache, to keep up with feeding the dual 512-bit pipelines

How much is a bit more than that though, since it's not just taking them out; doing so means an entire re-layout of the die, so you could gain more or less than expected because of design rules. Client arch only has a single AVX512 unit (unless that's changed) and HEDT/Server has two (with one disabled in lower SKU Xeons), which is that FMA sitting in the LLC. In fact those entire areas around the main Core are only on HEDT/Server and not client; client arch uses Ring Bus and the L3 Cache is in that area (another Intel problem right now, Ring vs Mesh rather than just one).

Fnige · July 14, 2020

1 hour ago, DuckDodgers said:

Quoth the image

How do you even know what's what in that photo?

cj09beira · July 14, 2020

9 hours ago, Kisai said:

Torvalds used to be very rude and condescending to the people contributing patches to the Linux Kernel, and very public about it. That chases away people from contributing anything.

It's a cornerstone of polite conversation. Don't throw a stone at someone and not expect to break a window. There are posters on LTT that don't understand this either. We don't ask people to be performatively nice, against the poster's personality; we ask that people don't attack other people for wanting to contribute.

Wikipedia is notoriously awful about this kind of thing: all it takes is one bad interaction, and people who were once happy to contribute would rather go contribute their time to something else. That's not getting upset that one's contributions weren't accepted, but that the interaction was abusive, so why would one ever want to contribute?

Anyway, AVX512, probably not the best use of CPU die space, it's probably time to legitimately create a new chip, move some of these edge-case/rarely used instructions into a chiplet design that can be binned off and replaced with additional cores for performance chips.

Linus is a big two-edged sword. He is great in that he has the balls to call out big companies when it's appropriate (or not), but he is a huge dick. Linux would be better off if they had a Linus translator for the communications between Linus and contributors.

like: you suck balls=>you could be better

3 hours ago, leadeater said:

How much is a bit more than that though, since it's not just taking them out; doing so means an entire re-layout of the die, so you could gain more or less than expected because of design rules. Client arch only has a single AVX512 unit (unless that's changed) and HEDT/Server has two (with one disabled in lower SKU Xeons), which is that FMA sitting in the LLC. In fact those entire areas around the main Core are only on HEDT/Server and not client; client arch uses Ring Bus and the L3 Cache is in that area (another Intel problem right now, Ring vs Mesh rather than just one).

Ring bus should be on its way out anyway. Even the consumer chips are already at 10 cores, which really is where ring bus completely stops. The big.LITTLE rumors point to 2 ring buses, but that won't help too much for too long.

btw do you think it will be successful, the big.LITTLE approach I mean? Seems like it won't to me, especially with AMD pumping out 12+ core chips

leadeater · July 14, 2020

1 minute ago, cj09beira said:

btw do you think it will be successful, the big.LITTLE approach I mean? Seems like it won't to me, especially with AMD pumping out 12+ core chips

No idea, it'll live or die on the software support side. I think Intel are targeting it at the lower market segments and low power so I don't think higher end Ryzen will have much impact at all. I do think both the Ryzen 4000 H & U perform significantly better than Intel was planning on though. If AMD can take the laptop market that alone will have the most impact on both companies, it's by far the largest volume PC market.

Next 12 months is likely going to shape the next decade, fun times, fun times.

The Benjamins · July 14, 2020

4 hours ago, leadeater said:

No idea, it'll live or die on the software support side. I think Intel are targeting it at the lower market segments and low power so I don't think higher end Ryzen will have much impact at all. I do think both the Ryzen 4000 H & U perform significantly better than Intel was planning on though. If AMD can take the laptop market that alone will have the most impact on both companies, it's by far the largest volume PC market.

Next 12 months is likely going to shape the next decade, fun times, fun times.

The laptop space is finally moving again. I love that I can now get a 6-core ultrabook with 16GB of RAM for $600

cj09beira · July 14, 2020

https://vsstaticssl.lvl3.on24.com/event/25/07/35/5/rt/1/resources/ThinkStation%2BP620%2BDatasheet%2B-%2BFinal-AD6B.pdf

threadripper pro is here:

128 pcie lanes true

64 cores true

1TB of memory capacity with 128GB dimms true

and lenovo is launching a workstation with it.

@leadeater

SpaceGhostC2C · July 14, 2020

1 hour ago, cj09beira said:

https://vsstaticssl.lvl3.on24.com/event/25/07/35/5/rt/1/resources/ThinkStation%2BP620%2BDatasheet%2B-%2BFinal-AD6B.pdf

threadripper pro is here:

128 pcie lanes true

64 cores true

1TB of memory capacity with 128GB dimms true

and lenovo is launching a workstation with it.

Be right back, I have a bank to rob...

leadeater · July 15, 2020

7 hours ago, cj09beira said:

https://vsstaticssl.lvl3.on24.com/event/25/07/35/5/rt/1/resources/ThinkStation%2BP620%2BDatasheet%2B-%2BFinal-AD6B.pdf

threadripper pro is here:

128 pcie lanes true

64 cores true

1TB of memory capacity with 128GB dimms true

and lenovo is launching a workstation with it.

@leadeater

Quote

AMD Announces Ryzen Threadripper Pro: Workstation Parts for OEMs Only

Lammmeeeeeeeee

mr moose · July 15, 2020

On 7/14/2020 at 8:38 AM, valdyrgramr said:

Well, I mean to be fair it was probably not fair to compare him to McAfee.

I think it's fair. To me he is a cross between Steve Jobs and McAfee. He is good at doing that one thing he is good at, but the rest of the time he is just angry, treating workers like shit and saying stuff not just out of context but outside of his relevance.

Vishera · July 15, 2020

8 hours ago, cj09beira said:

and lenovo is launching a workstation with it.

Their workstation lineup is even worse than their gaming lineup,

for $1,309.00 you are getting:

Quote

ThinkStation P330 Tiny

Processor: 8th Gen Intel® Core™ i3-8100T (3.10GHz, 4 cores, 6MB Cache)

Operating System: Windows 10 Pro 64

Memory: 8GB DDR4 2666MHz

Graphics: NVIDIA Quadro P620 2GB 4 x Mini DP

Optical Drive: None

Warranty: 3 Year On-site

Form Factor: Tiny Q370

M.2 Storage Card: 256GB Solid State Drive, PCIe-NVME, M.2, Opal

Network Card: Integrated Ethernet

Hard Drive: Not available

Keyboard: Not available

Pointing Device: Not available

Why they are selling a low end PC in their workstation lineup is beyond me.

From their marketing:

Quote

Perfect for big jobs like engineering, architecture, science,

Ahmm, it's good only for light workloads like web browsing and media consumption...

That system is useless for the purposes it's marketed for.

leadeater · July 15, 2020

38 minutes ago, Vishera said:

Why they are selling a low end PC in their workstation lineup is beyond me.

Well low end workstations aren't unique to Lenovo though. Something like that is nice for running multiple monitor hanging displays or where you need a design that has enough power to be useful but needs to be used in an environment that is hazardous so low or no airflow is used for the cooling solution through the chassis.

Like all low spec systems the value is generally bad because there is a minimum cost for the base components, simply paying $50 for a better CPU option significantly increases the value.

Weird bad value options don't actually make more sensible options worse, just like having better more sensible options doesn't make the worse ones better.

Personally I've only used Lenovo servers, mostly IBM though but actually the same thing (long story), and don't have any complaints about those. Similarly there's bad value server options but you don't have to buy them.

SpaceGhostC2C · July 15, 2020

1 hour ago, Vishera said:

ThinkStation P330 Tiny

Well, to put things in perspective:

Spoiler

Usually smaller form factors cost more (look at NUC). Still, I find the price strangely high compared to typical members of the Tiny lineup. My guess is that's the Quadro driving the price.

1 hour ago, Vishera said:

Why they are selling a low end PC in their workstation lineup is beyond me.

The ThinkCentre / Station lineup includes the "thin clients" (I know they aren't technically thin clients; they are thin, though) as they are part of the business segment.

1 hour ago, Vishera said:

That system is useless for the purposes it's marketed for.

I wouldn't be so sure - I don't see them marketing this as some F@H record-breaker. Their market-speak seems more focused on financial analyst types plotting charts onto a gazillion screens while pulling the data from a few servers doing the actual hard work...
