
Linus Torvalds: "I Hope AVX512 Dies A Painful Death"

28 minutes ago, SpaceGhostC2C said:

And now I kind of feel we are re-having a conversation we had with you and @leadeater in another thread, don't know which one - I'm becoming a broken record :P 

Linus' comments on AVX-512 were posted in another thread, where I went over some of this already... so yeah... maybe I should have more time away from the forum :D

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


1 hour ago, porina said:

As covered before, GPU is not appropriate for all code, even if it appears parallelizable. Data dependencies remain a sticking point, and while GPUs excel in more trivially parallel code, it doesn't work for everything. Something kinda in between a CPU and a GPU might be interesting. I think it was in a different thread, I suggested one possible avenue companies might take in future is simpler CPU cores, but a lot more of them. Core complexity will still be more CPU-like, but scale more GPU-like.

I never did understand why simpler FPGA PCIe cards haven't caught on. I know they're expensive now, but isn't that due to economies of scale? Seems to me that if you need to run some specialized code but can't quite justify an ASIC (because those are really expensive), why not flash an FPGA and dedicate the app to it for that one instruction? Or better yet, have an FPGA core alongside all the others in one CPU package.


38 minutes ago, StDragon said:

I never did understand why simpler FPGA PCIe cards haven't caught on. I know they're expensive now, but isn't that due to economies of scale? Seems to me that if you need to run some specialized code but can't quite justify an ASIC (because those are really expensive), why not flash an FPGA and dedicate the app to it for that one instruction? Or better yet, have an FPGA core alongside all the others in one CPU package.

Probably cost. FPGAs vary in size. The more serious ones will cost more serious money. If you really need it, you pay for it. Smaller ones may not provide enough return to be worth including. I've not used one in my career in the electronics industry, but programming one probably isn't like what x86 coders are used to. To get it working... may be more pain than it is worth unless you have a big enough task for it.

 

Some here think AVX-512 is too exotic already. ASICs are way beyond that level.

 

2 minutes ago, valdyrgramr said:

Doesn't this guy just say controversial things to stay relevant?  Not arguing if he's right or wrong, but this guy comes off no different than John McAfee.

I think he's just used to speaking his mind. Didn't he go on some training or whatever to make himself a better person to work with? Wonder what the old him would have said in the same situation? :D


5 minutes ago, valdyrgramr said:

Well, I mean to be fair it was probably not fair to compare him to McAfee.  McAfee is a crazy ranter who thinks he knows more than he does to boost his own ego. 

McAfee is an anarchist fugitive that's into bath salts. Totally different level compared to Torvalds. It's almost an insult to Torvalds to even be mentioned in the same thread as McAfee. xD


2 hours ago, StDragon said:

I never did understand why simpler FPGA PCIe cards haven't caught on. I know they're expensive now, but isn't that due to economies of scale? Seems to me that if you need to run some specialized code but can't quite justify an ASIC (because those are really expensive), why not flash an FPGA and dedicate the app to it for that one instruction? Or better yet, have an FPGA core alongside all the others in one CPU package.

Because it won't work well outside of long continuous workloads. An FPGA would be limited to one or two programs, as there is a limited amount of resources on it, and every time you use it you probably have to program it again, so in real use cases there will be significant latency to using it. Then you have the same problem GPUs have of being away from the CPU with limited local memory, and we are back to needing a much faster, lower-latency connection. CXL, Gen-Z etc. can't come soon enough.
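The reprogramming-latency argument can be put as a quick break-even check. This is only a minimal sketch: the function names and every number fed into them (reconfiguration time, transfer time, per-item timings) are made-up assumptions for illustration, not measurements of any real FPGA card.

```python
def offload_worthwhile(items, cpu_s_per_item, fpga_s_per_item,
                       reconfig_s, transfer_s):
    """Return True if offloading `items` work units finishes sooner
    than running them on the CPU.

    All parameters are hypothetical illustration values, in seconds.
    """
    cpu_total = items * cpu_s_per_item
    # The accelerator pays a fixed cost (reprogramming + moving data
    # across the bus) before it does any useful work.
    fpga_total = reconfig_s + transfer_s + items * fpga_s_per_item
    return fpga_total < cpu_total


def break_even_items(cpu_s_per_item, fpga_s_per_item, reconfig_s, transfer_s):
    """Smallest number of work items at which the offload starts to pay off."""
    saved_per_item = cpu_s_per_item - fpga_s_per_item
    if saved_per_item <= 0:
        return None  # the accelerator is never faster per item
    overhead = reconfig_s + transfer_s
    return int(overhead // saved_per_item) + 1
```

With a fixed overhead of 2.5 s and 0.25 s saved per item, the offload only wins from the 11th item onward, which is the "long continuous workloads" point in numbers.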


3 hours ago, porina said:

I think he's just used to speaking his mind. Didn't he go on some training or whatever to make himself a better person to work with? Wonder what the old him would have said in the same situation? :D

Torvalds used to be very rude and condescending to the people contributing patches to the Linux Kernel, and very public about it. That chases away people from contributing anything.

 

It's a cornerstone of polite conversation. Don't throw a stone at someone and not expect to break a window. There are posters on LTT that don't understand this either. We don't ask people to be performatively nice, against the poster's personality; we ask that people don't attack other people for wanting to contribute.

 

Wikipedia is notoriously awful about this kind of thing: all it takes is one bad interaction, and people who were once happy to contribute would rather go contribute their time to something else. That's not getting upset that one's contributions weren't accepted, but that the interaction was abusive, so why would one ever want to contribute again?

 

Anyway, AVX512, probably not the best use of CPU die space, it's probably time to legitimately create a new chip, move some of these edge-case/rarely used instructions into a chiplet design that can be binned off and replaced with additional cores for performance chips.


2 hours ago, Kisai said:

Anyway, AVX512, probably not the best use of CPU die space, it's probably time to legitimately create a new chip, move some of these edge-case/rarely used instructions into a chiplet design that can be binned off and replaced with additional cores for performance chips.

AVX512 really does only belong in CPUs like the Cascade Lake-AP product line targeted directly at HPC and computational data analytics. There isn't really a wide use case outside of that, and like you say it would be better to improve things that are already used than something that is not.

 

Most CPUs and systems don't have the supporting cache and memory bandwidth to properly utilize AVX512 and AVX512 so far has not been created or deployed equally.
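A back-of-the-envelope way to see the bandwidth problem: a 512-bit vector holds 16 FP32 lanes, and a fused multiply-add counts as 2 floating-point operations per lane. The sketch below is a rough estimate only; the core count, clock speed, and arithmetic intensity you pass in are assumptions to plug in yourself, and real chips usually clock lower under heavy AVX-512 load.

```python
def peak_gflops(cores, ghz, fma_units, vector_bits=512, elem_bits=32):
    """Theoretical peak GFLOPS when every FMA unit is busy every cycle."""
    lanes = vector_bits // elem_bits      # 16 lanes for FP32 in 512 bits
    flops_per_cycle = fma_units * lanes * 2  # an FMA is a multiply plus an add
    return cores * ghz * flops_per_cycle


def required_bandwidth_gbs(gflops, flops_per_byte):
    """Memory bandwidth (GB/s) needed to sustain `gflops` for a kernel
    that performs `flops_per_byte` operations per byte it streams in."""
    return gflops / flops_per_byte
```

For example, a hypothetical 8-core part at 3.0 GHz with two FMA units per core peaks at roughly 1.5 TFLOPS, and a streaming kernel doing only 0.25 FLOPs per byte would need thousands of GB/s to keep that fed, far more than a typical desktop memory system supplies.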

 

Problem is I don't actually know how much die area AVX512 uses or more correctly how much you gain back by taking it out.

skylake sp mesh core tile zoom with client shown.png

 

As you can see broadly the potential space saving isn't actually that much. But hey a giant AVX512 chiplet(s) using Intel's die stacking would be amazing to see if something like that would work. Second layer could all be AVX512 chiplets with large cache to feed it and I'd think you should have GPU like performance if not much greater, all for the low low price of.... I guess $30k lol.


1 hour ago, leadeater said:

AVX512 really does only belong in CPUs like the Cascade Lake-AP product line targeted directly at HPC and computational data analytics. There isn't really a wide use case outside of that, and like you say it would be better to improve things that are already used than something that is not.

 

Most CPUs and systems don't have the supporting cache and memory bandwidth to properly utilize AVX512 and AVX512 so far has not been created or deployed equally.

 

Problem is I don't actually know how much die area AVX512 uses or more correctly how much you gain back by taking it out.

skylake sp mesh core tile zoom with client shown.png

 

As you can see broadly the potential space saving isn't actually that much. But hey a giant AVX512 chiplet(s) using Intel's die stacking would be amazing to see if something like that would work. Second layer could all be AVX512 chiplets with large cache to feed it and I'd think you should have GPU like performance if not much greater, all for the low low price of.... I guess $30k lol.

Taking AVX-512 out of most consumer chips would probably hurt development somewhat as this would limit the systems you can properly test with before deployment. 
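For anyone wondering whether their own box can serve as an AVX-512 test system, here is a minimal sketch that looks for `avx512*` entries in the `flags` line of `/proc/cpuinfo`. It assumes Linux on x86; the helper names are made up for this example, and other platforms expose CPU features differently.

```python
def avx512_features(cpuinfo_text):
    """Return the sorted list of avx512* flags found in
    /proc/cpuinfo-style text (Linux on x86 assumed)."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(found)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning an empty string if it is unavailable."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    features = avx512_features(read_cpuinfo())
    print("AVX-512 subsets:", ", ".join(features) if features else "none found")
```

An empty result means no AVX-512 subset was reported, so code paths using it could not be exercised on that machine.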

My eyes see the past…

My camera lens sees the present…


2 hours ago, leadeater said:

AVX512 really does only belong in CPUs like the Cascade Lake-AP product line targeted directly at HPC and computational data analytics. There isn't really a wide use case outside of that, and like you say it would be better to improve things that are already used than something that is not.

 

Most CPUs and systems don't have the supporting cache and memory bandwidth to properly utilize AVX512 and AVX512 so far has not been created or deployed equally.

 

Problem is I don't actually know how much die area AVX512 uses or more correctly how much you gain back by taking it out.

skylake sp mesh core tile zoom with client shown.png

 

As you can see broadly the potential space saving isn't actually that much. But hey a giant AVX512 chiplet(s) using Intel's die stacking would be amazing to see if something like that would work. Second layer could all be AVX512 chiplets with large cache to feed it and I'd think you should have GPU like performance if not much greater, all for the low low price of.... I guess $30k lol.

IMO, and I'm not a chip engineer so don't take this as anything but an educated guess:

 

https://www.arm.com/why-arm/technologies/big-little or https://www.arm.com/why-arm/technologies/dynamiq (which is the successor to it)

I'd probably aim for the "big.LITTLE" type of deal where the BIG cores have all the features, plus additional LITTLE cores with some of the seldom-used/inefficient power-hungry features removed.

So for a many-core server you'd probably have mostly BIG cores at a fixed TDP, and a few LITTLE cores that are used for background processes that the heavy load probably doesn't use.

For a desktop or laptop H part you'd probably have several higher-speed BIG cores and a few LITTLE cores that, all maxed out, use the maximum TDP.

Then for a U/Y laptop or tablet device you'd have mostly LITTLE cores and only a few of the BIG cores.

In a stacked/chiplet design you'd then have the option to re-arrange these to be either BIG.BIG, BIG.LITTLE, or LITTLE. So an i9 might be all BIG cores and only two LITTLE cores (e.g. 10+2), and an i3 might only have two BIG cores and all LITTLE cores (2+4), and the i5 would be (4+8), i7 would be (8+4).

 

Anyway that is just my thought on it. Maybe Intel can't do this with the Core architecture.

 


4 hours ago, leadeater said:

Problem is I don't actually know how much die area AVX512 uses or more correctly how much you gain back by taking it out.

skylake sp mesh core tile zoom with client shown.png

 

Here is how much AVX-512 takes in direct area cost:

 

skx.thumb.jpg.e9f48833826e2d4c5d656463afe043e9.jpg

 

There is an additional hidden cost in the extended data-path overhead of the L1 data cache, to keep up with feeding the dual 512-bit pipelines.


21 minutes ago, DuckDodgers said:

There is an additional hidden cost in the extended data-path overhead of the L1 data cache, to keep up with feeding the dual 512-bit pipelines.

How much you gain back is a bit more complicated than that though, since it's not just taking them out: doing so means an entire re-layout of the die, so you could gain more or less than expected because of design rules. Client arch only has a single AVX512 unit (unless that's changed) and HEDT/Server has two (with one disabled in lower SKU Xeons), which is that FMA sitting in the LLC. In fact those entire areas around the main core are only on HEDT/Server and not client; client arch uses Ring Bus and the L3 cache is in that area (another Intel problem right now, Ring vs Mesh rather than just one).


1 hour ago, DuckDodgers said:

Quoth the image

How do you even know what's what in that photo?

✨FNIGE✨


9 hours ago, Kisai said:

Torvalds used to be very rude and condescending to the people contributing patches to the Linux Kernel, and very public about it. That chases away people from contributing anything.

 

It's a cornerstone of polite conversation. Don't throw a stone at someone and not expect to break a window. There are posters on LTT that don't understand this either. We don't ask people to be performatively nice, against the poster's personality; we ask that people don't attack other people for wanting to contribute.

 

Wikipedia is notoriously awful about this kind of thing: all it takes is one bad interaction, and people who were once happy to contribute would rather go contribute their time to something else. That's not getting upset that one's contributions weren't accepted, but that the interaction was abusive, so why would one ever want to contribute again?

 

Anyway, AVX512, probably not the best use of CPU die space, it's probably time to legitimately create a new chip, move some of these edge-case/rarely used instructions into a chiplet design that can be binned off and replaced with additional cores for performance chips.

Linus is a big two-edged sword: he is great as he has the balls to call out big companies when it's appropriate (or not), but he is a huge dick. Linux would be better off if they had a Linus translator for the communications between Linus and contributors.

Like: you suck balls => you could be better

 

3 hours ago, leadeater said:

How much you gain back is a bit more complicated than that though, since it's not just taking them out: doing so means an entire re-layout of the die, so you could gain more or less than expected because of design rules. Client arch only has a single AVX512 unit (unless that's changed) and HEDT/Server has two (with one disabled in lower SKU Xeons), which is that FMA sitting in the LLC. In fact those entire areas around the main core are only on HEDT/Server and not client; client arch uses Ring Bus and the L3 cache is in that area (another Intel problem right now, Ring vs Mesh rather than just one).

Ring bus should be on its way out anyway. Even the consumer chips are already at 10 cores, which really is where ring bus completely stops. The big.LITTLE rumors point to 2 ring buses, but that won't help too much for too long.

Btw, do you think it will be successful, the big.LITTLE approach I mean? Seems like it won't to me, especially with AMD pumping out 12+ core chips.


1 minute ago, cj09beira said:

Btw, do you think it will be successful, the big.LITTLE approach I mean? Seems like it won't to me, especially with AMD pumping out 12+ core chips.

No idea, it'll live or die on the software support side. I think Intel are targeting it at the lower market segments and low power so I don't think higher end Ryzen will have much impact at all. I do think both the Ryzen 4000 H & U perform significantly better than Intel was planning on though. If AMD can take the laptop market that alone will have the most impact on both companies, it's by far the largest volume PC market.

 

Next 12 months is likely going to shape the next decade, fun times, fun times.


4 hours ago, leadeater said:

No idea, it'll live or die on the software support side. I think Intel are targeting it at the lower market segments and low power so I don't think higher end Ryzen will have much impact at all. I do think both the Ryzen 4000 H & U perform significantly better than Intel was planning on though. If AMD can take the laptop market that alone will have the most impact on both companies, it's by far the largest volume PC market.

 

Next 12 months is likely going to shape the next decade, fun times, fun times.

The laptop space is finally moving again. I love that I can now get a 6-core ultrabook with 16GB of RAM for $600.

if you want to annoy me, then join my teamspeak server ts.benja.cc


1 hour ago, cj09beira said:

https://vsstaticssl.lvl3.on24.com/event/25/07/35/5/rt/1/resources/ThinkStation%2BP620%2BDatasheet%2B-%2BFinal-AD6B.pdf

 

threadripper pro is here:

128 pcie lanes true

64 cores true

1TB of memory capacity with 128GB dimms true

 

and lenovo is launching a workstation with it.

 

Be right back, I have a bank to rob...


7 hours ago, cj09beira said:

https://vsstaticssl.lvl3.on24.com/event/25/07/35/5/rt/1/resources/ThinkStation%2BP620%2BDatasheet%2B-%2BFinal-AD6B.pdf

 

threadripper pro is here:

128 pcie lanes true

64 cores true

1TB of memory capacity with 128GB dimms true

 

and lenovo is launching a workstation with it.

@leadeater

 

Quote

AMD Announces Ryzen Threadripper Pro: Workstation Parts for OEMs Only

Lammmeeeeeeeee


On 7/14/2020 at 8:38 AM, valdyrgramr said:

Well, I mean to be fair it was probably not fair to compare him to McAfee.

I think it's fair. To me he is a cross between Steve Jobs and McAfee. He is good at doing that one thing he is good at, but the rest of the time he is just angry, treating workers like shit and saying stuff not just out of context but outside of his relevance.

Grammar and spelling is not indicative of intelligence/knowledge.  Not having the same opinion does not always mean lack of understanding.  


8 hours ago, cj09beira said:

and lenovo is launching a workstation with it.

Their workstation lineup is even worse than their gaming lineup; for $1,309.00 you are getting:

Quote
ThinkStation P330 Tiny
 
Processor: 8th Gen Intel® Core™ i3-8100T (3.10GHz, 4 cores, 6MB Cache)
Operating System: Windows 10 Pro 64
Memory: 8GB DDR4 2666MHz
Graphics: NVIDIA Quadro P620 2GB 4 x Mini DP
Optical Drive: None
Warranty: 3 Year On-site
Form Factor: Tiny Q370
M.2 Storage Card: 256GB Solid State Drive, PCIe-NVME, M.2, Opal
Network Card: Integrated Ethernet
Hard Drive: Not available
Keyboard: Not available
Pointing Device: Not available

Why they are selling a low end PC in their workstation lineup is beyond me.

 

From their marketing:

Quote

Perfect for big jobs like engineering, architecture, science,

Ahmm, it's good just for light workloads like web browsing and media consumption...

That system is useless for the purposes it's marketed for.

A PC Enthusiast since 2011
AMD Ryzen 7 5700X@4.65GHz | GIGABYTE GTX 1660 GAMING OC @ Core 2085MHz Memory 5000MHz
Cinebench R23: 15669cb | Unigine Superposition 1080p Extreme: 3566

38 minutes ago, Vishera said:

Why they are selling a low end PC in their workstation lineup is beyond me.

Well, low-end workstations aren't unique to Lenovo though. Something like that is nice for running multi-monitor hanging displays, or where you need a design that has enough power to be useful but has to be used in a hazardous environment, so the cooling solution uses low or no airflow through the chassis.

 

Like all low-spec systems the value is generally bad because there is a minimum cost for the base components; simply paying $50 more for a better CPU option significantly increases the value.

 

Weird bad value options don't actually make more sensible options worse, just like having better more sensible options doesn't make the worse ones better.

 

Personally I've only used Lenovo servers, mostly IBM though but actually the same thing (long story), and don't have any complaints about those. Similarly there's bad value server options but you don't have to buy them.


1 hour ago, Vishera said:

ThinkStation P330 Tiny

 

Well, to put things in perspective:

Spoiler

lenovo_30cf000kus_p330_tiny_i7_8700t_2_4

Usually smaller form factors cost more (look at NUC). Still, I find the price strangely high compared to typical members of the Tiny lineup. My guess is that's the Quadro driving the price.

 

1 hour ago, Vishera said:

 

Why they are selling a low end PC in their workstation lineup is beyond me.

The ThinkCentre / Station lineup includes the "thin clients" (I know they aren't technically thin clients; they are thin, though :P) as they are part of the business segment.

 

1 hour ago, Vishera said:

 

That system is useless for the purposes it's marketed for.

I wouldn't be so sure - I don't see them marketing this as some F@H record-breaker. Their market-speak seems more focused on financial analyst types plotting charts onto a gazillion screens while pulling the data from a few servers doing the actual hard work....
