
Linus Torvalds: "I Hope AVX512 Dies A Painful Death"

More rage bursts from Linus Torvalds, this time targeted at Intel.

 

Quotes


In a mailing list discussion stemming from the Phoronix article this week on the compiler instructions Intel is enabling for Alder Lake (and Sapphire Rapids), Linus Torvalds chimed in. The Alder Lake instructions being flipped on in GCC right now make no mention of AVX-512, only AVX2 and others, likely because Intel is targeting the subset supported by both the small and large cores in this new hybrid design.

 

The absence of AVX-512 for Alder Lake led Torvalds to comment:

 

I hope AVX512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on.
I hope Intel gets back to basics: gets their process working again, and concentrate more on regular code that isn't HPC or some other pointless special case.

 

My thoughts

Old Linus yells at yet another x86 ISA extension. 😅

 

Sources

https://www.phoronix.com/scan.php?page=news_item&px=Linus-Torvalds-On-AVX-512


Don't know that I would consider this NEWS as such.


True. Also, maybe if they get rid of it, software will finally be properly optimized for the universally available AVX2.
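For what it's worth, a minimal sketch of what "optimized for AVX2" tends to mean in practice: 256-bit SIMD intrinsics (or letting the compiler auto-vectorize to them). The function name and the multiple-of-8 length are my own assumptions, purely for illustration:

#include <immintrin.h>

/* Add two int32 arrays 8 elements at a time. 256-bit integer ops are
   the part AVX2 added over AVX. Assumes n is a multiple of 8;
   build with -mavx2 (GCC/Clang). */
void add_i32_avx2(const int *a, const int *b, int *out, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_add_epi32(va, vb));
    }
}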


18 minutes ago, DuckDodgers said:

and concentrate more on regular code that isn't HPC or some other pointless special case.

Well, good luck with that; the "pointless special cases" are most likely what brings the monies in from big corp.


x86 is getting overloaded 

 

Edit: realised that that is not the point Linus Torvalds was making.


His point is about Intel creating an instruction, getting some benchmark company to create a tool to test it, and then running it against Ryzen and Intel chips to show that Intel is still relevant. But no one is using AVX-512 (yet). "Look at how bad Ryzen chips are at running this one-off test for an instruction that we just invented! Buy our stuff plx."

 

It has nothing to do with the size of the ISA. 


This guy is awesome lol. He has the balls to tell nVidia to fuck off and now Intel. Linus Torvalds is right tho. Intel is tryna make dem cpu's look great while AMD is doing better and kicks Intel in the ass.


1 hour ago, descendency said:

But no one is using AVX512 (yet)

Quite a lot of people actually seem to be using it. Heck, some of the more common software that does have AVX-512 support is... x265 and x264.
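That fits, since encoders like x264/x265 don't require AVX-512 - they probe the CPU at startup and dispatch to whichever SIMD kernels are available. A minimal sketch of that pattern using GCC/Clang's __builtin_cpu_supports; the encode_block_* names are hypothetical, not x264's actual functions:

#include <stdio.h>

/* Hypothetical kernels - real encoders have hand-written asm here. */
static void encode_block_scalar(void) { puts("scalar path"); }
static void encode_block_avx2(void)   { puts("AVX2 path"); }
static void encode_block_avx512(void) { puts("AVX-512 path"); }

/* Pick the widest SIMD path the CPU offers (the builtin reads CPUID). */
void encode_block(void)
{
    if (__builtin_cpu_supports("avx512f"))
        encode_block_avx512();
    else if (__builtin_cpu_supports("avx2"))
        encode_block_avx2();
    else
        encode_block_scalar();
}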


I agree that many of the workloads AVX-512 excels at are probably better suited to GPUs, though at the same time moving the data to the GPU has costs and the workloads don't always fit, which is probably a good place to improve things. Can we please get Gen-Z or CXL?


There's nothing wrong with instructions if they can speed things up. MMX, SSE and 3DNow! made a huge difference for games, to the point that trying to run them without these instructions meant they ran significantly slower.

 

Btw, Cinebench R20 uses AVX2 and AVX512 and AMD still just totally annihilates Intel. :D So, instructions alone don't really shift things in any direction if the competing product is just better.

 

Btw, how is it with FMA instructions? Are they used anywhere practical?


29 minutes ago, RejZoR said:

There's nothing wrong with instructions if they can speed things up. MMX, SSE and 3DNow! made a huge difference for games, to the point that trying to run them without these instructions meant they ran significantly slower.

 

Btw, Cinebench R20 uses AVX2 and AVX512 and AMD still just totally annihilates Intel. :D So, instructions alone don't really shift things in any direction if the competing product is just better.

 

Btw, how is it with FMA instructions? Are they used anywhere practical?

It depends; there are at least two FMA instruction sets, FMA4 and FMA3. FMA4 is present on Zen chips but not advertised - not sure if Zen 2 has it. Early tests showed better performance than AVX2, but it was buggy and gave wrong results.


49 minutes ago, RejZoR said:

Btw, Cinebench R20 uses AVX2 and AVX512 and AMD still just totally annihilates Intel. :D So, instructions alone don't really shift things in any direction if the competing product is just better.

Run-time analysis of the benchmark indicates it doesn't touch AVX-512 at all and it uses a mix of SSE(1/2), AVX(1/2) and FMA instructions in various proportions.

Looks like Maxon implemented only a small part of Intel's Embree library.

 

17 minutes ago, cj09beira said:

It depends; there are at least two FMA instruction sets, FMA4 and FMA3. FMA4 is present on Zen chips but not advertised - not sure if Zen 2 has it. Early tests showed better performance than AVX2, but it was buggy and gave wrong results.

FMA4 was removed in Zen 2.


5 hours ago, Sauron said:

True. Also, maybe if they get rid of it, software will finally be properly optimized for the universally available AVX2.

AVX-512 isn't just AVX2 but bigger; it's really a whole family of instruction sets. At the base level, which I think is standard across AVX-512 CPUs, it does what AVX2 does but on up to twice as much data at once. How much you gain depends on the CPU implementation: you need a "2 unit" AVX-512 design to get that doubling, otherwise it may be no better than AVX2. If code can already scale to AVX2, it shouldn't be that hard to extend it to AVX-512, although the increased processing rate can mean you hit other limits in the architecture sooner.
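A minimal sketch of that width doubling with intrinsics - the same loop, 8 floats per step in a 256-bit register versus 16 under AVX-512F. Function names and the multiple-of-width length are my assumptions for brevity:

#include <immintrin.h>

/* Scale an array in place: 256-bit version, 8 floats per iteration. */
void scale_avx2(float *x, float s, int n)
{
    __m256 vs = _mm256_set1_ps(s);
    for (int i = 0; i < n; i += 8)
        _mm256_storeu_ps(x + i, _mm256_mul_ps(_mm256_loadu_ps(x + i), vs));
}

/* Same operation with AVX-512F: 512-bit registers, 16 floats per iteration. */
void scale_avx512(float *x, float s, int n)
{
    __m512 vs = _mm512_set1_ps(s);
    for (int i = 0; i < n; i += 16)
        _mm512_storeu_ps(x + i, _mm512_mul_ps(_mm512_loadu_ps(x + i), vs));
}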

 

AVX-512 also has optional additional feature sets, like the latest ones targeted at machine learning applications. Maybe in the future we could have a CPU equivalent of nvidia doing DLSS on the GPU, for example. Who knows where software could take us once there is sufficient support.

 

1 hour ago, RejZoR said:

Btw, Cinebench R20 uses AVX2 and AVX512 and AMD still just totally annihilates Intel. :D So, instructions alone don't really shift things in any direction if the competing product is just better.

Does CB R20 use AVX-512? I'm not sure about that. Guess I could walk approx 3m to my left and fire up my 7920X and try it...

 

Anyway, if R20 "only" uses AVX2, AMD Zen 2 does have an on-average better implementation of it than Intel's desktop consumer CPUs.

 

1 hour ago, RejZoR said:

Btw, how is it with FMA instructions? Are they used anywhere practical?

DSP-type processing. The way filter coefficients and data are used, you often have to do a multiply followed by an add, so why not do both with a single instruction?
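As a sketch of why that matters: a FIR filter inner loop is just coefficient × sample + accumulator, which FMA collapses into one instruction per vector. The function name and the multiple-of-8 tap count are assumptions for brevity:

#include <immintrin.h>

/* Dot product of filter coefficients and samples using FMA3.
   Assumes taps is a multiple of 8; build with -mfma (GCC/Clang). */
float fir_dot(const float *coeff, const float *sample, int taps)
{
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i < taps; i += 8)
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(coeff + i),   /* a*b + acc */
                              _mm256_loadu_ps(sample + i), acc);

    float lanes[8];                      /* horizontal sum of 8 partials */
    _mm256_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3]
         + lanes[4] + lanes[5] + lanes[6] + lanes[7];
}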

 

35 minutes ago, cj09beira said:

It depends; there are at least two FMA instruction sets, FMA4 and FMA3. FMA4 is present on Zen chips but not advertised - not sure if Zen 2 has it. Early tests showed better performance than AVX2, but it was buggy and gave wrong results.

FMA3 and FMA4 differ in the number of operands but otherwise do the same thing. FMA4 was introduced by AMD during - was it the Bulldozer era? The thing with the modules. The problem was that instruction support is worthless without the hardware to back it up: those CPUs just didn't have the compute capability, so FMA4 ran no faster than the alternative instructions. Intel, on the other hand, did have FMA hardware from Haswell onwards, and that led to a big jump in performance over Sandy/Ivy Bridge.
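To illustrate the operand difference - the assembly forms are shown as comments; from C you normally just use the FMA3 intrinsic and let the compiler allocate registers:

#include <immintrin.h>

/* FMA3 is destructive - one source register doubles as the destination:
     vfmadd231ps ymm0, ymm1, ymm2       ; ymm0 = ymm1*ymm2 + ymm0
   FMA4 had a separate destination, hence the fourth operand:
     vfmaddps ymm0, ymm1, ymm2, ymm3    ; ymm0 = ymm1*ymm2 + ymm3 */
__m256 fma3_madd(__m256 a, __m256 b, __m256 c)
{
    return _mm256_fmadd_ps(a, b, c);    /* a*b + c in a single vfmadd */
}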


5 minutes ago, DuckDodgers said:

Run-time analysis of the benchmark indicates it doesn't touch AVX-512 at all and it uses a mix of SSE(1/2), AVX(1/2) and FMA instructions in various proportions.

Looks like Maxon implemented only a small part of Intel's Embree library.

 

FMA4 was removed in Zen 2.

I think most of the math libraries out there for 3D applications haven't touched AVX yet. They're still running SSE2 because it's pretty much ubiquitous across the processors in use. It's still a pain in the ass to use for any horizontal math, though that changed with SSE3, which brought some pretty big performance bumps because you don't have to get creative with it, and it got even better in SSE4.1 because they added a single-instruction dot product. I personally don't see a point in moving to AVX because the workload is perfectly suited to SSE; you don't really need much more. At least for games.
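For reference, the single-instruction dot product mentioned there is DPPS from SSE4.1, exposed as the _mm_dp_ps intrinsic. A minimal sketch for a 4-element dot product (function name mine):

#include <smmintrin.h>

/* 4-float dot product with SSE4.1's DPPS.
   Mask 0xF1: multiply all four lanes, sum them, write result to lane 0. */
float dot4(const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
}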


31 minutes ago, porina said:

AVX-512 isn't just AVX2 but bigger; it's really a whole family of instruction sets. At the base level, which I think is standard across AVX-512 CPUs, it does what AVX2 does but on up to twice as much data at once. How much you gain depends on the CPU implementation: you need a "2 unit" AVX-512 design to get that doubling, otherwise it may be no better than AVX2. If code can already scale to AVX2, it shouldn't be that hard to extend it to AVX-512, although the increased processing rate can mean you hit other limits in the architecture sooner.

Yeah I know, but afaik, because it is a different set of instructions from AVX2, optimization is typically done with only one of the two in mind, even if the developers don't strictly need the extra features. Plus I'm not certain that workloads always scale up - a lot of things barely scale up to 64-bit words, let alone 256 or 512 - and it can obliterate your memory footprint because you need longer words even for small data, sometimes for no real reason.

38 minutes ago, porina said:

AVX-512 also has optional additional feature sets, like the latest ones targeted at machine learning applications. Maybe in the future we could have a CPU equivalent of nvidia doing DLSS on the GPU, for example. Who knows where software could take us once there is sufficient support.

...yeah, but honestly I don't see it; GPUs, and CUDA in particular, are too deeply dug in by now.

39 minutes ago, trag1c said:

I think most of the math libraries out there for 3D applications haven't touched AVX yet.

Eigen has afaik


29 minutes ago, Sauron said:

Eigen has afaik

Didn't know that... not that I really ever look at patch notes for any of those libraries lol. I wonder when that came in.


1 hour ago, porina said:

AVX-512 also has optional additional feature sets, like the latest ones targeted at machine learning applications. Maybe in the future we could have a CPU equivalent of nvidia doing DLSS on the GPU, for example. Who knows where software could take us once there is sufficient support.

Since no one is going back to software graphics engines for games, the prospect of more SIMD extensions with wider vectors has become a trend of diminishing returns for the mass market. Even workstation loads, like off-line rendering and photo/video editing, are relying more and more on the GPU for parallel processing beyond the traditional graphics acceleration -- it's faster and more energy efficient, and the API overhead is being reduced faster than any new CPU ISA addition could gain traction.


Some have stated that switching between AVX-512 and other instruction sets incurs a major latency penalty. So it seems to me that if you're going to use AVX-512, your application is better off going all-in rather than nipping at it casually. But then again, if you're going to use something like that so heavily, why not just GPGPU the app to begin with?


1 hour ago, Sauron said:

Plus I'm not certain that workloads always scale up - a lot of things barely scale up to 64-bit words, let alone 256 or 512 - and it can obliterate your memory footprint because you need longer words even for small data, sometimes for no real reason.

It makes sense to use SIMD if you do have that multiple data. It is not meant to be a universal solution: use it if available and appropriate, which for sure won't be always.

 

37 minutes ago, DuckDodgers said:

Since no one is going back to software graphics engines for games, the prospect of more SIMD extensions with wider vectors has become a trend of diminishing returns for the mass market. Even workstation loads, like off-line rendering and photo/video editing, are relying more and more on the GPU for parallel processing beyond the traditional graphics acceleration -- it's faster and more energy efficient, and the API overhead is being reduced faster than any new CPU ISA addition could gain traction.

I'd argue with the use of the term "software" here. GPU code is still software; the software runs on hardware, regardless of whether it is a CPU or a GPU. If you mean "software" in the sense of a generic implementation as opposed to one making use of specific hardware support, I could see that. But we do have that hardware support. A bit speculative perhaps, but what if AMD don't follow nvidia with tensor cores? That could leave a gap open on the CPU side to fill.

 

GPUs have come a long way for general compute since the first implementations, but they still have a long way to go to substantially replace the flexibility of a CPU. As such, they remain complementary where appropriate.

 

17 minutes ago, StDragon said:

Some have stated that switching between AVX-512 and other instruction sets incurs a major latency penalty. So it seems to me that if you're going to use AVX-512, your application is better off going all-in rather than nipping at it casually. But then again, if you're going to use something like that so heavily, why not just GPGPU the app to begin with?

As covered before, the GPU is not appropriate for all code, even if the code appears parallelisable. Data dependencies remain a sticking point, and while GPUs excel at more trivially parallel code, that doesn't work for everything. Something kinda in between a CPU and a GPU might be interesting. I think it was in a different thread that I suggested one possible avenue companies might take in future: simpler CPU cores, but a lot more of them. Core complexity would still be more CPU-like, but scaling would be more GPU-like.

 

 

Arguments seem to be going around in circles a bit here. It makes sense to use certain options where they fit, but that doesn't mean they'll be useful for everything.


50 minutes ago, trag1c said:

Didn't know that... not that I really ever look at patch notes for any of those libraries lol. I wonder when that came in.

At least 3-4 years ago, not sure exactly when though.


2 hours ago, trag1c said:

At least for games.

Games aren't why these instruction sets came to be, though.

 

15 minutes ago, porina said:

I think it was in a different thread that I suggested one possible avenue companies might take in future: simpler CPU cores, but a lot more of them. Core complexity would still be more CPU-like, but scaling would be more GPU-like.

That's Xeon Phi ;) 

 

 


8 minutes ago, SpaceGhostC2C said:

That's Xeon Phi ;) 

Kinda. If I were to design a CPU for my application, I would have come up with something like that. However, I think with today's balance the non-FPU parts would have to be better than what was included in Phi, plus it was still cache/bandwidth limited. It would need much more.


18 minutes ago, porina said:

Kinda. If I were to design a CPU for my application, I would have come up with something like that. However, I think with today's balance the non-FPU parts would have to be better than what was included in Phi, plus it was still cache/bandwidth limited. It would need much more.

I think Intel's more recent approach to that, probably related to the expansion of its GPU division, is to have the fusion happen in software rather than hardware:

https://software.intel.com/content/www/us/en/develop/tools/oneapi.html

A bit like AMD's HSA initiative, which didn't translate into much at the time.

 

And now I kind of feel we are re-having a conversation we had with you and @leadeater in another thread, I don't know which one - I'm becoming a broken record :P
